Why GPT-4 Is a Major Flop – Techopedia
GPT-4 made big waves upon its release in March 2023, but the cracks in the surface are finally beginning to show. Not only did ChatGPT's traffic drop by 9.7% in June, but a study published by Stanford University in July found that GPT-3.5 and GPT-4's performance on numerous tasks has gotten substantially worse over time.
In one notable example, when asked whether 17,077 was a prime number in March 2023, GPT-4 correctly answered with 97.6% accuracy, but this figure dropped to 2.4% in June. This was just one area of many where the capabilities of GPT-3.5 and GPT-4 declined over time.
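The question itself has a deterministic answer that is easy to check. As a minimal sketch (the helper name `is_prime` is ours for illustration and does not come from the study), trial division up to the square root confirms that 17,077 is prime:

```python
import math

def is_prime(n: int) -> bool:
    """Trial division: test odd divisors up to the integer square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

print(is_prime(17077))  # True
```

A check like this gives a fixed ground truth against which a model's answers can be scored from one month to the next.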
James Zou, assistant professor at Stanford University, told Techopedia:
"Our research shows that LLM drift is a major challenge in stable integration and deployment of LLMs in practice. Drift, or changes in LLMs' behaviors, such as changes in its formatting or changes in its reasoning, can break downstream pipelines."
"This highlights the importance of continuous monitoring of ChatGPT's behavior, which we are working on," Zou added.
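To make the formatting point concrete, here is a hedged sketch of a downstream parser that tolerates small changes in how a yes/no answer is phrased. The function name `parse_yes_no` and the sample replies are assumptions made for illustration; they are not taken from the Stanford study.

```python
import re
from typing import Optional

def parse_yes_no(reply: str) -> Optional[bool]:
    """Return True/False for a yes/no reply, or None when no clear verdict is found."""
    # Look for the first standalone "yes" or "no", ignoring case and markup.
    match = re.search(r"\b(yes|no)\b", reply.lower())
    if match is None:
        return None  # let the caller retry or escalate rather than guess
    return match.group(1) == "yes"

print(parse_yes_no("Yes"))                                   # True
print(parse_yes_no("**Answer:** No, 17077 is not prime."))   # False
print(parse_yes_no("I cannot determine that."))              # None
```

A pipeline that compares the raw reply to the exact string "Yes" would break as soon as the model started adding explanations or markup. Taking the first standalone "yes" or "no" is only a heuristic, so ambiguous replies should still be reviewed.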
Stanford's study, "How Is ChatGPT's Behavior Changing over Time?", examined the performance of GPT-3.5 and GPT-4 across four key areas in March 2023 and June 2023.
A summary of each of these areas is listed below:
Although many have argued that GPT-4 has gotten lazier and dumber, Zou believes that, with respect to ChatGPT, it's hard to say that ChatGPT is uniformly getting worse, but it's certainly not always improving in all areas.
The reasons behind this lack of improvement, or decline in performance in some key areas, are hard to explain because OpenAI's black-box development approach means there is no transparency into how the organization is updating or fine-tuning its models behind the scenes.
However, Peter Welinder, OpenAI's VP of Product, has pushed back against critics who've suggested that GPT-4 is on the decline, arguing instead that users are just becoming more aware of its limitations.
"No, we haven't made GPT-4 dumber. Quite the opposite: we make each new version smarter than the previous one. Current hypothesis: When you use it more heavily, you start noticing issues you didn't see before," Welinder said in a Twitter post.
While increasing user awareness doesn't completely explain the decline in GPT-4's ability to solve math problems and generate code, Welinder's comments do highlight that as user adoption increases, users and organizations will gradually develop greater awareness of the limitations posed by the technology.
Although there are many potential LLM use cases that can provide real value to organizations, the limitations of this technology are becoming clearer in a number of key areas.
For instance, another research paper, developed by Tencent AI Lab researchers Wenxiang Jiao and Wenxuan Wang, found that the tool might not be as good at translating languages as is often suggested.
The report noted that while ChatGPT was competitive with commercial translation products like Google Translate in translating European languages, it lags behind significantly when translating low-resource or distant languages.
At the same time, many security researchers are critical of the capabilities of LLMs within cybersecurity workflows, with 64.2% of white hat researchers reporting that ChatGPT displayed limited accuracy in identifying security vulnerabilities.
Likewise, open-source governance provider Endor Labs has released research indicating that LLMs can accurately classify malware risk in just 5% of all cases.
Of course, it's also impossible to overlook the tendency that LLMs have to hallucinate, invent facts, and state them to users as if they were correct.
Many of these issues stem from the fact that LLMs don't think but process user queries, leverage training data to infer context, and then predict a text output. This means they can predict both right and wrong answers (not to mention that bias or inaccuracies in the dataset can carry over into responses).
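A toy example can illustrate the point. The sketch below is a simple word-pair (bigram) counter, which is far simpler than a real LLM and is offered only as an analogy; the corpus and function names are invented for illustration. It predicts the most frequent continuation seen in training, whether or not that continuation is true.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it in the training sentences."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Invented training data in which the wrong answer outnumbers the right one.
corpus = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]
model = train_bigrams(corpus)
print(predict_next(model, "is"))  # sydney
```

The prediction follows the frequency of the training data, not the facts: Canberra is the capital, but the toy model answers "sydney" because that is what it saw most often.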
As such, they are a long way away from being able to live up to the hype of acting as a precursor to artificial general intelligence (AGI).
The public reception around ChatGPT is extremely mixed, with consumers sharing optimistic and pessimistic attitudes about the technology's capabilities.
On one hand, Capgemini Research Institute polled 10,000 respondents across Australia, Canada, France, Germany, Italy, Japan, the Netherlands, Norway, Singapore, Spain, Sweden, the UK, and the U.S. and found that 73% of consumers trust content written by generative AI.
Many of these users trusted generative AI solutions to the extent that they were willing to seek financial, medical, and relationship advice from a virtual assistant.
On the other hand, there are many who are more anxious about the technology, with a survey conducted by Malwarebytes finding that not only did 63% of respondents not trust the information that LLMs produce, but 81% were concerned about possible security and safety risks.
It remains to be seen how this will change in the future, but it's clear that hype around the technology isn't dead just yet, even if more and more performance issues are becoming apparent.
While generative AI solutions like ChatGPT still offer valuable use cases to enterprises, organizations need to be much more proactive about monitoring the performance of applications of this technology to avoid downstream challenges.
In an environment where the performance of LLMs like GPT-4 and GPT-3.5 is inconsistent at best or on the decline at worst, organizations can't afford to let employees blindly trust the output of these solutions and must continuously assess that output to avoid being misinformed or spreading misinformation.
Zou said:
"We recommend following our approach to periodically assess the LLMs' responses on a set of questions that captures relevant application scenarios. In parallel, it's also important to engineer the downstream pipeline to be robust to small changes in the LLMs."
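As a rough sketch of what such a periodic assessment could look like, the code below scores a model against a fixed question set and flags a drop in accuracy. Everything in it is an assumption made for illustration: the `accuracy` and `has_drifted` helpers, the 5-point tolerance, and the two stand-in "snapshot" functions, which would be replaced by real model calls in practice. It is not the researchers' own tooling.

```python
from typing import Callable, List, Tuple

# Hypothetical benchmark: (prompt, expected answer) pairs chosen by the team.
Benchmark = List[Tuple[str, str]]

def accuracy(ask: Callable[[str], str], benchmark: Benchmark) -> float:
    """Fraction of prompts whose normalized reply starts with the expected answer."""
    hits = 0
    for prompt, expected in benchmark:
        reply = ask(prompt).strip().lower()
        if reply.startswith(expected.lower()):
            hits += 1
    return hits / len(benchmark)

def has_drifted(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """Flag a drop in accuracy larger than the chosen tolerance."""
    return (baseline - current) > tolerance

# Stand-ins for two model snapshots; a real setup would call the model here.
def march_snapshot(prompt: str) -> str:
    return "Yes, it is prime."

def june_snapshot(prompt: str) -> str:
    return "No."

benchmark = [("Is 17077 a prime number? Answer yes or no.", "yes")]
baseline = accuracy(march_snapshot, benchmark)
current = accuracy(june_snapshot, benchmark)
print(baseline, current, has_drifted(baseline, current))  # 1.0 0.0 True
```

Running the same question set on a schedule and storing the scores gives a team an early warning when a model update changes behavior that its applications depend on.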
For users who got caught up in the hype surrounding GPT, the reality of its performance limitations means it's a flop. However, it can still be a valuable tool for organizations and users that remain mindful of its limitations and attempt to work around them.
Taking actions, such as double-checking the output of LLMs to make sure facts and other logical information are correct, can help ensure that users benefit from the technology without being misled.