
ChatGPT Hallucinations Are Getting Worse—and Even OpenAI Doesn’t Know Why

By Anosha Shariq · May 8, 2025 (Updated)

Key Takeaways

• OpenAI’s internal tests show o4-mini hallucinates nearly 80% of the time on certain factual tasks.

• Newer AI models, designed for more complex reasoning, are producing more false information than older versions.

• OpenAI has not identified the cause of increased hallucination rates in its most advanced models.

• The issue raises new concerns about the trustworthiness and utility of AI tools in professional and everyday use.


OpenAI’s most advanced artificial intelligence models, meant to usher in a new era of human-like reasoning, are also the most prone to generating false information.

According to OpenAI’s internal tests, hallucination rates in newer models like o3 and o4-mini have increased significantly compared to earlier systems.

This troubling trend is emerging just as these models are being adopted more widely in education, customer support, research, and even coding.

While OpenAI’s goal was to improve reasoning and reduce errors, the results suggest the opposite is happening.


Hallucination Rates Are Climbing, Not Falling

OpenAI tested its models using two benchmarks: PersonQA, which involves answering questions about public figures, and SimpleQA, a general knowledge test. The results raise red flags about factual accuracy.


• o3 hallucinated 33% of the time on PersonQA; o4-mini reached 48%
• On SimpleQA, o3 hallucinated 51% of the time, while o4-mini scored 79%
• The earlier o1 model, by comparison, hallucinated 44% on SimpleQA (the figures are tabulated in the sketch below)
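
The gap is easier to see with the reported figures side by side. The short Python sketch below simply tabulates the percentages quoted above; it adds no data of its own (the article gives no PersonQA figure for o1, so that cell is left empty).

```python
# Hallucination rates reported in OpenAI's internal tests, as quoted in this
# article. Values are percentages; None means the article gives no figure.
RATES = {
    "o1":      {"PersonQA": None, "SimpleQA": 44},
    "o3":      {"PersonQA": 33,   "SimpleQA": 51},
    "o4-mini": {"PersonQA": 48,   "SimpleQA": 79},
}

for model, scores in RATES.items():
    person = f'{scores["PersonQA"]}%' if scores["PersonQA"] is not None else "n/a"
    print(f'{model:8}  PersonQA: {person:>4}  SimpleQA: {scores["SimpleQA"]}%')

# On SimpleQA the rate climbs 35 percentage points from o1 (44%) to o4-mini (79%).
print("o1 -> o4-mini on SimpleQA:",
      RATES["o4-mini"]["SimpleQA"] - RATES["o1"]["SimpleQA"], "points")
```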

The term “hallucination” in AI refers to the generation of plausible-sounding but factually incorrect or entirely fabricated information. This makes it difficult for users to know when to trust the system’s output—especially when it delivers responses with confidence.


Designed to Reason—But Struggling with Reality

These newer ChatGPT AI models are part of OpenAI’s initiative to build what it calls “reasoning systems.” Unlike earlier models that focused on statistical pattern recognition, reasoning models attempt to solve problems by breaking them into logical steps.


“Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem.”
— OpenAI, on o1

While that approach should, in theory, lead to more reliable outcomes, the reality appears far more complex. The increase in hallucination rates suggests that more powerful reasoning may be entangled with a higher risk of error.
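
For readers who have not used these models programmatically, the hedged sketch below shows how a reasoning model such as o3 is typically queried through OpenAI’s official Python client (assuming the `openai` package and an API key in the environment; the prompt is purely illustrative). The chain of thought runs inside the model and is not exposed to the caller, which is part of why a confident but wrong answer looks identical to a correct one.

```python
# Minimal sketch: querying a reasoning model with the official OpenAI Python
# client. The model name and question are illustrative, not from OpenAI's tests.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="o3",  # one of the reasoning models discussed in the article
    messages=[
        {"role": "user",
         "content": "Summarize what the PersonQA benchmark measures in one sentence."},
    ],
)

# The reply reads fluently whether or not it is accurate; the caller gets no
# signal that distinguishes a grounded answer from a hallucinated one.
print(response.choices[0].message.content)
```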


OpenAI Responds to the Findings

OpenAI has acknowledged the hallucination issue but cautions against drawing a direct link between model complexity and factual inaccuracy. The company emphasizes that the problem is being actively investigated.


“Hallucinations are not inherently more prevalent in reasoning models, though we are actively working to reduce the higher rates of hallucination we saw in o3 and o4-mini.”
— Gaby Raila, OpenAI

OpenAI has not yet provided a technical explanation for why the hallucination problem is worsening. The company continues to refine its training processes and benchmarks in hopes of improving output quality.


Why It Matters

The hallucination issue affects both user trust and the practical utility of AI tools in professional settings. If users must double-check everything the model says, the time-saving benefit of using AI tools becomes negligible.


• Misleading or incorrect information can harm decisions in education, law, health, and business
• Increased hallucination rates undermine the model’s reliability as a research or writing assistant
• Without significant improvements, adoption of AI for high-stakes tasks may stall or backfire

The current environment calls for caution. While large language models continue to evolve in capability, their reliability remains inconsistent—and unpredictably so.

OpenAI’s most capable AI models are also its least accurate when it comes to factual consistency. This paradox exposes a key challenge for the future of generative AI: balancing intelligence with trust.


“Trust me, hallucinations aren’t just random bugs, they’re built right into the way these models work.”

As the technology becomes more embedded in everyday tools and workflows, solving the hallucination problem isn’t optional; it’s essential. For now, users must remain skeptical and verify outputs carefully, regardless of how intelligent the answers may seem.
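
Part of that verification can be scripted. The sketch below is a minimal self-consistency check, assuming the standard OpenAI Python client: it asks the same factual question a few times and flags any disagreement between runs as a cue to consult a primary source. The helper names and example question are illustrative, not OpenAI’s method, and agreement between runs is no guarantee of accuracy.

```python
# Rough heuristic, not OpenAI's method: sample the same question several times
# and treat disagreement between runs as a prompt to verify manually.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def sample_answers(question: str, model: str = "o4-mini", n: int = 3) -> list[str]:
    """Collect n independent answers to the same question."""
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        answers.append(resp.choices[0].message.content.strip())
    return answers

def looks_unstable(answers: list[str]) -> bool:
    """Any exact-match disagreement is enough to warrant a manual check."""
    return len(Counter(answers)) > 1

answers = sample_answers("In which city was Ada Lovelace born?")
if looks_unstable(answers):
    print("Answers disagree; check a primary source:", answers)
else:
    print("Answers agree (still worth spot-checking):", answers[0])
```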

For more news and insights, visit AI News on our website.

I’m Anosha Shariq, a tech-savvy content and news writer with a flair for breaking down complex AI topics into stories that inform and inspire. From writing in-depth features to creating buzz on social media, I help shape conversations around the ever-evolving world of artificial intelligence.
