The Hidden Problem with AI Reliability

...and how OpenAI's new research may have solved it

There's one weird thing about AI that I constantly have to remind people of: hallucinations.

People are placing increasing trust in AI systems for increasingly critical tasks, yet many are completely unaware that these systems regularly "hallucinate" -- confidently generate information that's completely wrong.

But this week, OpenAI published new research that could fundamentally change this reliability problem, and it's a lot simpler than I expected.

So today, I want to break down:

  • What AI hallucinations are

  • The new research that explains why they happen

  • Why solving this could transform industries

What Are AI Hallucinations?

AI hallucinations occur when language models confidently generate information that sounds completely plausible but is factually incorrect.

These aren't occasional mistakes or typos. They're systematic fabrications where AI systems present false information with complete authority.

It's a big flaw in the AI world and a major contributor to the "AI slop" trend flooding the internet (which is not good for the industry as a whole).

A few common examples you may have seen before:

  • Creating fake academic citations with realistic author names

  • Inventing quotes from real people

  • Fabricating legal precedents that don't exist

  • Making up product features or technical specifications

What's truly surprising is how little this gets discussed outside of AI research circles. Most "regular people" conversations about AI productivity completely skip over the reliability question, while people increasingly trust these systems for critical work.

OpenAI already made big progress on hallucination rates when it launched its GPT-5 reasoning model. But it's still not *perfect*:

Screenshot from OpenAI’s GPT-5 launch livestream

Now, OpenAI's newest research paper suggests they've found the root cause and a potential solution.

But before we get to that, here's a simple verification strategy: never use AI-generated facts without independent confirmation.

  • For any specific claims, dates, or statistics: Cross-reference with independent sources

  • For citations or studies: Actually look them up in the original publication

  • For legal info: Check with legal databases, not just AI summaries

This extra verification step takes time, but it prevents potentially costly mistakes. The rule is simple: if the information matters, verify it independently.
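
If you want that rule baked into a workflow instead of relying on memory, here's a tiny, purely illustrative Python sketch (every name in it is hypothetical, not part of any real tool or API) of a gate that only lets claims through once they've been confirmed against an independent source:

```python
# Purely illustrative: a "verify before use" gate for AI-generated claims.
# Claim and usable_claims are hypothetical names, not part of any real library.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    text: str                             # the AI-generated statement (citation, statistic, etc.)
    source_checked: Optional[str] = None  # where it was independently confirmed, if anywhere

    @property
    def verified(self) -> bool:
        return self.source_checked is not None

def usable_claims(claims: list[Claim]) -> list[Claim]:
    """Only pass along claims that have been confirmed against an independent source."""
    return [c for c in claims if c.verified]

claims = [
    Claim("Case X v. Y (2019) established precedent Z"),                # unverified -> held back
    Claim("Revenue grew 12% in 2023", source_checked="annual report"),  # verified -> usable
]
print([c.text for c in usable_claims(claims)])  # only the verified claim survives
```

The point isn't the code itself; it's that "unverified" should be the default state for anything an AI hands you.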

The Breakthrough: Why AI Systems Learn to Guess

So here's what OpenAI just figured out.

Last week, they published research that potentially reveals the root cause of hallucinations, and it's not what most people assumed.

According to the paper, the problem isn't with the language models themselves. It's with how we train them.

  1. Current training methods create what amounts to a "confidence-guessing incentive": AI models are scored as either completely right or completely wrong, so they get full points for lucky guesses but zero points for saying "I don't know."

  2. This teaches AI systems that confident guessing is always better than admitting uncertainty, even when they have no idea what the correct answer is.

  3. To test this theory, OpenAI researchers asked models for specific information like dissertation titles and birth dates. The models confidently produced different wrong answers each time, rather than acknowledging they didn't know.

In short, OpenAI believes AI systems hallucinate because they've been trained to always provide an answer, even when they should say "I don't know."

The proposed solution is surprisingly straightforward: redesign evaluation metrics to explicitly reward honesty over lucky guesses. Instead of penalizing uncertainty, penalize confident errors more heavily.
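
To make that incentive concrete, here's a small, purely illustrative Python sketch (my own toy numbers and scoring functions, not anything from OpenAI's paper) comparing expected scores under today's binary grading versus a rule that penalizes confident errors and leaves "I don't know" at zero:

```python
# Illustrative sketch (not OpenAI's actual benchmark or scoring code): how the
# scoring rule changes a model's incentive to guess when it's only p-confident.

def binary_score(correct: bool, abstained: bool) -> float:
    """Common setup today: full credit if right; zero whether you're wrong or say 'I don't know'."""
    return 1.0 if correct and not abstained else 0.0

def honesty_score(correct: bool, abstained: bool, wrong_penalty: float = 2.0) -> float:
    """Proposed flavor: 'I don't know' scores 0, a right answer scores 1, a confident error costs extra."""
    if abstained:
        return 0.0
    return 1.0 if correct else -wrong_penalty

def expected_scores(p: float, scorer) -> tuple[float, float]:
    """Expected score of guessing with confidence p vs. abstaining, under a given scorer."""
    guess = p * scorer(True, False) + (1 - p) * scorer(False, False)
    abstain = scorer(False, True)
    return guess, abstain

p = 0.2  # the model is only 20% sure of its best guess
print(expected_scores(p, binary_score))   # (0.2, 0.0): guessing beats admitting uncertainty
print(expected_scores(p, honesty_score))  # (-1.4, 0.0): abstaining wins when confidence is low
```

Under binary grading, even a 20%-confident guess beats abstaining, so guessing is always the rational move; add a penalty for confident errors and honesty becomes the better bet whenever the model is unsure.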

What makes this research a potential breakthrough is that it reframes hallucinations from an unsolvable technical problem to a fixable training problem. We just need to stop teaching AI to guess when it should be honest about not knowing something.

Sometimes the simplest explanations are the right ones.

Why This is Important

We're potentially at a turning point for AI reliability.

The combination of improved models like GPT-5 and this new training approach could solve one of the biggest barriers to AI adoption in high-stakes environments.

For industries that can't afford even a 1% error rate, this could be transformative:

  • Law firms could trust AI for case research without manually verifying every citation

  • Financial analysts could rely on AI-generated analysis for critical decisions

  • Companies could deploy AI for customer service and operations without constant oversight

This is also particularly crucial for autonomous agents like those we’re building on agent.ai. Right now, you can't have an agent making business decisions if it might confidently provide incorrect information.

But AI systems that reliably say "I don't know" when they're uncertain, or ask for human help, would change that.

If this research direction proves successful, we might finally have AI systems that deserve the trust people are already placing in them.

—Dharmesh (@dharmesh)

What'd you think of today's email?

Click below to let me know.
