Study: Half of Health Responses From Major AI Chatbots Contain Problematic Information

A peer-reviewed audit published in BMJ Open has found that half of the health-related responses generated by five major AI chatbots contain problematic information, including fabricated sources and confidently delivered inaccuracies. The findings raise serious questions about the reliability of artificial intelligence tools in medical contexts.

Fabricated sources and confident errors

Researchers analyzed responses from five widely used AI chatbots and found that 50% of the answers contained at least one piece of problematic information. Many of these responses included citations that looked real, complete with author names, journal titles, and publication years, but were entirely fabricated. The errors weren't flagged as guesses or uncertainties; they were delivered with the same tone of authority as correct information, making them difficult for users to spot.

Why the audit stands out

The study was peer-reviewed and published in BMJ Open, a respected medical journal. That adds weight to the findings. The audit didn't name the specific chatbots tested, but described them as major platforms. The researchers evaluated health-related queries across a range of topics, though the exact list of questions wasn't disclosed in the public summary of the findings.

The problem isn't just that errors exist — it's that they're hard to catch. A user looking up a symptom or drug interaction might get a wrong answer that sounds perfectly convincing. The audit shows that chatbots can invent entire references, which undermines trust in the information they provide.

What this means for users

More people are turning to chatbots for quick health advice, and the audit suggests they should be cautious. A response that seems detailed and authoritative might contain fabricated facts. The study's authors didn't offer specific recommendations in the public release, but the data makes a clear case for verifying any health information from a chatbot with a medical professional or another reliable source.

How the companies behind these chatbots will respond remains an open question. The audit provides a snapshot of current performance, but it doesn't track whether the same issues persist after updates. That uncertainty is itself a concern for anyone using these tools for medical guidance.