Researchers made an algorithm that can tell when AI is hallucinating

Despite how impressive AI like ChatGPT, Claude, and even Gemini might be, these large language models all share one big problem: they hallucinate a lot. It's a major issue across the AI world, and even Apple is worried about how it'll handle hallucinations with Apple Intelligence. Luckily, a group of researchers has now created an AI hallucination detector that can tell if an AI has made something up.

These hallucinations have led to a number of embarrassing and intriguing slip-ups, and they remain one of the main reasons AI like ChatGPT isn't more useful. We've seen Google forced to make changes to its AI search overviews after the AI started telling people it was safe to eat rocks and to put glue on pizza. We've even seen lawyers fined after using ChatGPT to help write a court filing because the chatbot hallucinated the document's citations.

Perhaps those issues could have been avoided if they’d had the AI hallucination detector described in a new paper published in the journal Nature. According to the paper, a new algorithm developed by researchers can help discern whether AI-generated answers are factual roughly 79 percent of the time. That isn’t a perfect record, of course, but it is 10 percent higher than the other leading methods out there right now.

Chatbots like Gemini and ChatGPT can be useful, but they can also hallucinate answers very easily.

The research was carried out by members of Oxford University’s Department of Computer Science. The method is relatively simple, the researchers explain in the paper. First, they have the chatbot answer the same prompt several times, usually five to ten. Then, they calculate a number the researchers call semantic entropy, a measure of how similar or different the meanings of those answers are.

If the model gives a different answer each time it’s asked, the semantic entropy score is higher, indicating that the AI might be hallucinating the answer. If the answers are all identical or share the same meaning, though, the score will be lower, indicating the model is giving a more consistent and likely factual answer. As I said, it isn’t a foolproof AI hallucination detector, but it is an interesting way of approaching the problem.
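To make the idea concrete, here’s a minimal sketch of how a semantic-entropy check could work. This isn’t the researchers’ actual implementation: the `means_same` helper is a hypothetical stand-in for the model-based check of whether two answers mean the same thing, and the sample answers are made up for illustration.

```python
import math


def means_same(a: str, b: str) -> bool:
    """Placeholder meaning check. In practice this would be a model-based
    test of whether two answers say the same thing; here we just compare
    lowercased text as a stand-in."""
    return a.strip().lower() == b.strip().lower()


def semantic_entropy(answers: list[str]) -> float:
    """Group answers that share a meaning, then compute the entropy of the
    resulting clusters. Many distinct meanings -> high entropy (possible
    hallucination); one dominant meaning -> low entropy."""
    clusters: list[list[str]] = []
    for ans in answers:
        for cluster in clusters:
            if means_same(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    total = len(answers)
    return sum(-(len(c) / total) * math.log(len(c) / total) for c in clusters)


# Example: five sampled answers to the same prompt.
consistent = ["Paris", "paris", "Paris", "Paris", "paris"]
scattered = ["Paris", "Lyon", "Marseille", "Toulouse", "Nice"]
print(semantic_entropy(consistent))  # 0.0   -> likely factual
print(semantic_entropy(scattered))   # ~1.61 -> likely hallucinating
```

The exact numbers don’t matter; the point is that answers sharing one meaning collapse into a single cluster and score low, while scattered meanings push the score up.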

Other methods rely on what’s known as naive entropy, which typically checks whether the wording of an answer, rather than its meaning, differs. As such, it is less likely to catch hallucinations because it isn’t looking at the meaning behind the words in the sentence.
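For contrast, a naive-entropy check might look something like this toy example, which scores answers purely by their exact wording; the answers and figures are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def naive_entropy(answers: list[str]) -> float:
    """Entropy over exact strings: paraphrases count as different answers."""
    total = len(answers)
    return sum(-(n / total) * math.log(n / total) for n in Counter(answers).values())


# Three answers that say the same thing in different words.
paraphrases = [
    "The Eiffel Tower is in Paris.",
    "Paris is home to the Eiffel Tower.",
    "You'll find the Eiffel Tower in Paris.",
]
print(naive_entropy(paraphrases))  # ~1.10, even though the meaning never changes

# A meaning-aware check would place all three answers in one cluster and
# score 0.0, correctly treating the model as consistent.
```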

The researchers say the algorithm could be added to chatbots like ChatGPT via a button, letting users receive a “certainty score” for the answers to their prompts. Having an AI hallucination detector built directly into the chatbot is enticing, and I can see the usefulness of adding such a tool to the various chatbots out there.
