LLMs Fail at Introspection

November 19, 2025

Here is one way large language models are similar to the humans that make them. Ars Technica reports, “LLMs Show a ‘Highly Unreliable’ Capacity to Describe Their Own Internal Processes.” It is a longish technical write-up that basically states, “Hey, we have no idea what we are doing.” Since AI coders are not particularly self-aware, why would their code be? Senior gaming editor Kyle Orland describes a recent study from Anthropic:

“If you ask an LLM to explain its own reasoning process, it may well simply confabulate a plausible-sounding explanation for its actions based on text found in its training data. To get around this problem, Anthropic is expanding on its previous research into AI interpretability with a new study that aims to measure LLMs’ actual so-called ‘introspective awareness’ of their own inference processes. The full paper on ‘Emergent Introspective Awareness in Large Language Models’ uses some interesting methods to separate out the metaphorical ‘thought process’ represented by an LLM’s artificial neurons from simple text output that purports to represent that process. In the end, though, the research finds that current AI models are ‘highly unreliable’ at describing their own inner workings and that ‘failures of introspection remain the norm.’”

Not even developers understand precisely how LLMs do what they do. So much for asking the models themselves to explain it to us. We are told more research is needed to determine how models assess their own processes in the rare instances when they do. Are those self-assessments even remotely accurate? How would researchers know? Opacity on top of opacity. The world is in good virtual hands.

See the article for the paper’s methodology and technical details.

Cynthia Murrell, November 19, 2025
