How AI reasons, plans, and does mental math!
Last time I wrote about how Anthropic uncovered where different ideas and world concepts are stored inside a large language model. Well, shortly after sending that email, Anthropic released a follow-up study showing how an AI actually reasons, along with evidence that rather than just writing text one token at a time, the AI actually plans ahead.
A few highlights from their research:
Concepts like “car” or “quantum mechanics” are stored inside an AI in a form shared across languages
Claude 3.5, for example, understands many human languages: English, German, French, Chinese… even Swahili. But when an AI learns about “cars” in different languages, does it hold a separate concept of what a “car” is for each language? The latest research by Anthropic suggests the answer is no. The AI stores “car” and related facts as an abstract feature, which is then combined with a language-specific feature when producing text.
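To make that idea a bit more concrete, here is a tiny toy sketch (my own analogy, not Anthropic’s actual method; all the vectors and words are made up): a single shared concept vector is combined with a language feature, and whichever word in that language’s vocabulary lies closest is what comes out.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

concept_car = rng.normal(size=dim)           # one shared "car" concept
lang = {"en": rng.normal(size=dim),          # language-specific features
        "fr": rng.normal(size=dim)}

# Tiny made-up vocabularies: each translation of "car" sits near concept + language.
vocab = {
    "en": {"car": concept_car + lang["en"], "dog": rng.normal(size=dim)},
    "fr": {"voiture": concept_car + lang["fr"], "chien": rng.normal(size=dim)},
}

def decode(concept, language):
    # Combine the abstract concept with the language feature, then pick the
    # closest word embedding in that language.
    target = concept + lang[language]
    return min(vocab[language], key=lambda w: np.linalg.norm(vocab[language][w] - target))

print(decode(concept_car, "en"))  # car
print(decode(concept_car, "fr"))  # voiture
```

The point of the toy is only that the concept is stored once and the language is chosen at the very end, which is what the research suggests.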
LLMs plan ahead when writing
Models like Claude are trained to generate one token at a time, and this raises an important question: does the AI reason and plan ahead? By probing the AI’s brain while it was writing poetry, the researchers found that, in fact, yes, the AI plans its tokens well ahead. As an example, they took the first line of a poem:
He saw a carrot and had to grab it
then examined what the AI was thinking while generating the next line:
He saw a carrot and had to grab it
His hunger was like a starving rabbit
Even before writing the first token, the AI had already internally selected “rabbit” as a good candidate to end the rhyme with. Next, the researchers artificially suppressed “rabbit” and found that the AI went for the next best option, “habit”, and completed the rhyme to match that instead:
He saw a carrot and had to grab it
His hunger was a powerful habit
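Here is a loose caricature of that intervention in plain Python (my own sketch; the real experiment suppresses an internal feature inside the model, not a score in a list): the model keeps several rhyme candidates in mind, and knocking out the top one makes it fall back to the next best.

```python
# Made-up preference scores over possible rhyme endings for "grab it".
candidates = {"rabbit": 4.0, "habit": 3.2, "grab it": 2.5, "jacket": 0.8}

def plan_rhyme(candidates, suppressed=None):
    # "Ablate" one option, then pick the highest-scoring remaining candidate.
    usable = {word: score for word, score in candidates.items() if word != suppressed}
    return max(usable, key=usable.get)

print(plan_rhyme(candidates))                       # rabbit
print(plan_rhyme(candidates, suppressed="rabbit"))  # habit
```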
AI does mental math differently from what it claims to do
While AI has always been notoriously bad at arithmetic and math, models are slowly improving. For simple mental arithmetic such as calculating 36+59, AI is mostly correct these days, and one open question was whether the AI simply memorized the answer from its training data, actually calculates it internally, or does something else entirely.
From a few experiments, Anthropic researchers found that the AI internally combines two parallel lines of reasoning: (i) 36+59 should be roughly 90, and (ii) 6+9 ends in 5. Combining the two gives the final answer: 95. Surprisingly, when asked to elaborate and explain how it came up with the answer, the AI does not seem to be aware of this internal process and instead describes the standard schoolbook algorithm.
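Here is a toy sketch of that two-path idea (my own caricature, not the circuit Anthropic reconstructed; the “fuzzy window” is a stand-in): one path only knows the sum roughly, the other only knows the last digit, and combining them pins down 95.

```python
def mental_add(a, b):
    # Path 1: coarse magnitude -- blur the ones digit of b, so 36 + 59 becomes
    # "36 plus roughly 55", i.e. somewhere in the low nineties.
    rough = a + (b // 10) * 10 + 5
    window = range(rough - 5, rough + 5)        # a ten-wide fuzzy band around it
    # Path 2: the exact last digit, from the ones digits alone: 6 + 9 ends in 5.
    last_digit = (a % 10 + b % 10) % 10
    # Combine: the unique number in the band with that last digit.
    return next(n for n in window if n % 10 == last_digit)

print(mental_add(36, 59))   # 95
```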
My thoughts
Although I understand the building blocks of AI (transformers, activations, tokenization…), how it actually understands my prompts has always been an uncomfortable black box for me. It is interesting to see that we’re starting to dissect an AI’s brain in much the same way as in neuroscience experiments, where we attach electrodes to an animal’s brain and measure its electrical responses. Except in this case we can do it much faster and in a much more controlled way, because we can monitor any neuron of the AI. In their blog post, Anthropic calls this “AI biology”.
What surprised me most is how natural these insights sound. Somehow, by training on predicting the next token, the AI actually managed to learn concepts, and its internal thinking feels so human. In the Netherlands we have a yearly tradition of writing silly rhyming poems during Sinterklaas (our version of Christmas), and my poem-writing strategy is exactly the same as what Claude does.
I’m curious to see if the same approach can shed more light on why AI hallucinates. Is it a “conscious” decision? Does the AI get confused because it needs to choose between different internal mental concepts? Or, when generating a long answer, does it simply forget and fake its confidence instead? Although the research touches on this topic briefly (Claude has an internal on/off concept that decides whether to answer a question or reply “I don’t know”), it doesn’t explain why it gives a wrong answer when I’m 100% sure it knows the correct one :’)
Looking forward to future research on this topic!
What we’ve been reading/watching this week
- Anthropic analyzed millions of conversations that users conducted with Claude
- While state-of-the-art AI models claim to be better at math than ever, recent research shows that they score less than 5% on the 2025 US Math Olympiad.
- Similarly, while most AI models can solve at least 98% of all LeetCode problems, most of them can only solve about 30% of the problems released since March.
- ARC released a new version of their reasoning benchmark, ARC-2, consisting of various IQ-like puzzles that are very easy for humans but extremely difficult for AI to solve.
- Ever wondered how much CO2 we unleash into the atmosphere by using AI? Turns out, 1 year of chatbot use roughly equals a 10 km drive with a car.
- The Turing test was beaten by OpenAI’s GPT-4.5 model. In a controlled test where humans chatted with either humans or AI, 73% of the participants judged the chat model to be human (while actual human participants were judged to be human only about 50% of the time).
That’s it for now. Catch you soon!
With love,
Lan and Robert