
The Premise
Human thought doesn't always require language, and converting thoughts to words can slow reasoning. New research suggests AI models can likewise reason more efficiently without language.
How LLMs Process Information
LLMs convert text to tokens, map tokens to embeddings (vectors of numbers), process those embeddings through stacked layers to produce hidden states, and use the final hidden state to predict the next token. Chain-of-thought prompting improves accuracy, but every reasoning step must be collapsed back into discrete tokens, and this round trip through the vocabulary is inefficient and discards information held in the hidden states.
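That pipeline can be sketched in a few lines. This is a toy illustration of the data flow only: the sizes are hypothetical, the weights are random, and a tanh matrix multiply stands in for real attention and MLP layers.

```python
import numpy as np

# Toy sketch of the standard LLM pipeline (hypothetical sizes, random weights).
rng = np.random.default_rng(0)
vocab_size, d_model, n_layers = 16, 8, 2

embedding = rng.normal(size=(vocab_size, d_model))    # token id -> vector
layer_weights = [rng.normal(size=(d_model, d_model)) for _ in range(n_layers)]
unembedding = rng.normal(size=(d_model, vocab_size))  # hidden state -> logits

def forward(token_ids):
    h = embedding[token_ids]          # tokens -> embeddings
    for W in layer_weights:           # layers transform hidden states
        h = np.tanh(h @ W)            # (real models use attention + MLPs here)
    logits = h[-1] @ unembedding      # last position scores the next token
    return h, logits

hidden, logits = forward(np.array([3, 7, 1]))
next_token = int(np.argmax(logits))   # the "token conversion": collapse a rich
                                      # hidden state into a single discrete id
print(hidden.shape, logits.shape, next_token)
```

The last line is the step the text calls out: an 8-dimensional hidden state is reduced to one integer, and everything else in that vector is thrown away before the next step of reasoning.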
Coconut: Thinking Without Words
Researchers modified GPT-2 to feed hidden states directly back as input embeddings — bypassing token conversion entirely. Coconut (Chain of Continuous Thought) maintains uncertainty in latent space until the final answer.
- Logical reasoning: equal accuracy (98.8%), but Coconut used 1/10 the tokens
- Multi-choice problems: 97% accuracy (vs. 77.5%) with 1/3 the tokens
- Math: lower accuracy (34% vs. 43%), likely because Coconut wasn't trained natively for this
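The feedback mechanism described above can be sketched with the same kind of toy model. This is not the authors' code: `step` is a hypothetical stand-in for GPT-2's layer stack, and the weights are random; the point is only that the hidden state loops back as the next input embedding instead of being converted to a token.

```python
import numpy as np

# Sketch of Coconut-style continuous thought (toy model, random weights).
rng = np.random.default_rng(0)
vocab_size, d_model = 16, 8

embedding = rng.normal(size=(vocab_size, d_model))
W = rng.normal(size=(d_model, d_model))
unembedding = rng.normal(size=(d_model, vocab_size))

def step(h):
    return np.tanh(h @ W)        # stand-in for one pass through the layers

h = embedding[3]                 # embed the prompt's final token
for _ in range(4):               # four continuous "thought" steps:
    h = step(h)                  # the hidden state goes straight back in,
                                 # never collapsed to a discrete token
answer_logits = h @ unembedding  # only the final answer becomes a token
answer = int(np.argmax(answer_logits))
print(answer)
```

Because no intermediate argmax is taken, the loop carries the full hidden-state vector between steps, which is how Coconut "maintains uncertainty in latent space until the final answer."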
Getting Loopy: Repeating Transformer Layers
Tom Goldstein's team created transformers with recurrent blocks that loop multiple times. The model naturally adjusts loop count by problem difficulty — easy problems finish quickly, hard ones loop longer.
Results: Their small model beat OLMo-7B on math (28% vs. 4%).
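The adaptive looping can be sketched as a fixed-point iteration. This is a toy stand-in, not the team's architecture: `W` is scaled so the loop provably settles, and "difficulty" here is just how far the starting state is from the fixed point, but it shows the early-exit behavior the text describes.

```python
import numpy as np

# Sketch of a looped (recurrent-depth) block: apply the same block repeatedly
# and stop once the hidden state stops changing, so easy inputs exit early.
rng = np.random.default_rng(1)
d_model = 8
W = rng.normal(size=(d_model, d_model)) * 0.1   # scaled to be contractive

def recurrent_block(h):
    return np.tanh(h @ W)

def loop_until_stable(h, tol=1e-4, max_iters=64):
    for i in range(1, max_iters + 1):
        h_next = recurrent_block(h)
        if np.linalg.norm(h_next - h) < tol:    # state settled: stop looping
            return h_next, i
        h = h_next
    return h, max_iters

h0 = rng.normal(size=d_model)
h_final, n_loops = loop_until_stable(h0)
print(n_loops)
```

The loop count is decided at inference time by a convergence check rather than fixed in advance, which is the essential property: depth becomes a function of the input instead of the architecture.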
Future Outlook
Latent-space reasoning could fundamentally change how LLMs "think," but it requires major architectural redesign and risks diverging from human-interpretable reasoning.