Making Language Models Smarter by Sidestepping Language

🧠 Introduction: Can We Think Without Language?

One line of argument holds that human thought and reasoning do not necessarily require language, and that converting thoughts into words can actually slow thinking down. Intriguing evidence is now emerging that artificial intelligence (AI) can also "think" more efficiently without relying on language.

"A lot of researchers wonder: can you reason purely in latent space?" — Mike Knoop, co-creator of the AI abstract reasoning benchmark

🤖 How Do LLMs Process Information?

1. The Basic Structure of an LLM

LLMs are built on deep neural networks that convert input text into sequences of numbers (embeddings) and compute on those numbers.
The mathematical space in which these numerical calculations take place is called the latent space.

2. Tokens and Embeddings

LLMs operate not on words but on tokens (words, word fragments, characters, etc.).
Input text is broken into tokens, and each token is converted into a numerical vector called an embedding.
As the embeddings pass through the model's layers, each one absorbs more and more information from the others, and the output of the final layer is called the hidden state.
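
The token and embedding steps above can be pictured with a minimal sketch. Everything in it is hypothetical: the five-entry vocabulary, the three-dimensional vectors, and the greedy longest-match tokenizer are stand-ins chosen for readability, whereas real LLMs use learned subword tokenizers and much larger learned embedding tables.

```python
# Toy illustration only: vocabulary, vectors, and tokenizer are made up.

# Hypothetical vocabulary mapping token strings to integer ids.
VOCAB = {"think": 0, "ing": 1, "with": 2, "out": 3, "words": 4}

# Hypothetical embedding table: one small vector per token id.
EMBEDDINGS = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
    [0.0, 0.5, 0.5],
    [0.4, 0.4, 0.2],
]


def tokenize(text):
    """Greedy longest-match split of each word into known tokens."""
    ids = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in VOCAB:
                    ids.append(VOCAB[piece])
                    start = end
                    break
            else:
                raise ValueError(f"no token covers {word[start:]!r}")
    return ids


def embed(token_ids):
    """Look up the embedding vector for each token id."""
    return [EMBEDDINGS[i] for i in token_ids]


token_ids = tokenize("thinking without words")
vectors = embed(token_ids)
print(token_ids)  # [0, 1, 2, 3, 4]
```

Note that three words become five tokens here, which is why the text speaks of tokens rather than words.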

"The hidden state is called 'hidden' because it isn't exposed externally. It contains all the information needed to predict the next token."

3. Iterative Token Generation

The model appends the predicted token back to the input and repeats the embedding → layers → hidden state process.
This continues until an end-of-sequence token is produced.
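
The loop described above can be sketched in a few lines. The "model" here is a hypothetical lookup table that maps the last token to the next one, and the `<eos>` marker and `max_new_tokens` cap are assumptions for the sketch. Only the loop structure (predict, append, repeat until end-of-sequence) reflects the text.

```python
# Toy stand-in for an LLM: a fixed lookup table replaces the real
# embedding -> layers -> hidden state computation.
EOS = "<eos>"  # hypothetical end-of-sequence token

NEXT_TOKEN = {
    "the": "cat",
    "cat": "sat",
    "sat": "down",
    "down": EOS,
}


def predict_next(tokens):
    """Pretend forward pass: return the predicted next token."""
    return NEXT_TOKEN.get(tokens[-1], EOS)


def generate(prompt_tokens, max_new_tokens=20):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        token = predict_next(tokens)  # one full pass per new token
        if token == EOS:              # stop at end-of-sequence
            break
        tokens.append(token)          # feed the prediction back as input
    return tokens


print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```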

4. Chain of Thought

Before producing a final answer, LLMs can show their reasoning process as a sequence of tokens.
This is called chain of thought, and it significantly improves a model's accuracy.

"Chain of thought not only shows how the model thinks — it also leads to far more accurate answers."

5. Limitations

Each generated token forces the model to collapse its hidden state into a single token and then re-embed that token, which is inefficient and can lose information.
"To reason in latent space, you have to skip that step." — Shibo Hao, PhD student at UC San Diego

🥥 Coconut: An LLM That Thinks Without Converting to Language

1. A New Approach: The Coconut Model

Hao and colleagues modified GPT-2 to feed hidden states directly back as input embeddings.
This means information is processed continuously in numerical space without ever being converted to tokens.
They named this model Coconut (Chain of Continuous Thought).
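
To make the difference concrete, here is a toy sketch and not the Coconut implementation. The `forward` function is a made-up stand-in for a full model pass, the two-entry embedding table is hypothetical, and the hidden state is assumed to have the same size as an embedding so that it can be reused as input.

```python
# Toy sketch with made-up numbers; `forward` is not a real transformer.

EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2-token vocabulary


def forward(x):
    """Pretend model pass: mixes the two coordinates of the input."""
    return [0.6 * x[0] + 0.4 * x[1], 0.3 * x[0] + 0.7 * x[1]]


def token_step(x):
    """Standard decoding: collapse the hidden state to one token,
    then re-embed that token as the next input."""
    hidden = forward(x)
    token_id = hidden.index(max(hidden))
    return EMBEDDINGS[token_id]


def latent_step(x):
    """Continuous thought: reuse the hidden state directly as the
    next input, skipping the token round trip."""
    return forward(x)


def run(step, x, n_steps):
    for _ in range(n_steps):
        x = step(x)
    return x


start = [1.0, 0.0]
print(run(token_step, start, 3))   # [1.0, 0.0]
print(run(latent_step, start, 3))  # roughly [0.444, 0.417]
```

In the token path every step snaps back to the same one-hot embedding, so the intermediate mixture is thrown away. In the latent path the state keeps its weight on both options, which is one way to read the "preserve uncertainty" idea.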

"In continuous or latent reasoning, there's no need to convert thoughts into language. You can preserve uncertainty within your thinking and only commit to an answer at the very end. It's a fundamentally different mode of reasoning." — Shibo Hao

2. Performance Comparison

Logical reasoning tasks: Coconut and the baseline GPT-2 both reached 98.8% accuracy, but Coconut used only one-tenth as many tokens.
Multiple-choice problems: Coconut used only one-third as many tokens and was more accurate, 97% vs. 77.5% for the baseline.
Math problems: Coconut used fewer tokens but was less accurate, 34% vs. 43% for the baseline.

"I think Coconut would have done better if it had been trained from scratch with latent-space reasoning in mind." — Shibo Hao

3. Limitations and Areas for Improvement

Coconut was constrained to only a limited number of iterations in latent space.
"Ideally, the model should decide for itself when to stop reasoning." — Shibo Hao

🔁 Getting Loopy: LLMs That Think Iteratively

1. A New Approach from the Goldstein Team

Tom Goldstein's team devised a structure in which transformer layers can be applied repeatedly.
In their eight-layer model, four of the layers are grouped into a recurrent block that can be reused as many times as needed.
The output of the recurrent block is fed back as its own input, so reasoning stays entirely in latent space.
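
The looping structure can be sketched as follows. This is an illustration under assumptions, not the team's code: each "layer" is a placeholder function, and the placement of the four looped layers between two ordinary layers on each side is an assumption, since the text only says that 4 of the 8 layers form the block.

```python
# Toy sketch: each "layer" is a placeholder, not a transformer layer.
calls = []  # records the order in which layers run


def make_layer(name):
    def layer(x):
        calls.append(name)
        return [0.5 * v + 0.1 for v in x]  # arbitrary stand-in arithmetic
    return layer


LAYERS = [make_layer(f"layer{i}") for i in range(8)]

# Assumption: two ordinary layers, then the looped block of four,
# then two more ordinary layers.
FIRST, BLOCK, LAST = LAYERS[:2], LAYERS[2:6], LAYERS[6:]


def apply_group(group, x):
    for layer in group:
        x = layer(x)
    return x


def forward(x, n_loops):
    x = apply_group(FIRST, x)
    for _ in range(n_loops):  # the block's output becomes its own input
        x = apply_group(BLOCK, x)
    return apply_group(LAST, x)


forward([1.0], n_loops=3)
print(len(calls))  # 16 layer applications: 2 + 3 * 4 + 2
```

With `n_loops=1` the model behaves like an ordinary eight-layer stack. Raising `n_loops` adds depth without adding any new layers.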

"Every modern LLM has a fixed number of layers. That's a fundamental constraint." — Tom Goldstein

2. Model Characteristics

The model adjusts, on its own, the number of iterations to the difficulty of the problem.
Easy problems are resolved quickly; harder problems involve more iterations.
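
One way to picture "more iterations for harder problems" is a loop that stops once the state stops changing. The sketch below is an assumption for illustration only: the stopping rule, the tolerance `tol`, and the one-number `block` function are all hypothetical. The source describes the behavior as emerging on its own and does not specify how the iteration count is decided.

```python
def block(x):
    """Placeholder recurrent block: moves the state halfway to 1.0."""
    return 0.5 * x + 0.5


def think(x, tol=1e-3, max_loops=64):
    """Loop the block until the state changes by less than `tol`."""
    for loops in range(1, max_loops + 1):
        new_x = block(x)
        if abs(new_x - x) < tol:
            return new_x, loops
        x = new_x
    return x, max_loops


_, easy_loops = think(0.99)   # starts close to the settled state
_, hard_loops = think(-7.0)   # starts far from it
print(easy_loops, hard_loops)  # 4 13
```

An input that starts near its settled state exits after a few loops, while one that starts far away needs many more, mirroring the easy-versus-hard pattern in the text.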

"We didn't train it to do this — it emerged naturally. It seems the model recognizes when a problem is easy." — Jonas Geiping, co-author

3. Performance

The model significantly outperformed OLMo-7B (a much larger model) on math problems: 28% vs. 4% accuracy.
"Our model is far ahead." — Tom Goldstein

🧐 Limitations and Future Outlook

1. Practical Challenges

Incorporating latent-space reasoning into existing LLM architectures would require extensive redesign.
"Large companies have already invested heavily in existing architectures, so switching right away would be difficult." — Shibo Hao

2. Risks of Latent-Space Reasoning

Because LLMs are trained on text-based data, reasoning that bypasses language could diverge from human thought.
"Once you move into continuous space, you open up all kinds of possibilities that may not actually be helpful." — Luke Zettlemoyer, professor at the University of Washington

3. New Possibilities

Despite these concerns, latent-space reasoning holds the potential to fundamentally transform how LLMs "think."

"One of the goals of this kind of research is to genuinely change the way reasoning works. There's an opportunity here for something big." — Luke Zettlemoyer

Key Terms

Latent space
Embedding
Hidden state
Chain of thought
Coconut model
Recurrent block
Efficiency, information loss, shift in reasoning paradigm

📝 Summary

Experimental evidence is building that AI, like humans, may be able to think more efficiently without language.
Latent-space reasoning can reduce information loss and cut the number of tokens needed, and on some tasks it is also more accurate.
Many challenges and limitations remain, but this research is attracting attention as a potentially revolutionary shift in how AI reasons.

"Who knows what new patterns this kind of approach might uncover?" — Luke Zettlemoyer

✨ It's worth looking forward to how AI's way of "thinking" might change in the years ahead.

