1. Recognizing the Limits of Existing LLM Learning Methods
Andrej Karpathy opens by pointing out that at least one major paradigm is missing from how we train large language models (LLMs).
"We're missing (at least one) major paradigm for LLM learning."
Existing LLM training falls broadly into two categories:
- Pretraining:
  - The stage for accumulating knowledge.
  - The model learns information about the world from large text corpora.
- Finetuning (SL/RL):
  - The stage for learning habitual behavior.
  - Through supervised learning (SL) or reinforcement learning (RL), the model learns specific attitudes and behavioral patterns.
Both approaches share a common core: they change the model's parameters (internal weights).
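To make "changing the parameters" concrete, here is a minimal sketch of a gradient-descent update on a one-weight toy model. The model, the data point, and the learning rate are all made up for illustration. Real pretraining and finetuning operate on billions of weights with far more elaborate losses, but the basic move is the same: the weights themselves are what gets updated.

```python
# Toy illustration only: a one-parameter "model" trained by gradient descent.
# The model, data point, and learning rate are assumptions for this sketch.

def predict(w: float, x: float) -> float:
    return w * x


def grad_squared_error(w: float, x: float, y: float) -> float:
    # Derivative of (w * x - y) ** 2 with respect to w.
    return 2.0 * (predict(w, x) - y) * x


def sgd_step(w: float, x: float, y: float, lr: float = 0.1) -> float:
    # "Learning" here means the weight itself moves.
    return w - lr * grad_squared_error(w, x, y)


w = 0.0
for _ in range(50):
    w = sgd_step(w, x=1.0, y=3.0)

print(round(w, 3))  # close to 3.0, the value that fits the data point
```

Nothing outside `w` records what was learned. That is the property the rest of the post contrasts with learning by editing text.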
2. The Difference Between Human Learning and LLM Learning
Karpathy argues that how humans learn differs from how LLMs are trained today.
"A lot of human learning feels more like system prompt changes than parameter changes."
- When humans run into a problem and work out a solution, they explicitly remember it for the next time a similar situation comes up.
- It is like leaving yourself a note:
"When such-and-such a problem comes up, I should try this approach."
This kind of learning resembles a memory feature, but rather than storing individual facts, it stores general, globally applicable problem-solving strategies.
3. The Role and Limits of the System Prompt
Karpathy compares LLMs to the protagonist of the film Memento:
"LLMs are literally the guy from Memento, except we haven't given them a scratchpad yet."
- In other words, LLMs currently have no personal notes they can immediately consult.
The system prompt for Claude, Anthropic's LLM, runs to roughly 17,000 words. It includes not only simple behavioral directives (e.g., "refuse requests related to song lyrics") but also detailed problem-solving strategies.
For example:
"When Claude is asked to count words, letters, or characters, it thinks step by step before answering. It numbers each word, letter, or character to count clearly. Only after this explicit counting step does it provide an answer."
This kind of directive exists to reduce errors in tasks like counting the number of times 'r' appears in 'strawberry'.
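The counting strategy in that directive can be written out as ordinary code. The sketch below is only an illustration of the procedure (number every character, mark the matches, then count). The function name is hypothetical, and this is not how Claude works internally.

```python
from typing import List, Tuple


def count_letter_step_by_step(word: str, target: str) -> Tuple[List[str], int]:
    """Number every character, mark matches, and only then report a count."""
    steps = []
    count = 0
    for position, char in enumerate(word, start=1):
        is_match = char.lower() == target.lower()
        if is_match:
            count += 1
        steps.append(f"{position}. {char}" + (" <- match" if is_match else ""))
    return steps, count


steps, total = count_letter_step_by_step("strawberry", "r")
print("\n".join(steps))
print(f"'r' appears {total} times")  # 'r' appears 3 times
```

Writing out the numbered list before giving the total is the point of the directive: the answer is derived from the explicit enumeration rather than guessed at a glance.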
4. The Problem: Limits and Inefficiency of the System Prompt
Karpathy emphasizes that this kind of problem-solving knowledge:
"...shouldn't be baked into weights via RL — at least not immediately or exclusively."
And furthermore:
"And it shouldn't have to be hand-written into the system prompt by a human engineer."
In other words:
- Relying solely on reinforcement learning to teach these strategies is inefficient, and
- having humans write the prompts by hand has its own limitations.
5. A New Paradigm: System Prompt Learning
Karpathy proposes a new paradigm he tentatively calls System Prompt Learning.
"This should come from system prompt learning. It's similar in structure to RL, but the learning algorithm is different (edits vs. gradient descent)."
- What is System Prompt Learning?
  - Rather than encoding strategies into model weights, the LLM automatically revises and extends its own system prompt, as if writing notes to itself.
  - It's akin to the LLM writing its own problem-solving handbook.
This approach could be far more powerful and data-efficient than conventional RL, because it adds a knowledge-based "review" step instead of relying purely on a scalar reward signal.
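The loop described in this section can be sketched in a few lines. Karpathy's post does not specify an algorithm, so everything below is an assumption made for illustration: the names `SystemPrompt`, `system_prompt_learning`, `toy_solve`, and `toy_review` are hypothetical, the "model" is a stub rather than a real LLM call, and the reviewer simply knows the correct answer. The sketch only shows the shape of the idea: attempt a task, review the result in words, and edit the prompt instead of the weights.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Task = Tuple[str, str]  # (word, letter to count): a made-up task format


@dataclass
class SystemPrompt:
    base: str
    notes: List[str] = field(default_factory=list)  # strategies the model wrote for itself

    def render(self) -> str:
        # The text the model would see: base instructions plus learned notes.
        return "\n".join([self.base] + [f"- {note}" for note in self.notes])


def system_prompt_learning(
    prompt: SystemPrompt,
    tasks: List[Task],
    solve: Callable[[str, Task], int],
    review: Callable[[Task, int], Optional[str]],
) -> SystemPrompt:
    for task in tasks:
        answer = solve(prompt.render(), task)
        lesson = review(task, answer)  # feedback is text, not a scalar reward
        if lesson is not None and lesson not in prompt.notes:
            prompt.notes.append(lesson)  # the "edit" step: no weights change
    return prompt


LESSON = "When counting letters, go one character at a time."


def toy_solve(system_prompt: str, task: Task) -> int:
    # Stand-in for an LLM call. It only counts carefully if the note is present.
    word, letter = task
    if LESSON in system_prompt:
        return sum(1 for ch in word if ch == letter)
    return 1 if letter in word else 0  # sloppy "glance" guess


def toy_review(task: Task, answer: int) -> Optional[str]:
    # Stand-in for a review step. Here it simply knows the right answer.
    word, letter = task
    return None if answer == word.count(letter) else LESSON


learned = system_prompt_learning(
    SystemPrompt(base="You are a helpful assistant."),
    tasks=[("strawberry", "r"), ("banana", "a")],
    solve=toy_solve,
    review=toy_review,
)
print(learned.render())
print(toy_solve(learned.render(), ("strawberry", "r")))  # 3, now that the note exists
```

In this toy run a single failure on "strawberry" produces a reusable note, and the next task benefits from it immediately. A scalar reward for the same failure would only say "wrong", which is the contrast the section draws.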
6. Open Questions and the Road Ahead
Karpathy is optimistic about the payoff:
"If this works well, it would be a new and powerful learning paradigm."
But he acknowledges many details remain to be worked out:
"How should edits work? Should the edit system itself be learned? How do we progressively transfer knowledge from explicit system text to habitual weights, the way humans do?"
Key Terms Summary
- LLM (Large Language Model)
- Pretraining
- Finetuning (SL/RL)
- System Prompt
- System Prompt Learning
- Problem-solving strategy
- Data efficiency
- Explicit knowledge vs. habitual behavior
- Automated prompt revision
- Analogy to human learning
Closing Thought 😊
Karpathy's post argues that for LLMs to learn more like humans — and more efficiently — we need a new mode of learning where models write and refine their own problem-solving strategies. If System Prompt Learning becomes a reality, it could represent a major leap forward in AI adaptability and problem-solving capability. 🚀