This essay reflects on the most important lesson from 70 years of artificial intelligence (AI) research history — the bitter lesson. It emphasizes that general methods leveraging computation are ultimately the most effective in AI, and that the gap only widens over time. Through historical examples, we learn that efforts to directly inject complex human knowledge into systems, while seemingly useful in the short term, ultimately hinder progress — and that general approaches based on massive computation and learning are what bring success.
1. The Bitter Lesson of AI Research
The biggest lesson from 70 years of AI research is that general methods leveraging computation are ultimately the most effective. This is because the cost per unit of computation has been exponentially decreasing thanks to Moore's law. Much AI research was conducted under the assumption that an agent's available computation would remain constant, but in reality, enormously more computation becomes available over time.
Researchers strive to leverage human knowledge of a domain for short-term gains, but in the long run, only leveraging computation matters. While these two approaches are not necessarily at odds, in practice they tend to conflict. Time invested in one approach is time not invested in the other. Moreover, human knowledge-based approaches tend to complicate methodologies, making them incompatible with general methods that leverage computation. It is important to examine the many cases where AI researchers were slow to learn this bitter lesson.
2. The Advancement of Computer Chess and Go
In computer chess, the method that defeated world champion Garry Kasparov in 1997 was based on massive, deep search. Most computer chess researchers at the time had pursued approaches that leveraged human understanding of the special structure of chess. When a simpler, search-based approach using special-purpose hardware and software proved far more effective, these human-knowledge-based chess researchers struggled to accept defeat. They said:
"Brute force search may have won this time, but it is not a general strategy, and besides, it is not how people play chess."
These researchers wanted human input-based methods to win, and when they didn't, they could not hide their disappointment.
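The "brute force" the critics derided is itself a precise algorithmic idea. As a toy illustration only (not Deep Blue's actual implementation), a minimax search with alpha-beta pruning over a small hand-built game tree might look like this:

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    """Minimax search with alpha-beta pruning over a nested-list game tree.

    Leaf nodes are numeric evaluations; internal nodes are lists of children.
    """
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:   # prune: the minimizer would never allow this branch
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:   # prune symmetrically for the minimizing player
                break
        return value

# Classic textbook tree: the minimax value is 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))
```

Real chess programs pair this kind of search with fast move generation and simple evaluation functions; the point is that the procedure is fully general and scales directly with available computation.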
A similar pattern of research progress appeared 20 years later in computer Go. Initially, there were enormous efforts to avoid search by leveraging human knowledge or special features of the game, but all of these efforts became irrelevant or even counterproductive once search was effectively applied at scale.
Learning value functions through self-play also turned out to be important. (The same was true in many other games, and even in chess, although learning did not play a major role in the program that first defeated the world champion in 1997.) Self-play learning, and learning in general, is like search in that it enables massive amounts of computation to be brought to bear. Search and learning are the two most important classes of techniques for leveraging vast amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial efforts were directed toward leveraging human understanding (so that less search would be needed), and much greater success came only later, by actively embracing search and learning.
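Learning a value function from self-play can be shown at toy scale. The sketch below is a hypothetical example, not how any champion program was built: tabular TD(0) learning in the simple "race to 10" game, where players alternately add 1 or 2 to a running total and whoever reaches 10 first wins. Totals 1, 4, and 7 are theoretically lost for the player to move, and the learned values reflect that.

```python
import random

WIN = 10          # the first player to reach exactly 10 wins
MOVES = (1, 2)    # each turn, add 1 or 2 to the running total

def train(episodes=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Learn V[s]: estimated win probability for the player to move at total s."""
    rng = random.Random(seed)
    V = [0.5] * WIN                    # tabular value estimates for states 0..9
    for _ in range(episodes):
        s = 0
        while s < WIN:
            legal = [m for m in MOVES if s + m <= WIN]
            if rng.random() < epsilon:                 # explore occasionally
                m = rng.choice(legal)
            else:                                      # greedy w.r.t. afterstate value
                m = max(legal, key=lambda m: 1.0 if s + m == WIN
                        else 1.0 - V[s + m])
            nxt = s + m
            # TD(0) target: immediate win, or one minus the opponent's value.
            target = 1.0 if nxt == WIN else 1.0 - V[nxt]
            V[s] += alpha * (target - V[s])
            s = nxt
    return V

V = train()
print([round(v, 2) for v in V])   # low values at totals 1, 4, 7
```

Both players share one value table and improve it by playing against themselves, so the only inputs are the rules of the game and computation, which is the essence of the approach.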
3. Changes in Speech Recognition and Computer Vision
In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of specialized methods that leveraged human knowledge of words, phonemes, the human vocal tract, and so on. On the other side were newer statistical methods, based on hidden Markov models (HMMs), that did much more computation. Once again, the statistical methods won out over the human-knowledge-based methods. This led to a major change across all of natural language processing over the following decades, as statistics and computation came to dominate the field.
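The HMM approach rests on computing the probability of an observed sequence under a probabilistic state model rather than on hand-coded phonetic rules. A minimal sketch of the forward algorithm, with a made-up two-state model (toy numbers, not any real speech system):

```python
# Toy HMM: two hidden phoneme-like states, each emitting one of two symbols.
states = (0, 1)
init   = [0.6, 0.4]                    # P(first hidden state)
trans  = [[0.7, 0.3], [0.4, 0.6]]      # P(next state | current state)
emit   = [[0.9, 0.1], [0.2, 0.8]]      # P(observed symbol | state)

def forward(obs):
    """Likelihood of an observation sequence under the HMM (forward algorithm)."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
                 for s in states]
    return sum(alpha)

print(round(forward([0, 1, 0]), 4))
```

Everything here is a number to be estimated from data, which is why the method scales with more computation and more recordings instead of with more linguistic expertise.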
The recent rise of deep learning in speech recognition is the latest step in this consistent direction. Deep learning methods rely far less on human knowledge, use much more computation, and, by learning from massive training sets, produce dramatically better speech recognition systems. As in the games, researchers always tried to build systems that worked the way they thought their own minds worked — they tried to put that knowledge into their systems. But as massive computing power became available through Moore's law and methods were found to put it to good use, these efforts ultimately proved counterproductive and wasted enormous amounts of researcher time.
A similar pattern appeared in computer vision. Early methods conceptualized vision in terms of finding edges, generalized cylinders, or SIFT features. But today all of that has been discarded. Modern deep learning neural networks use only convolutions and certain kinds of invariance concepts, and achieve far better performance.
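Convolution itself is a simple, general operation: one small set of weights is reused at every image position, which is what builds translation structure into the network without hand-designed features. A minimal sketch with toy numbers (not a real vision pipeline):

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation: one shared kernel slides over the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# The same weights fire wherever the pattern appears (translation equivariance):
image  = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]
kernel = [[-1, 1],     # a toy vertical-edge detector
          [-1, 1]]
print(conv2d(image, kernel))
```

The crucial difference from hand-crafted SIFT-style features is that in a deep network the kernel values are not designed at all; they are learned from data, with only the sharing structure built in.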
4. The Core of the Bitter Lesson and the Path Forward
This is truly a big lesson. As a field, we still have not fully learned it, and we continue to make the same kind of mistakes. To see this, and to resist it effectively, we must understand why these mistakes are so appealing. We must learn the bitter lesson that building in how we think we think does not work in the long run.
The bitter lesson is based on the following historical observations:
- AI researchers have often tried to build knowledge into agents.
- This always helps in the short term and gives researchers personal satisfaction.
- But in the long run it plateaus and even inhibits further progress.
- Breakthrough progress ultimately comes from the opposing approach based on scaling computation through search and learning.
The ultimate success often leaves a bitter taste and is not fully digested, because it was success achieved by surpassing the human-centric approach we preferred.
One thing to learn from the bitter lesson is the tremendous power of general methods — methods that continue to scale no matter how much computation becomes available. The two methods that appear to scale arbitrarily in this way are search and learning.
The second general point to learn from the bitter lesson is that the actual contents of minds are enormously, irredeemably complex. We should stop trying to find simple ways to think about the contents of minds — simple notions of space, objects, multiple agents, or symmetries. All of these are parts of the arbitrary, intrinsically complex external world. They are not what should be built in, because their complexity is endless. Instead, we should build in only the meta-methods that can find and capture this arbitrary complexity. What is essential in these methods is that they can find good approximations — but the search for them should be carried out by our methods, not by us. We want AI agents that can discover as we do, not agents that merely contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process itself can be done.
Conclusion
In conclusion, the history of AI research delivers a clear message: general search and learning methods that leverage massive computation — not direct injection of human knowledge — ultimately bring success. Resisting the temptation of the human-centric approach that provides short-term satisfaction and focusing on scaling computation from a long-term perspective is the key to AI advancement. Rather than trying to directly teach the complex knowledge of the world, we should focus on building meta-methods that allow AI to discover and learn knowledge on its own.