On Saturday morning, December 27, 2025, this episode revisits the year's seismic shifts in the AI ecosystem driven by DeepSeek and the mainstream adoption of MoE (Mixture of Experts) and RLVR (Reinforcement Learning with Verifiable Reward). It analyzes how China-led open frontier models surged forward and how the "recipe" spread, while making in-depth predictions about what scale-up and autonomous agents will bring in 2026.
1. 2025 in Review: The Paradigm Shift DeepSeek Launched
2025 began with the shock that DeepSeek brought. The R1 model and the DeepSeek moment went beyond a simple technical release — they completely transformed the paradigm of AI model development. Sunghyun Kim reflects that 2025, after the initial seismic shock, was a period of "incremental progress" — understanding and building upon the new paradigm.
The most striking change was the arrival of an era of China-led open frontier models. Through 2024, the trend was to build efficient, "reasonably sized" models within limited resources; in 2025, every company began aiming for the frontier (top-tier performance).
"Countless Chinese companies — DeepSeek, MiniMax, Z.ai, Xiaomi, Tencent, Moonshot, and more — released models. Previously they were around 70B scale, but the models released in 2025 were nearly all frontier or near-frontier class. The biggest change of 2025 is that almost every company is now aiming for frontier."
"What's interesting is that these are all Chinese models. Outside of China, there were hardly any impressive open models that could be called frontier. Llama 4 existed but didn't leave a strong impression."
Once DeepSeek proved that frontier-level performance was achievable even with relatively limited compute power (GPUs, etc.), Chinese companies rushed to compete in building bigger, more powerful models.
2. The Democratization of MoE and the Efficiency Revolution
One of the most important keywords of 2025 was undoubtedly MoE (Mixture of Experts). The MoE architecture that DeepSeek established has become the industry-standard "recipe."
2.1. MoE's Overwhelming Efficiency (Compute Multiplier)
Sunghyun Kim explains MoE's advantages through graphs showing performance relative to training compute. The key lies in sparsity.
"At around 10^24 compute, MoE models perform over 7x better than standard dense models. In other words, using MoE with the same compute yields performance equivalent to pouring 7x the compute into a dense model."
"What's even more shocking is that this multiplier grows as training compute increases. The gap keeps widening, so not using MoE has become the strange choice."
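The arithmetic behind these claims can be sketched in a few lines. This is a back-of-the-envelope illustration, not the episode's own calculation; the 671B/37B figures are DeepSeek-V3's published parameter counts, and the 7x multiplier is the claim quoted above.

```python
# Sparsity: DeepSeek-V3's public config lists ~671B total parameters,
# of which only ~37B are active per token.
total_params = 671e9
active_params = 37e9
active_fraction = active_params / total_params
print(f"active fraction per token: {active_fraction:.1%}")  # ~5.5%

# Compute multiplier, per the episode's claim: at ~1e24 training FLOPs,
# an MoE matches a dense model given ~7x the compute.
moe_flops = 1e24
multiplier = 7
dense_equivalent = moe_flops * multiplier
print(f"dense-equivalent budget: {dense_equivalent:.0e} FLOPs")
```

The two numbers are related but distinct: the active fraction measures per-token cost, while the multiplier is an empirical claim about quality per unit of training compute.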
2.2. Modularity and Routing
For each token, MoE activates only the subset of its many expert modules needed to process it. The total parameter count can therefore be very large while the parameters actually used per token stay small, maximizing efficiency.
"The architecture DeepSeek designed has become the base architecture, much like the Llama architecture was for the previous generation. Even companies like Moonshot say 'there's no need to improve on the DeepSeek architecture — just use it as is.' It's become the answer key."
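The routing step described above can be sketched minimally. This is a toy top-k gating function in plain Python, not any production implementation; the expert count, hidden size, and k=2 are illustrative choices.

```python
import math
import random

def top_k_routing(token, expert_weights, k=2):
    """Route one token to its top-k experts by gating score."""
    # One gating score (logit) per expert: dot product with the router row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in expert_weights]
    # Keep only the k highest-scoring experts; the rest stay inactive.
    topk = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    # Softmax over just the selected experts' scores to get mixing weights.
    m = max(logits[i] for i in topk)
    exps = [math.exp(logits[i] - m) for i in topk]
    total = sum(exps)
    gates = [e / total for e in exps]
    return topk, gates

# Toy setup: 8 experts, hidden size 4, 2 experts active per token.
random.seed(0)
n_experts, d = 8, 4
router = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_experts)]
token = [random.gauss(0, 1) for _ in range(d)]
idx, gates = top_k_routing(token, router, k=2)
print("chosen experts:", idx)
print("active per token:", f"{len(idx)}/{n_experts}")
```

Only the chosen experts' feed-forward blocks would run for this token; every other expert contributes zero compute, which is where the sparsity savings come from.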
3. The Rise of RLVR and Agent Post-Training
Another key development of 2025 was RLVR (Reinforcement Learning with Verifiable Reward). This technique effectively revealed the secret behind the o1 model, dramatically improving how models reason and use tools.
3.1. RLVR as the Standard for Agent Training
While RLHF (Reinforcement Learning from Human Feedback) was previously used to build chatbots, RLVR is now used to build agents.
"If RLHF was post-training for building chatbots, RLVR can be seen as post-training for building agents."
3.2. Outcome-Based Evaluation
The core of RLVR is rewarding based on final outcomes, not process. For a coding agent, for example, you don't micromanage which tools the model uses or how — you only check whether the final code passes unit tests.
"The paradigm shifted to setting aside how the model uses tools and evaluating only the final result. If a verifiable result comes out, like passing unit tests, you give a reward. Through this process, the model learns on its own to become an agent."
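The outcome-only reward described above can be sketched as a small function. This is an illustrative toy, not any lab's actual training harness: it runs a candidate solution against its unit tests in a subprocess and returns a binary reward, ignoring how the solution was produced.

```python
import pathlib
import subprocess
import sys
import tempfile

def verifiable_reward(solution_code: str, test_code: str) -> float:
    """Outcome-only reward: 1.0 if the unit tests pass, else 0.0.

    How the model arrived at the code (which tools it called, how many
    steps it took) is deliberately ignored; only the verifiable final
    result counts, which is the core idea behind RLVR-style signals.
    """
    with tempfile.TemporaryDirectory() as d:
        script = pathlib.Path(d) / "candidate.py"
        script.write_text(solution_code + "\n" + test_code)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, timeout=30,
        )
        return 1.0 if result.returncode == 0 else 0.0

# Toy rollouts: two "model-produced" solutions and one unit test.
good = "def add(a, b):\n    return a + b"
bad  = "def add(a, b):\n    return a - b"
test = "assert add(2, 3) == 5"
print(verifiable_reward(good, test), verifiable_reward(bad, test))  # 1.0 0.0
```

Because the verifier is automatic and objective, this reward can be computed at scale without human labelers, which is what makes it practical as an RL training signal for agents.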
4. New Understanding of RL: Atomic Skills and Compositional Abilities
2025 was also a year of deepening understanding of reinforcement learning (RL). A new perspective emerged that RL doesn't just unlock hidden capabilities — it enables entirely new dimensions of ability.
4.1. Atomic Skills vs. Compositional Abilities
Analysis of what models actually learn through RL led to the conclusion that the key acquisition is the ability to compose skills.
- Atomic Skills: Individual foundational abilities like arithmetic operations. These are learned during pre-training.
- Compositional Abilities: The ability to weave foundational skills together to solve complex problems. These are learned through RL.
"Atomic skills like arithmetic are learned during pre-training, but the ability to compose these skills in the right order to solve new problems — that 'compositional ability' — can be learned through RL."
"This led to approaches like 'let's firmly inject the atomic skills needed for agents during mid-training, then RL can compose them.'"
4.2. Advances in RL Infrastructure
The ability to build RL infrastructure — where training, sampling, and environment interactions are complexly intertwined — quickly and accurately has become a core competitive advantage.
5. Tacit Knowledge and the Data War Beyond Published Papers
Despite DeepSeek publishing its recipe, there was discussion about why only China and the US are leading. The conclusion pointed to tacit knowledge and data that don't appear in papers.
"Things like hyperparameter settings and infrastructure know-how don't show up well in papers. Especially how to create data for post-training is even more hidden knowledge. This data creation know-how ultimately determines the quality of the final product."
"There's a saying: 'The model is the product, and the data is the model.' Using open models for data processing and such — data itself being the most important issue hasn't changed."
6. 2026 Outlook Part 1: Scale-Up and the Data Bottleneck
Now the focus turns to 2026. If 2025 validated efficiency, 2026 is predicted to bring another round of massive scale-up.
6.1. Bigger Models, More Training
Current frontier models often have active parameters under 100B despite large total parameter counts. Researchers are asking: "If it works this well at this scale, what happens if we go bigger?"
"Chinese companies all want to scale up too. 'What if we do pre-training at a larger scale?' DeepSeek V3 showed that RLVR works even better as the model gets bigger. In 2026, we'll definitely see models that are larger and trained longer."
6.2. The Long-Tail Data Problem
But the biggest obstacle is data. Just as going from 99% to 99.9% is the hardest part of autonomous driving, AI models need quality data including edge cases to perform more complex and sophisticated tasks.
"Frontier companies are spending enormous resources creating data right now, and at some point they'll think 'how long do we have to keep doing this?' Like autonomous driving — continuously collecting countless edge case data to reach 99.9% — that's the biggest bottleneck right now."
7. 2026 Outlook Part 2: New Paradigms and Autonomous Agents
Beyond simple performance improvements, 2026 is likely to see qualitatively different new paradigms emerge. (Sunghyun Kim put the probability at 50%.)
7.1. Truly Autonomous Agents
Current coding agents act only when instructed, with humans reviewing their results. Future agents, by contrast, will autonomously improve code and drive projects forward.
"An agent that optimizes code and adds features on its own overnight without being told — when this arrives, the economic value will be qualitatively different. Just as humans create real impact when they work with autonomy, models will too when they gain autonomy."
7.2. Continual Learning
Rather than humans spoon-feeding them data, this is the stage where models find and learn what to study on their own.
"Models discovering and learning data on their own — that's continual learning. It's not just continuously feeding them data, but the model's ability to discover 'what to learn and why' on its own will be key."
7.3. Self-Play and Intrinsic Motivation
Implementing self-play in domains like math or coding — unlike games like Go with clear win/lose outcomes — is extremely difficult. It's not just about creating hard problems, but about creating problems that are meaningful and interesting to humans.
"Ultimately this comes down to 'intrinsic motivation' and 'human alignment.' We need to give models the motivation to find valuable problems that humans find interesting, and that motivation must be aligned with human values."
Conclusion: Embracing Uncertainty
Amid concerns about whether current AI investment is a bubble, there was consensus that 2026 needs a breakthrough technological leap to prove its worth. Like the Manhattan Project or Apollo Program, a massive race is underway where whoever succeeds first could take it all.
"The odds of what will happen in 2026 are fifty-fifty, but I've decided to just enjoy it. The future is becoming unpredictable, but focusing on technological progress itself has its own kind of joy." — Sunghyun Kim
"There's a high probability that everything we learned in 2025 — 'oh, this is how you do it' — we'll have to unlearn and relearn in 2026. But that change gets the dopamine flowing and fills me with anticipation." — Seungjun Choi
The changes DeepSeek launched in 2025 may be just the beginning. As we look ahead to 2026 and the next AI leap that scale-up and new paradigms will bring, we wrap up this summary of EP.81.
