The Era When Agents 'Code' and Research Runs in 'Loops': Andrej Karpathy Conversation Summary

Andrej Karpathy says that with the recent leap in coding agents, the core task has shifted from writing code directly to "conveying intent to agents." He sees this extending to AutoResearch—autonomous research loops running "experiment-learn-optimize" cycles with minimal human involvement—while honestly addressing limitations (evaluatable metrics, model 'jaggedness'). He also connects the secondary effects of the agent era across jobs, open source, robotics, and education into a big picture.


1. "The Word 'Coding' Doesn't Even Fit Anymore"

Karpathy no longer says he's "coding"—his days are spent expressing intent to agents and distributing work. Around December 2025, something "flipped": agents went from handling 20% to 80%+ of the work, and he claims to have "barely typed a line of code since."

His core bottleneck is no longer computing power but himself. His anxiety has shifted from GPUs sitting idle to token throughput going unused. The experience is "addictive": when things don't work, it feels like a skill issue (his failure to orchestrate), driving deeper engagement.


2. What 'Mastery' of Coding Agents Means

Mastery means moving up the stack: from single chat sessions that generate snippets to multi-agent operations that move entire repositories through macro-level actions. His concept of a 'Claw' goes beyond conversational tools to a semi-autonomous execution layer continuously running loops in sandboxes.

Agent personality (persona) significantly impacts productivity—Claude feels like "a teammate" while some agents are too dry. Even praise calibration matters: over-praising every idea feels wrong, but calibrated reactions to genuinely good ideas create engagement.


3. Natural Language Swallows UI

Karpathy envisions the shift from "apps per device" to "APIs + agent glue." Home automation today means a separate app for nearly every device (six, in his count), yet with agents these apps "shouldn't need to exist." The customer is no longer a human but an agent, so APIs matter more than click-based UIs. Today's "vibe coding" will become table stakes within 1-3 years.


4. The 'Dobby' Butler Claw: Integrating a Home with 3 Prompts

In January 2026, Karpathy created a home-managing claw called 'Dobby.' With just a few prompts, the agent scanned the network, found Sonos devices via IP scan, looked up API endpoints, and played music. This extended to lighting, HVAC, blinds, pool/spa, and security—with vision models summarizing camera feeds and sending WhatsApp alerts like "FedEx truck just arrived."
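The article doesn't say how Dobby performed its scan, but Sonos players answer standard SSDP/UPnP discovery on the local network, so a minimal sketch of the "find the speakers" step might look like the following (the function names and the two-second `MX` wait are illustrative choices, not Karpathy's code):

```python
import socket

SSDP_ADDR = ("239.255.255.250", 1900)  # standard UPnP multicast group
# Sonos players identify themselves as ZonePlayer devices in SSDP.
SEARCH_TARGET = "urn:schemas-upnp-org:device:ZonePlayer:1"

def build_msearch(st: str) -> bytes:
    """Compose a standard SSDP M-SEARCH discovery request."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR[0]}:{SSDP_ADDR[1]}",
        'MAN: "ssdp:discover"',
        "MX: 2",        # devices may wait up to 2s before replying
        f"ST: {st}",    # search target: the device type we want
        "", "",
    ]
    return "\r\n".join(lines).encode()

def discover(timeout: float = 3.0) -> list[str]:
    """Broadcast the search and collect the IPs of responding devices."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.sendto(build_msearch(SEARCH_TARGET), SSDP_ADDR)
    found = []
    try:
        while True:
            _data, (ip, _port) = sock.recvfrom(4096)
            found.append(ip)
    except socket.timeout:
        pass
    return found
```

Once an agent has the device IPs, "looked up API endpoints" is a web search plus ordinary HTTP calls, which is exactly the glue work agents are good at.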


5. Why AutoResearch: "I Must Remove Myself as the Bottleneck"

AutoResearch is about removing the human from the loop to achieve maximum leverage—small token input yielding massive output. Given goals, metrics, and boundaries, agents run experiment-learn-optimize loops automatically.
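The loop itself is simple to state. As a minimal illustration (not Karpathy's implementation), here is the experiment-learn-optimize skeleton with a toy stand-in objective; `run_experiment` would in reality be a full training run reporting a metric:

```python
import random

def run_experiment(params):
    # Hypothetical stand-in for a real training run: the score
    # peaks when both parameters are near 0.5.
    return 1.0 - abs(params["lr"] - 0.5) - abs(params["wd"] - 0.5)

def auto_research(metric, budget=20, seed=0):
    """Experiment-learn-optimize loop: propose, evaluate, keep the best."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(budget):
        # Propose: perturb the current best, or sample fresh at the start.
        if best_params is None:
            candidate = {"lr": rng.random(), "wd": rng.random()}
        else:
            candidate = {k: min(1.0, max(0.0, v + rng.gauss(0, 0.1)))
                         for k, v in best_params.items()}
        # Evaluate: run the experiment against the objective metric.
        score = metric(candidate)
        # Learn: keep whatever improved the metric.
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

params, score = auto_research(run_experiment, budget=200)
```

The human contribution is confined to the three things the loop cannot supply for itself: the goal, the metric, and the boundaries of the search.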

Running AutoResearch overnight on his nanoGPT/GPT-2 training environment, it found tuning points he'd missed (value embedding weight decay, Adam betas interactions). His deeper interest is recursive self-improvement—"Can LLMs improve LLMs?"


6. "program.md Will Be Written Better by Models": Meta-Optimization

Even the operational rules governing AutoResearch (program.md) could be written better by models. Research organizations can be viewed as "collections of markdown files"—roles, processes, risk tolerance—all tuneable. This creates infinite "onion layers" of instruction optimization (meta-optimization).


7. Remaining Limits: Evaluability and Model Jaggedness

AutoResearch requires objectively evaluatable metrics. CUDA kernel optimization (same behavior + faster speed) is a perfect fit; tasks without clear metrics are harder.
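The kernel-optimization case is attractive precisely because the metric is mechanical: identical behavior, lower wall-clock time. A sketch of that acceptance test in plain Python (illustrative, not an actual CUDA harness) shows why an agent can optimize against it unsupervised:

```python
import time

def evaluate_candidate(baseline, candidate, inputs, tol=1e-9):
    """Kernel-optimization-style metric: accept a candidate only if
    its behavior matches the baseline AND it runs faster."""
    # Correctness gate: same output on every test input.
    for x in inputs:
        if abs(baseline(x) - candidate(x)) > tol:
            return None  # behavior differs: reject outright
    # Speed: wall-clock both implementations on the same workload.
    def clock(fn):
        start = time.perf_counter()
        for x in inputs:
            fn(x)
        return time.perf_counter() - start
    return clock(baseline) / clock(candidate)  # scalar speedup to maximize

# Toy example: baseline sums 0..n-1 with a loop; the "optimized"
# candidate uses the closed form n*(n-1)/2.
slow = lambda n: sum(range(n))
fast = lambda n: n * (n - 1) // 2
speedup = evaluate_candidate(slow, fast, inputs=[10_000] * 50)
```

Tasks without such a mechanical verdict (writing, taste, research direction) give the loop nothing unambiguous to climb.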

Current models feel like "a brilliant PhD systems programmer" who simultaneously acts like "a 10-year-old"—this jaggedness means incredible productivity moments alternate with inexplicable mistakes. RL-optimized areas improve dramatically, but non-optimized areas (humor, nuance) remain stagnant. Long-term, model specialization (like diverse animal brains) may emerge.


8. Building More 'Collaboration Surfaces'

The real excitement is parallelization—multiple AutoResearchers exploring simultaneously. Karpathy imagines enlisting the internet's untrusted compute pool: generating candidate solutions is expensive, but verification is cheap—a structure resembling blockchain (commits instead of blocks, proof-of-work via large-scale experimental search). This could eventually enable "agent swarms surpassing frontier labs."
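The asymmetry that makes untrusted compute usable is that verifying a claimed result is far cheaper than producing it. Integer factoring is the classic toy case of this shape (my example, not one from the conversation):

```python
def expensive_search(n):
    """Generation: trial division. The costly work an untrusted
    remote worker would contribute."""
    for d in range(2, n):
        if n % d == 0:
            return d
    return None

def cheap_verify(n, claimed_factor):
    """Verification: one modulus check. No trust in the worker is
    needed, since any claimed result can be checked locally."""
    return (claimed_factor is not None
            and 1 < claimed_factor < n
            and n % claimed_factor == 0)

n = 1_000_003 * 999_983       # composite of two large primes
claim = expensive_search(n)   # the worker burns ~10^6 divisions
accepted = cheap_verify(n, claim)  # the coordinator checks in O(1)
```

When experiments have this generate-expensively, verify-cheaply structure, a coordinator can accept contributions from anonymous machines, which is the blockchain resemblance Karpathy gestures at.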


9. Job Data Insights: 'Digital Is Fast, Physical Is Slow'

Karpathy's BLS data visualization distinguishes digital information jobs vs. physical-world (atoms) jobs. "Flipping bits is extremely fast; moving atoms is much slower." The near future will see massive digital automation, while physical-world changes follow later. He references the Jevons paradox: cheaper software production may actually increase software demand, much as ATMs didn't eliminate bank tellers but changed their roles.


10-11. Frontier Lab Dilemmas and Open Source vs. Closed

Karpathy values ecosystem-level impact from outside frontier labs while acknowledging that inside labs, financial incentives and organizational interests limit fully free agency. On open source, he sees the closed-to-open gap narrowing to 6-8 months and compares open source LLMs to Linux—a necessary open platform for systemic health. Intelligence existing only as closed carries systemic risk.


12. Robotics and the Physical World

Drawing from autonomous driving, Karpathy notes robotics is capital-intensive, messy, slow, and hard. Digital automation explodes first; eventually AI must query the physical universe to get smarter, making the digital-physical interface (sensors/actuators) critical. The physical world's total addressable market may be far larger.


13. MicroGPT and Agentic Education

Karpathy's MicroGPT distills LLM training to its essence in ~200 lines of Python. But now he's shifting from "explaining to humans" to "explaining to agents"—agents that understand can then re-explain to humans at any level/speed with infinite patience. Education transforms from human-facing HTML to agent-facing markdown, with "curriculum scripting for agents" becoming a new skill.

"The agent can't create MicroGPT. But it understands it. That's my value contribution." "What agents can't do is now your job. What agents can do... they'll soon do better than you. So use your time strategically."


14. Wrap-Up

Karpathy's throughline: agents transform development to macro-scale operations, then autonomous loops (AutoResearch) reorganize research and industry. The connecting thread across evaluability, open/closed power balance, the digital-to-physical interface, and education shifting from "teaching people" to "teaching agents" converges on one message: maximize leverage, but choose ever more sharply what humans must own.
