In this talk, Jared Zoneraich, founder and CEO of PromptLayer, shares an independent analysis of Claude Code's architecture and implementation, while also covering the features and future direction of other coding agents in depth. He particularly emphasizes the importance of simple architecture and better models, and provides practical insights for developers building their own AI agents.
1. Introduction and About Jared Zoneraich
This talk isn't officially sponsored by Anthropic — rather, Jared shares his independent findings about how Claude Code works under the hood. Jared describes himself as a "hacker" and has been building an AI engineering workbench through PromptLayer for the past 2.5 years. PromptLayer advocates for rigorous prompt engineering and agent development, believing that AI product development requires the participation of product teams, engineering teams, and domain experts together.
"This is a talk about how Claude Code works. Again, I'm not affiliated with Anthropic. They don't pay me. I'd take the money if they offered it."
Jared is an enthusiastic fan of Claude Code and has reorganized PromptLayer's engineering organization around it. As a small team struggling with countless edge cases and data upload issues, they established the rule "if it can be done within an hour, just use Claude Code" — dramatically boosting productivity.
2. The Evolution of Coding Agents and the Emergence of Claude Code
Coding agents have seen explosive development recently. In the past, autonomous coding agents didn't work well, disappointing many developers, but now they're undergoing tremendous change. Jared outlines this evolution:
- Early days: "As everyone knows, initially it was about copying and pasting code from ChatGPT. Even that alone was revolutionary."
- Cursor's debut: Early Cursor was "just a VS Code fork with a `Cmd+K` feature," but developers loved it.
- Cursor Assistant: Conversational agent capabilities were added.
- Claude Code: Now offers a "headless" workflow where you can accomplish tasks without touching code directly — a pivotal advancement.
"Claude Code represents a new workflow where you don't even need to touch the code. It'd better be really good."
3. Claude Code's Secret to Success: Simple Architecture and Better Models
Jared boils down why Claude Code excels to two things:
3.1. Simple Architecture
Claude Code's core philosophy is "give it tools and then get out of the way." Previously, developers built complex prompt chains to prevent model hallucinations, but that's no longer necessary.
"Less scaffolding, more model — that's the key."
Models are optimized for tool calling and are learning and improving on their own. Developers should stop over-compensating for model limitations and let the model explore and solve problems autonomously.
"Don't over-engineer around the model's current flaws. Most issues will resolve themselves, and you'll just be wasting time."
3.2. Better Models
Claude Code's advancement is primarily because Anthropic shipped better models. These models are skilled at tool calling and optimized for autonomous execution. Claude Code eschews complex RAG, embeddings, and classifiers in favor of simply "building better models and letting models learn on their own."
Like Python's "Zen of Python," Claude Code's architecture emphasizes simplicity and clarity.
"Simple is better than complex, complex is better than complicated, flat is better than nested. That's this entire talk. Everything you need to know about how Claude Code works and why it works."
4. Core Components and How Claude Code Works
Jared explains Claude Code's specific components one by one.
4.1. The Constitution
Claude Code's constitution takes the form of markdown files like `CLAUDE.md` (or `AGENTS.md`, as Codex uses). Rather than requiring models to study repositories or build complex vector databases, users can directly edit these instructions as needed, and agents can modify them too — a very simple approach.
"Ultimately, everything is prompt engineering, or context engineering. How do you adapt these general-purpose models for your use case? And I think the simplest answer is the best."
4.2. The Master Loop
At the heart of Claude Code is a simple master loop. Unlike past complex agents, all modern coding agents, including Claude Code, consist of a single while loop that calls tools, feeds results back to the model, and repeats until there are no more tool calls.
"While there are tool calls, execute the tools, give the tool results to the model, repeat until there are no tool calls, then ask the user what to do."
The model is remarkably good at knowing when to keep calling tools and when to correct mistakes. This flexibility shows that the more you trust the model to explore and solve problems, the more powerful the system becomes.
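The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's actual implementation; the message and tool-call dictionary shapes are invented for the example:

```python
def run_agent(model, tools, user_message):
    """Minimal agent loop: call the model, execute any requested tools,
    feed results back, and repeat until the model stops calling tools."""
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = model(messages)            # model decides: answer, or call tools
        messages.append(response)
        tool_calls = response.get("tool_calls", [])
        if not tool_calls:                    # no more tool calls: return to the user
            return response["content"]
        for call in tool_calls:               # run each tool, feed the result back
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
```

The point of the sketch is how little is there: no graph, no routing, no state machine. All control flow lives inside the model's decision to emit (or not emit) another tool call.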
4.3. Core Tools
Jared introduces several core tools he finds most interesting in Claude Code. These tools can change daily, but fundamentally mimic the actions humans use when troubleshooting in a terminal.
- Read: Reads file contents. Token limits may prevent reading files that are too large.
- Grep / Glob: Uses traditional search tools like `grep` instead of vector databases. This contradicts RAG wisdom but works effectively for general agents.
- Edit: Uses a diff approach rather than rewriting entire files, applying only the changes. This is much faster, reduces context usage, and lowers error rates.
- Bash: The most important and essential tool. Even if you removed all other tools and kept only `bash`, it would still work, handling Python script execution, test generation, and everything else. `bash` also benefits from rich training data. "Bash is everything. You could remove all these tools and just leave Bash."
- Web Search / Web Fetch: Search and external API calls are handled by cheaper, faster models to reduce the main loop's burden.
- To-dos: Helps the model track tasks and stay on target.
- Tasks: A context management tool that prevents context contamination when running long processes or reading files.
"The biggest enemy is context filling up and the model becoming stupid."
4.4. To-do Lists
To-do lists are an important feature that helps the model structure work and track progress. Interestingly, these lists are not structurally enforced but entirely prompt-based.
- Rules: Perform one task at a time, mark completed tasks, continue processing the current task if errors occur.
- Implementation: To-do lists are inserted into the system prompt, and models are now very adept at following these instructions.
- Structure: Features a structured schema including version, ID, todo title, and evidence.
- Benefits:
- Enforces planning: Encourages the model to plan ahead.
- Crash recovery: Work can resume after crashes.
- Enhanced UX: Provides visual progress to users, increasing transparency.
- Steerability: Helps users control the model's workflow more easily.
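Because the list is prompt-based rather than structurally enforced, it can be sketched as nothing more than serializing a schema into the system prompt. The field names below echo the ones mentioned above (version, ID, title, evidence), but the exact schema and rules text are hypothetical:

```python
import json

def render_todos(todos: list) -> str:
    """Serialize the to-do list into the system prompt so the model tracks
    its own progress -- the list is a prompt convention, not a hard constraint."""
    rules = ("Work on ONE task at a time. Mark a task completed only when it is "
             "done. If a task errors, keep working on it before moving on.")
    return rules + "\nCurrent to-dos:\n" + json.dumps(todos, indent=2)

# hypothetical entries using the fields mentioned in the talk
todos = [
    {"version": 1, "id": "t1", "title": "write failing test",
     "status": "completed", "evidence": "test_foo.py added"},
    {"version": 1, "id": "t2", "title": "fix the bug",
     "status": "in_progress", "evidence": None},
]
prompt = render_todos(todos)
```

Re-rendering this block into the context on every turn is also what enables crash recovery: the list itself is the durable state.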
4.5. Async Buffer and Context Compression
Claude Code uses an async buffer called H2A to separate I/O processes from reasoning. This prevents all terminal information from being "stuffed" into the model, managing context efficiently. When context reaches capacity, the middle section is discarded and the beginning and end are summarized.
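One simplified reading of that compaction step: summarize the oldest messages, keep the most recent ones verbatim, and drop the middle. The sketch below is an assumption-laden illustration (all thresholds are invented, and the summarizer would in practice be a cheaper model):

```python
def compact(messages, summarize, max_msgs=40, keep_head=5, keep_tail=10):
    """If the transcript exceeds the budget, replace the oldest messages
    with a summary, keep the recent tail verbatim, and discard the middle."""
    if len(messages) <= max_msgs:
        return messages
    summary = summarize(messages[:keep_head])   # cheap model in practice
    return ([{"role": "system", "content": "[earlier work, summarized] " + summary}]
            + messages[-keep_tail:])
```

The design goal is the one the quote names: keep the context from filling up, because a stuffed context makes the model worse.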
Additionally, sandbox environments enable long-term memory storage. Jared predicts all ChatGPT and Claude windows will offer sandboxes in the near future, enabling efficient handling of long-term research and document updates.
4.6. A World Without DAGs
Jared recalls building complex systems like customer support agents using DAGs (Directed Acyclic Graphs) with hundreds of nodes, branching prompt flows based on conditions. This was complex to develop and difficult to maintain.
"For the last two and a half years, everyone was building these DAGs. It was absolute madness. If this user wants a refund, send them to this prompt; if they want that, send them to that one — hundreds of nodes..."
Now models are smart enough that simple loops and tool calls work far better without complex DAGs. This makes development 10x easier and improves maintainability.
"The key is to rely on the model. When in doubt, don't try to think of every edge case or
ifstatement. Just let the model explore and figure it out."
Jared gives an example where he added unnecessary title tags to website buttons to help agent navigation, but the agent actually got "distracted" and produced worse results. This shows that excessive instructions or scaffolding can hinder the model's autonomous exploration ability.
4.7. Trigger Phrases and Sandboxing
Claude Code adjusts reasoning budgets through trigger phrases like 'think,' 'think hard,' and 'ultra think.' These serve as parameters that allow the model to adjust its own reasoning token budget.
Sandboxing and permission management are critical security concerns. Prompt injection from the internet or agents with shell access connecting to the web can create significant attack vectors. Claude Code manages this through URL blocking, sub-agent usage, and bash command gates. Jared jokes that he personally runs in "YOLO mode" but is obviously careful with enterprise customers.
4.8. Sub-agents
Sub-agents are an important method for solving context management challenges. Sub-agents for specific tasks have their own independent context, feeding only results back to the main agent to prevent context contamination.
- Examples: Researcher, docs reader, test runner, code reviewer.
- Structure: Sub-agents receive a `description` and a `prompt` through a tool call called Task. Interestingly, the coding agent can generate and pass prompts to other agents on its own. "The coding agent provides prompts to its own agents. And I've actually used this paradigm in agents I've built myself."
- Benefits: Increases flexibility and helps the model self-resolve problems by providing more information when errors occur.
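The Task mechanism can be sketched as a tool that builds a fresh context from the `description` and `prompt` and returns only the sub-agent's final answer. Shapes and names here are illustrative, not the actual API:

```python
def task_tool(description: str, prompt: str, run_subagent) -> str:
    """Spawn a sub-agent with an isolated context; only the final result
    (not the sub-agent's transcript) flows back to the main loop."""
    sub_context = [
        {"role": "system", "content": "You are a focused sub-agent: " + description},
        {"role": "user", "content": prompt},
    ]
    return run_subagent(sub_context)  # the parent never sees intermediate steps
```

The isolation is the whole point: a docs reader can churn through thousands of tokens of files, and the main agent's context only grows by the size of the answer.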
4.9. System Prompt and Skills
Claude Code's system prompt plays the role of subtly adjusting model behavior — instructing things like concise output, tool usage preference, matching existing code style, no comment additions, parallel command execution, and to-do list usage. These instructions guide specific behaviors based on experience-driven feedback.
Skills are like extensible system prompts. They provide additional information needed for specific tasks to Claude Code without overusing context.
- Examples: Documentation update skills (writing style, product info), Microsoft Office file editing skills, design style guide skills, deep research skills.
- Usage: Jared revealed he used slide development, deep research, and design skills to create his own presentation slides.
- Challenges: While very useful, it remains challenging for models to recognize and invoke skills at the appropriate time. Users sometimes need to manually trigger skills.
4.10. Unified Diffing
Unified diff is a crucial feature when applying code changes. Instead of rewriting entire files, it displays only changes in diff format, providing:
- Context savings: Far less token usage.
- Speed improvements: Tasks are faster.
- Error reduction: Effective at preventing mistakes (like how it's easier to mark corrections than rewrite an entire essay).
- Standard: Many coding agents follow the unified diff standard or similar approaches.
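The idea behind targeted edits can be illustrated with a minimal old-string/new-string replacement. This is a simplification: real agents emit unified-diff hunks with surrounding context lines, but the economics (small payload, fewer chances to mangle untouched code) are the same:

```python
def apply_edit(source: str, old: str, new: str) -> str:
    """Apply a targeted edit instead of rewriting the file: the model emits
    only the changed span, which is cheaper, faster, and less error-prone."""
    if source.count(old) != 1:
        # an ambiguous or missing target means the model's edit is unsafe
        raise ValueError("edit target must match exactly once")
    return source.replace(old, new, 1)
```

The uniqueness check mirrors why diff-style edits reduce errors: the edit either anchors to exactly one place in the file or fails loudly, rather than silently rewriting the wrong spot.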
5. The Future of Coding Agents and Diverse Philosophies
Jared shares his views on the future of coding agents and the philosophies of various agents.
5.1. Future Direction Predictions
- Tool calling: Some predict a master loop with hundreds of tool calls, but Jared thinks the direction will be toward maintaining minimal tool calls (just a few including bash).
- Adaptive budgets: Budget-adjusting features like 'think' phases will advance further. Using reasoning models as tools — switching between fast/cheap and high-quality models situationally — will be useful.
- New paradigms: Like to-do lists and skills, undiscovered "first-class paradigms" are likely to emerge.
5.2. The AI Therapist Problem
Jared calls this the "AI therapist problem" — explaining that the best AI applications have no single global maximum. Just as different therapists have different strategies (meditation, CBT, ayahuasca), coding agents can have diverse design and architecture philosophies.
"This is also my counter-argument to AGI — when building applications, taste matters a lot and design architecture is very important."
Currently, with various coding agents like Claude Code, Codex, and Cursor, it's hard to say which is definitively best. Each may excel at specific tasks, and diverse philosophies will coexist, each optimized for different use cases.
5.3. Comparing Major Coding Agents' Philosophies
- Claude Code: User-friendliness and simplicity are its strengths. Excels in areas requiring human interaction like `git` operations and local environment setup.
- Codex: Excels at context management and feels powerful. Uses a Rust core and is open source. Its kernel-based sandboxing approach differs from Claude Code's.
- Cursor IDE: Strengths are being model-agnostic and fast. Its `composer` model demonstrated that data-driven fine-tuning can build defensibility.
- Factory Droid: Strong in specialized sub-agents.
- Devin: Focuses on end-to-end autonomy and self-reflection.
- AMP: Sourcegraph's coding agent, offering a free tier and a model-agnostic approach. Instead of model selection, it focuses on accelerating development and building agent-friendly environments.
  - Handoff: Instead of context summarization (compact), it starts new threads that pass along only the needed information, a fresh perspective on context management.
  - Model choice: Uses different models labeled 'fast,' 'smart,' and 'oracle' to adjust reasoning intensity.
6. Evals and Future Workflows
Jared points out the reality that benchmarks have become marketing tools and that true evaluation is difficult work. The simple loop architecture that relies on model flexibility makes evaluation especially challenging.
6.1. Evaluation Methods
- End-to-End tests: Integration tests confirming whether problems were solved.
- Point-in-Time tests: Checking model behavior in situations requiring specific tool calls.
- Backtests: Capturing past data to rerun and evaluate agent performance.
- Agent smell: Surface-level metrics like number of tool calls, retry count, and time taken to check system health.
6.2. The Importance of Rigorous Tools
While leveraging model flexibility, tools themselves must be rigorously tested. By treating tools as functions with clear inputs and outputs and testing them as units, you can isolate and resolve issues arising from the model's non-deterministic aspects.
"It's a way to offload the model's non-determinism to other parts of the model. Test your tools."
Jared emphasized that for cases requiring very specific outputs — like particular email formats or blog posts — it's better to build rigorously testable tools rather than relying solely on model exploration.
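For example, an exact-format requirement like the email case can live in a deterministic, unit-testable tool rather than in the prompt. The function below is hypothetical, but it shows the division of labor: the model decides *when* to send and *what* to say, while the tool guarantees the format:

```python
def format_email(subject: str, body: str) -> str:
    """A rigorously testable output tool: when the product needs an exact
    format, put the formatting in deterministic code, not the model."""
    return "Subject: " + subject + "\n\n" + body.strip() + "\n"
```

Because this function has clear inputs and outputs, ordinary unit tests cover it completely, and any remaining failures can be attributed to the model's choices rather than the plumbing.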
6.3. Headless Claude Code SDK
Finally, Jared highlighted the potential of the Headless Claude Code SDK. This integrates Claude Code as part of a pipeline, enabling complex tasks with simple prompts.
- Example: Building automated workflows via GitHub Actions that update documentation daily, read commits to identify changes, check `CLAUDE.md`, and create PRs.
- Future: Going forward, agents will be built at higher abstraction levels, with agents like Claude Code handling much of the orchestration work.
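Assuming the `claude` CLI's one-shot print mode (`claude -p "..."`), a pipeline step might wrap the agent like this. Treat the flag and invocation details as assumptions that may vary by version:

```python
import subprocess

def build_command(prompt: str) -> list:
    """Command line for a single non-interactive run; -p / --print is
    assumed to run the prompt once and exit."""
    return ["claude", "-p", prompt]

def claude_headless(prompt: str, timeout: int = 600) -> str:
    """Run Claude Code as one step of a pipeline (e.g. inside a CI job)
    and return whatever it printed."""
    proc = subprocess.run(build_command(prompt), capture_output=True,
                          text=True, timeout=timeout)
    return proc.stdout
```

A daily docs job then reduces to something like `claude_headless("Read today's commits and open a PR updating the docs")`, with the agent doing all the orchestration the pipeline code would otherwise contain.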
7. Conclusion
Jared Zoneraich closed his talk with five key takeaways:
- Trust the model: When in doubt building agents, lean on the model.
- Simple design wins: Don't overcomplicate things — pursue simplicity.
- Bash is enough: Minimize tools and leverage powerful, general-purpose ones like `bash`.
- Context management matters: Context contamination makes models "stupid," so managing it efficiently is crucial.
- Diverse perspectives matter: There may be multiple "best" approaches to problem-solving, and various agent philosophies will coexist for different use cases.
Jared also revealed that his slide deck was created using Claude Code's skills (slide development, deep research, and design skills), once again demonstrating the practical potential of AI agents.
