Building Effective AI Agents — Anthropic

1. Introduction: Simplicity Is Power ✨

Over the past year, Anthropic has worked alongside dozens of teams across various industries to build large language model (LLM) agents. The most successful implementations shared a common thread: rather than relying on complex frameworks or specialized libraries, they used simple, composable patterns.

"The most successful implementations didn't use complex frameworks. Instead, they used simple, composable patterns."

This piece shares the hard-won practical knowledge Anthropic gained from building agents directly with customers, along with actionable advice developers can use to build effective agents.

2. What Is an Agent? 🤖

"Agent" can mean different things depending on who you ask.

Some customers use the term to describe systems that operate fully autonomously, performing complex tasks with a variety of tools.
Others use it to describe more prescriptive implementations that follow a predefined workflow.

Anthropic classifies all of these variants as agentic systems, but distinguishes between workflows and agents as follows:

Workflows: Systems in which LLMs and tools are orchestrated through predefined code paths
Agents: Systems in which the LLM dynamically decides its own process and tool usage, directly controlling how it carries out tasks

"Workflows follow predefined paths, while agents design their own process and use tools on the fly."

3. When Should You Use an Agent — and When Should You Avoid One? 🤔

When building LLM-powered apps, it's best to start with the simplest solution and add complexity only when necessary. Agentic systems often trade latency and cost for better task performance, so consider carefully whether that tradeoff is worth it for your use case.

Workflows: Best for well-defined tasks where predictable and consistent results are required
Agents: Best for cases requiring flexibility and model-driven decision-making at scale

"In most cases, a single LLM call augmented with retrieval and in-context examples is enough."

4. Frameworks: When and How to Use Them 🛠️

Many frameworks make it easier to implement agentic systems.

LangGraph (LangChain)
Amazon Bedrock AI Agent framework
Rivet (drag-and-drop GUI LLM workflow builder)
Vellum (GUI tool for building and testing complex workflows)

These frameworks simplify low-level operations such as LLM calls, tool definition/parsing, and call chaining. However, additional abstraction layers can obscure actual prompts and responses, making debugging harder and adding unnecessary complexity.

"When using a framework, always understand what's happening under the hood. Incorrect assumptions about internals are a common source of customer errors."

Tip: When possible, try using the LLM API directly — many patterns can be implemented in just a few lines of code. If you do use a framework, make sure you understand its internals before relying on it.

5. Building Blocks and Workflow Patterns for Agentic Systems 🧩

5-1. The Core Building Block: The Augmented LLM

The heart of any agentic system is an LLM augmented with retrieval, tools, and memory. Such an LLM can generate its own search queries, select appropriate tools, and decide what information to retain.

"The key is to tailor these augmentations to your use case and provide well-documented interfaces that make them easy for the LLM to use."

Practical tip: Using Anthropic's Model Context Protocol, a simple client implementation can integrate with a wide variety of third-party tools.

5-2. Workflow Patterns

(1) Prompt Chaining

Definition: Break a task into sequential steps, where each LLM call receives the output of the previous one
Benefit: Each LLM call handles an easier sub-task, improving accuracy (at the cost of higher latency)
Examples:
- Generate marketing copy → translate it
- Write a document outline → verify it meets criteria → write the full body
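
The outline example above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `LLM` is a hypothetical stand-in for whatever function sends a prompt to a model, and `fake_llm` returns canned text so the sketch runs offline. The check between the two calls corresponds to the "verify it meets criteria" step.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]


def chain_outline_to_document(
    topic: str, llm: LLM, outline_ok: Callable[[str], bool]
) -> Optional[str]:
    """Run outline -> check -> full body as sequential steps."""
    outline = llm(f"Write a short outline for a document about: {topic}")

    # Programmatic check between steps: stop early if the outline fails it.
    if not outline_ok(outline):
        return None

    # The second call receives the first call's output as part of its prompt.
    return llm(f"Write the full document following this outline:\n{outline}")


def fake_llm(prompt: str) -> str:
    """Canned responses so the sketch runs without network access."""
    if prompt.startswith("Write a short outline"):
        return "1. Intro\n2. Body\n3. Conclusion"
    return "DOCUMENT BASED ON:\n" + prompt.split("\n", 1)[1]


if __name__ == "__main__":
    print(chain_outline_to_document(
        "agents", fake_llm, outline_ok=lambda o: o.count("\n") >= 2
    ))
```

Keeping the check in ordinary code rather than in the prompt makes it easy to see which step failed when the chain stops.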

"Prompt chaining is ideal when a task can be cleanly broken into fixed sub-tasks."

(2) Routing

Definition: Classify input and direct it to a specialized downstream process
Benefit: Enables optimized prompts and tools for each input type
Examples:
- Route customer inquiries (general, refunds, technical support, etc.) to different processes/tools
- Route easy questions to a cheaper model (Claude 3.5 Haiku) and hard questions to a more capable one (Claude 3.5 Sonnet)
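
A rough sketch of the customer-inquiry example: one call classifies the input, and ordinary code dispatches it to a specialized handler. The names `route`, `fake_classifier`, and `HANDLERS` are assumptions made for illustration. In practice each handler could wrap its own prompt, tool set, or model (for instance a cheaper model for easy questions).

```python
from typing import Callable, Dict

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]


def route(
    query: str,
    classifier: LLM,
    handlers: Dict[str, Callable[[str], str]],
    default: str = "general",
) -> str:
    """Classify the query, then dispatch it to the handler for that category."""
    label = classifier(
        "Classify this customer inquiry as one of "
        f"{sorted(handlers)}. Reply with the label only.\n\n{query}"
    ).strip().lower()

    # Fall back to the default route if the classifier returns an unknown label.
    handler = handlers.get(label, handlers[default])
    return handler(query)


def fake_classifier(prompt: str) -> str:
    """Keyword-based stand-in for a classification model call."""
    text = prompt.rsplit("\n\n", 1)[-1].lower()  # look at the query part only
    if "refund" in text:
        return "refund"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"


# Illustrative handlers; real ones would call a model with a specialized prompt.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "general": lambda q: f"[general] {q}",
    "refund": lambda q: f"[refund] {q}",
    "technical": lambda q: f"[technical] {q}",
}

if __name__ == "__main__":
    print(route("I want a refund for order 12", fake_classifier, HANDLERS))
```

The explicit fallback matters because a classifier can return a label outside the expected set.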

"Routing works well for complex tasks where inputs fall into clearly distinct categories."

(3) Parallelization

Definition: Multiple LLMs process tasks concurrently and their outputs are aggregated
Variants:
- Sectioning: Run independent sub-tasks in parallel
- Voting: Run the same task multiple times to get diverse results and aggregate them
Examples:
- One model handles a user query while another filters for inappropriate content (sectioning)
- Run code vulnerability review with multiple prompts and flag if any find an issue (voting)
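
The voting example can be sketched with Python's standard `concurrent.futures` module. Threads are a reasonable assumption here because model calls are typically I/O-bound. `fake_reviewer` is a stand-in for a real model call, and the "VULNERABLE" reply convention is invented for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]


def run_parallel(llm: LLM, prompts: List[str]) -> List[str]:
    """Send every prompt concurrently; results come back in prompt order."""
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        return list(pool.map(llm, prompts))


def vote_any_flag(llm: LLM, code: str, review_prompts: List[str]) -> bool:
    """Voting: run several review prompts and flag if any reports an issue."""
    prompts = [f"{p}\n\n{code}" for p in review_prompts]
    answers = run_parallel(llm, prompts)
    return any(a.strip().upper().startswith("VULNERABLE") for a in answers)


def fake_reviewer(prompt: str) -> str:
    """Stand-in reviewer: only the injection-focused prompt notices the bug."""
    instruction, code = prompt.split("\n\n", 1)
    if "injection" in instruction and "+ user_input" in code:
        return "VULNERABLE: query built by string concatenation"
    return "OK"


if __name__ == "__main__":
    snippet = 'query = "SELECT * FROM t WHERE id=" + user_input'
    checks = ["Check for SQL injection.", "Check for hard-coded secrets."]
    print(vote_any_flag(fake_reviewer, snippet, checks))
```

Sectioning uses the same `run_parallel` helper with different prompts for independent sub-tasks. Only the aggregation step changes.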

"Parallelization is useful when speed matters or when diverse perspectives and attempts are valuable."

(4) Orchestrator-Workers

Definition: A central LLM (orchestrator) dynamically breaks down a task and assigns sub-tasks to worker LLMs, then synthesizes the results
Benefit: Well-suited for complex tasks where sub-tasks cannot be predicted in advance
Examples:
- Coding tasks that require modifying multiple files simultaneously
- Research tasks that gather and analyze information from diverse sources
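
A minimal sketch of the pattern, assuming the orchestrator can be asked to reply with a JSON list of sub-tasks. That reply format, and the names `orchestrate`, `fake_orchestrator`, and `fake_worker`, are assumptions for illustration. The point is that the sub-task list comes from the model at run time rather than from the code.

```python
import json
from typing import Callable, List

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]


def orchestrate(task: str, orchestrator: LLM, worker: LLM) -> str:
    """Plan sub-tasks at run time, delegate each one, then synthesize."""
    # Step 1: the orchestrator decides the sub-tasks; they are not hardcoded.
    plan = orchestrator(
        "Break this task into sub-tasks. "
        f"Reply with a JSON list of strings.\n\n{task}"
    )
    # Real model output may not be valid JSON; production code should handle that.
    subtasks: List[str] = json.loads(plan)

    # Step 2: each sub-task goes to a worker call.
    results = [worker(sub) for sub in subtasks]

    # Step 3: the orchestrator combines the workers' outputs.
    report = "\n".join(f"- {s}: {r}" for s, r in zip(subtasks, results))
    return orchestrator(f"Synthesize a final answer from these results:\n{report}")


def fake_orchestrator(prompt: str) -> str:
    """Canned planner and synthesizer so the sketch runs offline."""
    if prompt.startswith("Break this task"):
        return json.dumps(["search docs", "search forums"])
    n = sum(line.startswith("- ") for line in prompt.splitlines())
    return f"Summary of {n} results"


def fake_worker(subtask: str) -> str:
    return f"done({subtask})"


if __name__ == "__main__":
    print(orchestrate("research topic X", fake_orchestrator, fake_worker))
```

The workers run sequentially here for clarity; they could also run concurrently when the sub-tasks are independent.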

"The orchestrator-workers pattern is defined by flexibility — sub-tasks are determined dynamically based on input."

(5) Evaluator-Optimizer

Definition: One LLM generates output while another evaluates it and provides feedback, repeating until the output meets the evaluation criteria
Benefit: Effective when clear evaluation criteria exist and iterative improvement adds value
Examples:
- Refining subtle nuances in literary translation
- Iteratively gathering and analyzing information in complex research tasks
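
The generate-evaluate loop can be sketched as below. The `Generator` and `Evaluator` signatures are assumptions for illustration, and the fake functions stand in for model calls. A round limit is included so the loop ends even if the evaluator never accepts a draft.

```python
from typing import Callable, Tuple

# Hypothetical signatures for the two roles; each would wrap a model call.
Generator = Callable[[str, str], str]               # (task, feedback) -> draft
Evaluator = Callable[[str, str], Tuple[bool, str]]  # (task, draft) -> (accepted, feedback)


def refine(
    task: str, generate: Generator, evaluate: Evaluator, max_rounds: int = 3
) -> Tuple[str, int]:
    """Loop generate -> evaluate until accepted or max_rounds is reached."""
    feedback = ""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)
        accepted, feedback = evaluate(task, draft)
        if accepted:
            return draft, round_no
    # Budget exhausted: return the latest draft rather than looping forever.
    return draft, max_rounds


def fake_generate(task: str, feedback: str) -> str:
    """Stand-in generator: improves the draft once it receives feedback."""
    return "The cat sat." if not feedback else "The cat sat quietly on the mat."


def fake_evaluate(task: str, draft: str) -> Tuple[bool, str]:
    """Stand-in evaluator with one explicit criterion: at least five words."""
    if len(draft.split()) < 5:
        return False, "Too short; add detail."
    return True, ""


if __name__ == "__main__":
    print(refine("Describe the cat.", fake_generate, fake_evaluate))
```

The pattern depends on the evaluator's criteria being explicit enough that its feedback actually steers the next draft.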

"This pattern mirrors how a human writer revises and polishes their work."

5-3. Agents

As LLMs mature in areas like understanding complex inputs, reasoning and planning, tool use, and error recovery, agents are increasingly appearing in production.

How they work:
1. Begin with a user command or conversation
2. Once the task is clear, the agent independently plans and executes
3. Requests user feedback mid-task when needed
4. At each step, obtains "ground truth" from the environment (tool call results, code execution, etc.) to assess progress
5. Terminates when complete or upon a stop condition (e.g., a maximum number of iterations)

"Agents use tools in a loop, taking feedback from the environment at each step until the task is complete. Tool design and documentation are critically important."

When to use them:
- Suited for open-ended problems where the number of required steps cannot be predicted and a fixed path cannot be hardcoded
- Ideal in trusted environments where the LLM must make independent judgments and take actions multiple times
Caution:
- Costs are higher and errors can compound
- Test thoroughly in a sandbox environment and put appropriate guardrails in place
Examples:
- A coding agent that modifies multiple files (SWE-bench tasks)
- A demo of Claude using a computer to perform tasks (computer use reference)
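
The loop described under "How they work" can be sketched as follows. This is a simplified illustration under stated assumptions: `Policy` stands in for the model deciding the next action, the transcript format is invented for the sketch, and asking the user for feedback mid-task is left out. Tool results, including errors, are fed back as observations, and a step limit serves as the stop condition.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical policy: given the transcript so far, return (action, argument).
# In a real system this would be a model call that picks a tool or finishes.
Policy = Callable[[List[str]], Tuple[str, str]]
Tool = Callable[[str], str]


def run_agent(
    task: str, policy: Policy, tools: Dict[str, Tool], max_steps: int = 5
) -> Tuple[str, List[str]]:
    """Act, observe, repeat until the policy finishes or the step limit hits."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        action, arg = policy(transcript)
        if action == "finish":
            return arg, transcript
        if action not in tools:
            observation = f"error: unknown tool {action!r}"
        else:
            try:
                observation = tools[action](arg)  # feedback from the environment
            except Exception as exc:  # report failures so the policy can recover
                observation = f"error: {exc}"
        transcript.append(f"{action}({arg}) -> {observation}")
    # Stop condition: give up once the step limit is reached.
    return "stopped: step limit reached", transcript


def fake_policy(transcript: List[str]) -> Tuple[str, str]:
    """Stand-in policy: call the add tool once, then finish with its result."""
    if len(transcript) == 1:
        return "add", "2 3"
    return "finish", transcript[-1].rsplit("-> ", 1)[1]


# Illustrative tool set with a single tool.
TOOLS: Dict[str, Tool] = {"add": lambda arg: str(sum(int(x) for x in arg.split()))}

if __name__ == "__main__":
    print(run_agent("What is 2 + 3?", fake_policy, TOOLS))
```

Keeping the transcript as plain data also supports the transparency principle, since every action and observation can be inspected afterwards.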

6. Combining and Customizing Patterns 🧬

These building blocks and workflows are not prescriptions — they are composable patterns. Mix and adapt them freely to fit each use case. The key is to measure performance and iteratively improve your implementation.

"Only add complexity when it demonstrably improves outcomes."

7. Summary and Practical Principles 📝

Success in the LLM space isn't about building the most complex system — it's about building the right system for your specific needs.

Start with simple prompts.
Optimize through evaluation.
Only introduce multi-step agents when simpler solutions fall short.

Anthropic's three core principles:

Maintain simplicity
Ensure transparency (make the agent's planning steps clearly visible)
Carefully document and test the agent-computer interface (ACI)

"Frameworks help you get started quickly, but in production it can be better to reduce abstraction layers and build from basic components."

8. Real-World Examples (Appendix 1) 💡

A. Customer Support

Conversational chatbot + tool integration = a natural fit for agents
Draws on external data: customer records, order history, knowledge bases, etc.
Capable of programmatic actions like processing refunds and updating tickets
Success criteria are clear (e.g., whether the issue was resolved)

"Several companies have validated agent effectiveness through a model that charges only for successfully resolved cases."

B. Coding Agents

Code solutions can be verified with automated tests
Test results serve as feedback for iterative improvement
The problem space is well-defined and structured
Objective quality measurement is possible

"Anthropic's agents can now resolve GitHub issues from pull request descriptions alone. But automated tests alone aren't enough — human review to confirm alignment with overall system requirements remains essential."

9. Tool Prompt Engineering (Appendix 2) 🛠️

Tools are critically important in agentic systems. They let Claude interact with external services and APIs, and each tool's exact structure and definition must be specified through the API.

Tool definitions and specifications deserve as much attention as prompts
The same task can be specified in multiple ways (e.g., diff vs. full file rewrite, Markdown vs. JSON)
Choose a format that is easy for the LLM to work with
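
As an illustration of treating a tool definition like a prompt, here is a sketch of one hypothetical tool. The `read_file` tool, its parameter name, and the validator are assumptions made for this example. The `name` / `description` / `input_schema` layout follows the shape used for tool definitions in Anthropic's Messages API. The description states a usage example and an edge case, and the validator rejects relative paths so that a common mistake is hard to make silently.

```python
from pathlib import PurePosixPath
from typing import Any, Dict

# Hypothetical tool definition, written for the model as its reader.
READ_FILE_TOOL: Dict[str, Any] = {
    "name": "read_file",
    "description": (
        "Read a UTF-8 text file and return its contents. "
        "Use this before editing a file. "
        "The path must be absolute, e.g. /repo/src/main.py; "
        "relative paths such as src/main.py are rejected."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "absolute_path": {
                "type": "string",
                "description": "Absolute POSIX path to the file, starting with '/'.",
            }
        },
        "required": ["absolute_path"],
    },
}


def validate_read_file_input(tool_input: Dict[str, Any]) -> str:
    """Return the path if valid; raise ValueError with an actionable message."""
    path = tool_input.get("absolute_path")
    if not isinstance(path, str) or not path:
        raise ValueError("absolute_path is required and must be a non-empty string")
    # This sketch assumes POSIX-style paths.
    if not PurePosixPath(path).is_absolute():
        raise ValueError(f"absolute_path must start with '/': got {path!r}")
    return path


if __name__ == "__main__":
    print(validate_read_file_input({"absolute_path": "/repo/src/main.py"}))
```

Returning a specific error message, rather than failing silently, gives the model something concrete to correct on its next attempt.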

Tool design tips:

Give the model enough tokens to "think" before acting
Use formats it would naturally encounter on the internet
Minimize unnecessary formatting burdens (e.g., counting lines, escaping strings)

"A good tool definition clearly specifies examples, edge cases, input formats, and boundaries with other tools."

Rename parameters and descriptions to be more explicit (as if explaining to a junior developer)
Test with diverse inputs and iteratively fix where the model makes mistakes
Poka-yoke: Design tools to make mistakes difficult

"During SWE-bench agent development, we spent more time optimizing tools than writing the overall prompt. For example, the model kept making mistakes with relative file paths; once the tool required absolute paths, the model used it flawlessly."

10. Closing Thoughts 🎉

The key to building effective AI agents is not complexity — it's simplicity, transparency, and reliability tailored precisely to your problem.

Keep things simple.
Design for transparency.
Carefully document and test your tools and interfaces.

With these principles, you can build agents that are both powerful and trustworthy! 🚀

"Success lies not in building the most complex system, but in building the right system for your needs."
