This talk features Matt Carey of Cloudflare explaining the evolution of how AI agents use APIs and where things are headed. He points out the limitations of the traditional tool-bundling approach, then introduces the advantages of CLI, tool search, and code generation (codemode) for solving the context problem. He also covers the importance of safe code execution environments, how dynamic workers address that need, and what the future holds for API and client development.
1. Agents Meet APIs 🤝
Matt Carey works at Cloudflare on MCP (the Model Context Protocol, glossed in this talk as the "Mega Context Problem") and agent-related work, with a deep interest in how AI agents interact with the external world. Giving agents "hands" started with tool calling, also known as function calling: the LLM emits a structured call naming a function and its arguments, and the system executes it. Early on, each agent bundled its own tools directly, which was inefficient because every agent had to build its own tooling for the same APIs.
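To make the tool-calling mechanism concrete, here is a minimal sketch of the host side. The tool name `get_weather`, the `ToolCall` shape, and the `executeToolCall` helper are all hypothetical, chosen only for illustration; real agent frameworks define their own formats.

```typescript
// Hypothetical tool registry: names and handlers are illustrative only.
type ToolCall = { name: string; arguments: Record<string, unknown> };
type Tool = {
  description: string;
  run: (args: Record<string, unknown>) => string;
};

const toolRegistry: Record<string, Tool> = {
  get_weather: {
    description: "Return a canned weather report for a city",
    run: (args) => `Sunny in ${String(args.city)}`,
  },
};

// The model only emits structured data naming a tool and its arguments;
// the host looks the tool up and runs it on the model's behalf.
function executeToolCall(call: ToolCall): string {
  if (!Object.prototype.hasOwnProperty.call(toolRegistry, call.name)) {
    return `Error: unknown tool "${call.name}"`;
  }
  return toolRegistry[call.name].run(call.arguments);
}

// Stand-in for what a model might return as a tool call.
const modelOutput: ToolCall = { name: "get_weather", arguments: { city: "London" } };
const toolResult = executeToolCall(modelOutput);
```

Every tool in such a registry has to be described to the model up front, which is where the context cost discussed later comes from.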
"How do you give an agent hands? How do you get it to interact with the outside world? You're probably familiar with this. This is tool calling, function calling."
Around April of last year, MCP and remote MCP concepts emerged, allowing service providers to offer standardized tools so that any agent could use the same APIs. But this approach had a serious problem of its own: context window explosion.
2. The Context Window Explosion Problem 💥
Cloudflare wanted agents to have access to their entire API surface, but simply turning every API endpoint into a tool would make the context window unmanageably large. Cloudflare's OpenAPI spec runs to 2.3 million tokens, and converting it into tools still required 1.1 million tokens, more than the context window of even the largest foundation models.
"You'll think, 'I want agents to be able to access the full surface area of our API.' But that's not going to happen. Why not? The agent's context window is going to explode."
To work around this, Cloudflare split the API into several product-based MCP services. This reduced the context burden, but users had to manually select the services they needed, and coverage was often incomplete.
"This doesn't achieve the goal of 'make every API a tool for agents.' It's kind of annoying, actually."
What was really needed was progressive discovery of tools.
3. Three Ways to Solve the Context Problem 💡
Matt Carey proposes three approaches for solving the context problem.
3.1. CLI (Command Line Interface) 💻
A CLI lets agents run commands in a sandbox environment and use `--help` flags to discover parameters and capabilities on their own. This approach has gained traction in places like OpenClaw, but it requires shell access, which is a meaningful constraint.
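The discovery pattern behind `--help` can be sketched without a real shell. The command tree below is hypothetical (it is not a real CLI), and the `help` function only emulates the idea that each level reveals its direct subcommands and nothing more.

```typescript
// Hypothetical command tree; the command names are illustrative, not a real CLI.
type Command = { summary: string; subcommands?: Record<string, Command> };

const exampleCli: Command = {
  summary: "example-cli: manage resources",
  subcommands: {
    workers: {
      summary: "Manage workers",
      subcommands: {
        list: { summary: "List workers" },
        deploy: { summary: "Deploy a worker" },
      },
    },
    dns: { summary: "Manage DNS records" },
  },
};

// Emulates `<path> --help`: returns the summary of the addressed command
// and one line per direct subcommand, never the whole tree.
function help(root: Command, path: string[]): string {
  let node = root;
  for (const part of path) {
    const next = node.subcommands?.[part];
    if (!next) return `unknown command: ${path.join(" ")}`;
    node = next;
  }
  const children = node.subcommands ?? {};
  const lines = Object.keys(children).map(
    (key) => `  ${key}  ${children[key].summary}`,
  );
  return [node.summary, ...lines].join("\n");
}
```

An agent would call `help(exampleCli, [])` first, then `help(exampleCli, ["workers"])`, paying context only for the branch it actually explores.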
3.2. Tool Search 🔍
Services like Claude Code use tool search: they match keywords from the user's query against the available tools and load only the relevant ones into context. For example, "I want to create a worker" would add just a handful of related tools to context. This reduces context size effectively, though some irrelevant tools may still slip in.
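A naive version of keyword-based tool search might look like the sketch below. The catalog entries and the scoring rule (count of shared words) are assumptions made for illustration; this is not how any particular product implements its search.

```typescript
type ToolDef = { name: string; description: string };

// Hypothetical catalog; real tool names and descriptions would come from the server.
const catalog: ToolDef[] = [
  { name: "workers_create", description: "Create a new worker script" },
  { name: "workers_list", description: "List worker scripts in an account" },
  { name: "dns_record_create", description: "Create a DNS record in a zone" },
  { name: "r2_bucket_list", description: "List R2 storage buckets" },
];

// Lowercases, splits on non-alphanumerics, and drops very short words.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 2);
}

// Scores each tool by how many query words appear in its name or description,
// then returns the best matches with a non-zero score.
function searchTools(query: string, tools: ToolDef[], limit: number): ToolDef[] {
  const queryWords = tokenize(query);
  return tools
    .map((tool) => {
      const words = tokenize(`${tool.name} ${tool.description}`);
      const score = queryWords.filter((w) => words.indexOf(w) !== -1).length;
      return { tool, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.tool);
}

const relevantTools = searchTools("I want to create a worker", catalog, 3);
```

With this catalog the top hit is `workers_create`, but `dns_record_create` also matches on the word "create", which illustrates how an irrelevant tool can still slip into context.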
"Only load the tools that are relevant."
3.3. Code Generation (Codemode) ✍️
This is the approach Cloudflare introduced in a blog post last year. Instead of forcing agents to use a CLI or static search tools, let the agent write code directly. By leveraging TypeScript's strong type system to express API inputs and outputs concisely, the model generates code from those types.
"Have the model write code. You benefit as the model gets smarter, and you benefit as the OpenAPI spec improves. That should be the source of truth."
As examples, he shows agents generating code to list workers, deploy workers, and configure security via Cloudflare Access. He argues this approach will become even more effective as models grow more capable.
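The following sketch shows the shape of the idea rather than Cloudflare's implementation. The `WorkersApi` interface stands in for types that would be generated from an OpenAPI spec, `createFakeApi` is an in-memory substitute so the example runs on its own, and `agentScript` represents code a model might write. All of these names are hypothetical, and a real API would be asynchronous.

```typescript
// Hypothetical typed surface, standing in for types generated from an OpenAPI spec.
interface WorkerInfo {
  id: string;
  scriptName: string;
  deployed: boolean;
}

interface WorkersApi {
  list(): WorkerInfo[];
  deploy(scriptName: string): WorkerInfo;
}

// In-memory stand-in for the real API so the sketch runs on its own.
function createFakeApi(): WorkersApi {
  const workers: WorkerInfo[] = [{ id: "w1", scriptName: "hello", deployed: true }];
  return {
    list: () => workers.slice(),
    deploy: (scriptName) => {
      const created: WorkerInfo = { id: `w${workers.length + 1}`, scriptName, deployed: true };
      workers.push(created);
      return created;
    },
  };
}

// The kind of code a model might write against those types: one script that
// chains several calls, instead of one tool call per step.
function agentScript(api: WorkersApi): string[] {
  const existing = api.list().map((w) => w.scriptName);
  if (existing.indexOf("billing") === -1) {
    api.deploy("billing");
  }
  return api.list().map((w) => w.scriptName);
}

const deployedNames = agentScript(createFakeApi());
```

The model only needs the two small interfaces in its context, not a tool definition per endpoint, and the list-check-deploy logic runs in a single execution.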
4. The Challenge of Code Generation: Safe Code Execution 🔒
Code generation is powerful, but executing untrusted code raises a fundamental security problem. A few years ago this would have been unthinkable — it would have been classified as a vulnerability outright. Malicious or buggy code could access the filesystem, leak secrets, spin into an infinite loop wasting resources, or even attempt to mine cryptocurrency.
Historically, attempts to address this included DSLs, sandbox environments (VMs, etc.), and mandatory code review. But Matt Carey introduces Cloudflare's powerful answer: Cloudflare Workers.
"Luckily, we have a pretty cool primitive to solve this problem. And there are probably others, but this is the first one and I think it's worth calling out."
Cloudflare Workers use V8 engine isolates to execute code, providing a highly secure sandbox. In his demo, Matt shows:
- Blocking secret access: attempts to read secrets via `process.env` fail.
- Programmable sandbox: with Node.js compatibility mode off, `process.env` doesn't exist at all (errors are thrown), demonstrating that the environment can be controlled programmatically.
- External network access control: by default, internet access is blocked when an agent tries to call an external API, but administrators can configure programmable guardrails that allow access only to specific domains as needed.
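The network guardrail described in the last bullet can be approximated in plain TypeScript. This is a sketch of the policy idea only, not the Cloudflare Workers API: `OutboundPolicy`, `isAllowed`, and `guardOutbound` are hypothetical names, and a real isolate enforces the rule at the runtime level rather than through a wrapper function.

```typescript
// Hypothetical guardrail: type and function names are illustrative.
type OutboundPolicy = { allowedHosts: string[] };

// An empty allowlist blocks everything, mirroring "blocked by default".
function isAllowed(url: string, policy: OutboundPolicy): boolean {
  try {
    const host = new URL(url).hostname;
    return policy.allowedHosts.indexOf(host) !== -1;
  } catch (err) {
    return false; // unparseable URLs are treated as blocked
  }
}

// Wraps a send function so that sandboxed code can only reach hosts named
// in the policy; anything else is rejected before the request is made.
function guardOutbound<T>(
  send: (url: string) => T,
  policy: OutboundPolicy,
): (url: string) => T {
  return (url) => {
    if (!isAllowed(url, policy)) {
      throw new Error(`blocked outbound request to ${url}`);
    }
    return send(url);
  };
}
```

Generated code would receive only the guarded function, so widening access is a matter of adding a hostname to `allowedHosts`.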
With this foundation, Cloudflare's MCP allows read-only access to all 2,000+ endpoints of the Cloudflare API while providing a safe environment for agents to execute code.
5. The Future of Agents and APIs 🔮
Matt Carey shares his vision for how agents will access external tools and how the API ecosystem will evolve.
5.1. More Isolated Environments 🌐
The web will see far more isolated environments emerge, and infrastructure primitives for safely executing untrusted code will proliferate. Code is a highly compact plan, and having models generate and execute code provides much greater degrees of freedom than individual tool calls.
"I think there will be a huge number of isolated environments on the web. And there will be a lot of infrastructure primitives for running this kind of untrusted code on the web. Because code is actually a very compact plan."
Technologies like Pydantic Monty, Deno, and Cloudflare's workerd are leading this trend. Running untrusted code was once a security nightmare; now that LLMs can write that code, the industry is building the primitives that make running it safe.
5.2. Changes to API Services 🛡️
Service providers need to prepare for agents calling their APIs via generated code. This means a surge in API requests, making robust rate limiting and protection mechanisms essential.
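One common way to implement the rate limiting mentioned here is a token bucket. The sketch below is a generic illustration, not something shown in the talk; the capacity and refill values are arbitrary, and the clock is passed in explicitly so the behavior is easy to reason about.

```typescript
// Minimal token bucket; capacity and refill rate are illustrative values.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so short bursts are allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request may proceed, false if it should be
  // rejected (for example with an HTTP 429 response).
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

A service would typically keep one bucket per API key or client, so that a single agent looping across many sandboxes exhausts only its own allowance.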
"Users will write code. Because the users are AIs, and AIs are very good at writing code. And that's how they're going to interact with your platform… Your API needs to have good rate limiting. Because I can run this in a loop across many sandboxes simultaneously and hammer your API. You need to be able to handle that."
5.3. Innovation on the Client Side 🚀
Even more interesting changes are expected on the client side:
- Programmatic Tool Calling: As the workerd, Deno, and Pydantic sandbox examples show, executing untrusted code on the client side will become commonplace.
- Saved Mini-Scripts: Code that agents generate can be saved as mini-scripts for reuse. This is especially useful for repetitive tasks like cron jobs or web scraping, and can evolve to the point where agents self-correct script errors and save the updated version.
- More Clients and Stateless Approaches: Building MCP clients will become far easier than before, so more clients will emerge. To handle a future where each person may have many agents, implementing stateless agent loops in a cloud-native way will become critical.
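The saved mini-script idea, including the self-correction step, could be sketched as follows. `ScriptStore` and `runWithRepair` are hypothetical names; the `run` and `repair` callbacks stand in for a sandboxed executor and a model call respectively, neither of which is implemented here.

```typescript
type SavedScript = { source: string; version: number };

// Hypothetical in-memory store; a real one would persist scripts somewhere durable.
class ScriptStore {
  private scripts: Record<string, SavedScript> = {};

  // Saves a script under a key, bumping the version on each overwrite.
  save(key: string, source: string): SavedScript {
    const previous = this.get(key);
    const saved = { source, version: previous ? previous.version + 1 : 1 };
    this.scripts[key] = saved;
    return saved;
  }

  get(key: string): SavedScript | undefined {
    return Object.prototype.hasOwnProperty.call(this.scripts, key)
      ? this.scripts[key]
      : undefined;
  }
}

// Runs a saved script; on failure, asks `repair` for a corrected source,
// saves it as a new version, and retries once.
function runWithRepair(
  store: ScriptStore,
  key: string,
  run: (source: string) => string,
  repair: (source: string, errorMessage: string) => string,
): string {
  const saved = store.get(key);
  if (!saved) throw new Error(`no saved script named ${key}`);
  try {
    return run(saved.source);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    const fixed = repair(saved.source, message);
    store.save(key, fixed); // persist the corrected version for next time
    return run(fixed);
  }
}
```

Because the corrected source is saved before the retry, the next scheduled run (a cron job, for instance) starts from the repaired version.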
5.4. The Future of MCP SDKs ⚙️
Matt Carey shares his vision for MCP servers and SDKs. MCP will act like middleware for API development — a simple flag (MCP=true) in your preferred framework will expose your API as an agent tool.
"I think MCP is going to become like middleware. When you're building an MCP server… same with API services. It's going to be a flag you can toggle in your favorite framework."
By the end of this year, he expects the SDK to be lightweight enough to integrate natively into every TypeScript-based full-stack framework. An API service provider running dozens of APIs as a single Next.js app could then expose them all as agent tools simply by enabling MCP=true.
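To picture what such a flag might do, here is a speculative sketch in which a route table is turned into tool descriptors when an `mcp` option is enabled. Nothing here is an existing SDK or framework API: `Route`, `ToolDescriptor`, `toToolName`, and `exposeAsTools` are invented for illustration.

```typescript
// Hypothetical route and tool shapes; no real framework API is implied.
type Route = { method: "GET" | "POST"; path: string; summary: string };
type ToolDescriptor = { name: string; description: string };

// Derives a tool name such as "get_users_id" from "GET /users/:id".
function toToolName(route: Route): string {
  const slug = route.path
    .replace(/[^a-zA-Z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
  return `${route.method.toLowerCase()}_${slug}`;
}

// With the flag off nothing is exposed; with it on, every route becomes a tool.
function exposeAsTools(routes: Route[], options: { mcp: boolean }): ToolDescriptor[] {
  if (!options.mcp) return [];
  return routes.map((route) => ({
    name: toToolName(route),
    description: `${route.method} ${route.path}: ${route.summary}`,
  }));
}

const appRoutes: Route[] = [
  { method: "GET", path: "/users/:id", summary: "Fetch one user" },
  { method: "POST", path: "/users", summary: "Create a user" },
];

const exposedTools = exposeAsTools(appRoutes, { mcp: true });
```

The appeal of the middleware framing is that the route table already exists in the framework, so the tool surface stays in sync with the API without a separately maintained server.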
Closing ✨
Matt Carey wraps up by highlighting Cloudflare's recently published "Code Mode" blog post, emphasizing how they were able to expose an entire API to agents using only a thousand tokens. He argues that every company with a large API, and every accessibility provider, should adopt this approach to make their data easily accessible to users.
"If you have a big, massive API, you should do this. Every accessibility provider should do this. Because it's really, really good for people to be able to access your data."
