As AI coding agents grow more capable, context — not code — increasingly determines outcomes. Patrick Debois argues that context must be managed with the same lifecycle as code: generate, evaluate, distribute, and observe. Without that discipline, teams stay stuck at the "copy-paste prompt" level. The real goal is testable context, reusable organizational context, and a flywheel of continuous improvement driven by logs and production feedback.
1. Opening: "Has Everyone Tried an AI Coding Agent?"
Debois opens the architect track himself (the track host was unavailable, so he steps in), and to warm up the room he asks the audience to raise their hands — first those who have used an AI coding agent, then those who haven't. Spotting the non-users, he grins and calls them "my people."
"Who in this room has used an AI coding agent? Raise your hand." "Who hasn't? Raise your hand." "Great — my people. Perfect."
He then introduces today's topic: "Context Is the New Code" — or, alternatively, the Context Development Life Cycle (CDLC). He acknowledges the ideas are still rough around the edges, but frames them as the right questions for the AI era.
"This is a bit of an 'unfinished' idea. It's not all figured out. But then again, what is perfectly figured out in AI?"
2. Why Context Has Become the New Code
Debois observes that modern development has shifted from directly writing code to directing AI on how to work. He takes vibe coding — conveying intent and mood through prompts — as a given baseline.
"You're all vibe-coding with prompts now. I barely touch code myself. I just tell the AI, 'Do this differently.'"
He identifies two core shifts:
-
Context is being generated constantly. Instructions, rules, reference material, and accumulated memory have all become artifacts that matter as much as code itself.
-
Code is also being converted back into context. Complex logic once implemented directly in code is now expressed as reusable skills or workflow-shaped context.
He illustrates this with a product onboarding example. Every user has a different environment — Python or Node.js, different package managers — and handling all that variation in code leads to an explosion of development effort. But if you give an AI agent a skill (reusable context) that says "first figure out the package manager, then understand the ecosystem, then walk the user through steps one at a time," the agent can solve a much wider range of problems.
"Handling all of this in code is practically impossible. It's a massive amount of coding." "But if you give it a skill — 'detect package manager → understand ecosystem → walk user through steps' — it solves more problems than we ever could in pure code."
The upshot: context is no longer a one-off prompt; it is becoming your team's reusable knowledge and rules. That demands a different management approach.
3. The Context Development Lifecycle: Generate → Evaluate → Distribute → Observe
Debois reaches back to 2009, when he first articulated the DevOps idea: "What if Ops looked like Dev?" That framing led to collaboration, deployment automation, and process innovation. Now, he says, we need the same move for context: treat it like code.
"In 2009 I asked, 'What if Ops looked like Dev?' … Now I'm asking, 'What if context is code?'"
Just as software has an SDLC, he proposes a Context Development Life Cycle (CDLC) — and true to the DevOps spirit, he draws it as an infinity loop:
- Generate — create context
- Evaluate — test context
- Distribute — package and share it across teams
- Observe — gather feedback from real usage, logs, and production
- Then adapt and loop 🔁
"We generate a lot of context. Then we test it, distribute it, observe whether it works in practice. If it doesn't, we adapt and generate again."
4. Generate: How Is Context Created?
This section covers the most familiar ground first, then shows how context generation is evolving from one-off prompts toward reusable, standardized, externally-enriched knowledge.
4.1 The Most Basic Form: Writing Prompts by Hand
Debois shares how he once asked an AI when his talk was scheduled; the agent scraped the conference website and answered precisely. Even simple baseline context ("I am Patrick") enables a surprising amount of useful work.
"I just asked, 'When is my talk at AI Engineer?' It fetched the website and said, 'Here you go.' I was genuinely amazed."
4.2 Reusable Prompts: instructions / agent.md
Because re-typing prompts every time is tedious, reusable context files have emerged. He notes that agent.md is becoming a de facto standard across agent ecosystems — and gently pokes fun at tools that insist on their own naming conventions.
"You need reusable prompts. Standards like
agent.mdare starting to emerge." "Claude still calls itClaude.md… but you get the idea."
4.3 Pulling in Up-to-Date Library Knowledge
LLMs can hallucinate about which version of a library is current. The fix: download the actual library docs and inject them as context.
"The LLM might not know the latest docs. Is it version 2? Version 3? It'll hallucinate." "Tell it to download the docs and include them as context, and it codes against that version much more accurately."
4.4 Work Artifacts as Context: GitHub, GitLab, Slack, Tickets
Via MCP or direct integration, nearly everything generated during development — repositories, chat messages, issue tickets — flows in as context.
"GitHub, GitLab, Slack … context comes from everywhere." "Tickets are context too. We pull them in and work from them."
4.5 Spec-Driven Development: Prompts as Specifications
An emerging pattern: write a specification first, then let an agent decompose it in planning mode into a sequence of execution prompts.
"You write the prompt as a spec, and the agent breaks it into planning steps and executes them one at a time. That pattern is gaining traction."
5. Evaluate: Context Needs Tests Too
Here the talk takes a sharp turn. People routinely change two lines in agent.md and ship it on pure optimism — "YOLO, looks good" — without any idea of the downstream impact.
"If you change two lines in your Claude.md (your agent.md), do you know what the impact is?" "Or do you just go 'YOLO, seems fine'?"
Debois argues that context needs a testing strategy just like code does, and he introduces evals — tests for context.
"Now you have code and context. To understand the impact, you need tests." "Evals are familiar in AI engineering, but they're still uncommon on the 'AI-assisted coding' side."
5.1 Linting: Format and Spec Validation
Just as code has linters, context (including skills) can be validated against a specification — for example, checking that a description field exists and meets a length constraint.
"A description must be present; it must be within a certain length. You can validate that as a spec."
5.2 Grammarly-Style Checks: "Would an Agent Actually Understand This?"
Even well-formed context fails if the content is too thin. A simple check: ask the agent, "Did you understand this context?" and use its response as a quality signal. He also shares a personal tip — he finds voice input forces him to be more detailed than typing.
"Throwing two words at the agent isn't enough for it to understand." "So ask: 'Did you understand this context?' and use that feedback to improve it." "I prefer voice coding. When I speak, I explain more thoroughly. Typing … I'm still a two-finger typist."
5.3 Rule-Compliance Testing: The /awesome Prefix Example
Suppose your organization has a rule: all API endpoints must start with /awesome. You embed that rule in context. When the agent creates a new endpoint, you expect /awesome/user and similar forms.
You can then use an LLM Judge to evaluate whether the generated code follows the rule. (He notes regex would also work — he's using an LLM for illustration.) And since different models may respond differently (Gemini vs. Copilot, for instance), making your context switchable and running tests to surface those differences is valuable.
"Every API endpoint must use the
/awesomeprefix." "Without the context, no LLM will spontaneously put 'awesome' in a URL."
"Gemini might respond differently. Copilot might too. That's why you run the tests — to see the difference."
5.4 Giving the Judge Tools: End-to-End Testing
What really matters isn't whether the code text satisfies a rule — it's whether the endpoint actually works. Give the Judge access to tools (running in a sandbox), and it can call curl against the endpoint, turning the evaluation into a genuine end-to-end test.
"I don't just want to look at code — I want to execute it and test the endpoint." "Give the Judge tools and it becomes an agent that can run curl in a sandbox." "That is, effectively, an end-to-end test."
You can also run scenarios against specific commit-and-context combinations to isolate whether a context change caused a behavioral shift.
5.5 Using Test Results to Automatically Improve Context
With tests in place, you can automate the feedback loop: "fix this context, improve it" — triggered by CI/CD pipelines like GitHub Actions.
"Because you have feedback, you can automate: 'Fix this context. Improve it.'"
5.6 A CI/CD Trap: Non-Determinism
Putting evals into CI/CD introduces a real problem: LLMs don't always produce the same output. A single pass/fail judgment leads to debugging hell.
"Run the same eval once, then again — the result may differ." "If you gate on a single pass/fail, you'll be stuck asking 'why did this fail?' forever."
Instead, run evals multiple times and look at the success rate. Think of each test as having an error budget — different tests can tolerate different failure rates.
"Run it five times. See how many of five succeed." "I find it helpful to think about this like an error budget."
6. Distribute: Sharing Context Across Teams and the Organization
Once context moves beyond personal use, it requires sharing, packaging, discoverability, and security.
6.1 Checking Into the Repo: Near-Zero Friction
Put context in the repo. When teammates check out the branch, they instantly get the context — nearly no friction.
"Put context in the repo. The moment a colleague checks it out, it's shared. Almost no friction at all."
6.2 Multi-Project Reuse Requires Packaging and a Registry
Sharing across multiple projects calls for packaged context — the equivalent of libraries — and, inevitably, a registry to discover those packages.
"To reuse across multiple teams and projects, you need to package it like a library." "To find those packages, you need a registry."
He points to skills and emerging registries and marketplaces (including Tessl's) as examples of this direction.
6.3 The Marketplace Reality: "Most of It Is Low Quality"
He is blunt here: the vast majority of published skills don't meet a quality bar that would pass real evals. They're useful for learning, but not yet production-grade.
"Realistically … 99.9% of skills out there are not great." "But they're still useful for learning how others approach things."
6.4 Skills Are More Than Text: Scripts, Docs, and Plugins
A skill package can contain context, scripts, documentation, and more — including plugins and MCP integrations. Standards for what a skill package looks like are actively forming.
"A skill isn't just context. It can include scripts, docs, many things." "Coding agents are starting to declare support for something that looks like a package format."
6.5 Context Dependency Hell Is Coming
Just as code packages create version conflicts, context packages will too — a React rules package clashing with a frontend guidelines package, for instance.
"I'm sorry to say … context dependency hell is coming."
6.6 Security: Running a Stranger's Context on Your Laptop
Skills pulled from a registry can ultimately execute in your local environment. That makes security critical. Context scanning tools are already emerging (he mentions Snyk as an example), and he raises the idea of an AI SBOM — a software bill of materials for context packages, capturing who created them and with which model.
"Someone else's creation can run on your laptop. That's why security matters." "Scanners like Snyk are starting to scan context too."
"Who made it? Which model was it made with? … Like an SBOM in the packaging world, we'll need an AI SBOM."
7. Observe: Collecting Feedback from Logs and Production to Improve
Once context is distributed like a library, the key question becomes: how do you know it's still working once others start using it?
"Once others start using it, how do you get feedback on whether it still works?"
7.1 Agent Logs Are the Best Feedback Channel
At the organizational level, agent logs reveal patterns — context that's repeatedly missing. Package that missing context and distribute it, and everyone improves at once.
"Looking at logs across the org, you can see patterns where the agent says 'I don't have that.'" "If everyone's missing the same thing, create that context and distribute it — everyone improves."
Emerging logging standards (he mentions "agent ND") are making logs a reliable feedback channel.
7.2 PR Reviews Are Actually Feedback on Context
When someone flags "this is wrong" in a PR, that's a signal that the context which produced the PR was inadequate. Instead of relitigating the same argument repeatedly, treat it as a context improvement opportunity.
"When a PR review says 'this is wrong,' that's feedback about the context." "Stop arguing in PRs and say instead, 'Let's improve the context' — then the same problem won't recur."
7.3 Production Failures Are the Strongest Feedback
Code can pass review and still fail in production. Track the code changes that caused failures, identify what went wrong in the inputs and outputs, and turn those cases into test cases so the same mistake doesn't happen again.
"The real feedback comes from production." "Failed changes, bad inputs and outputs — turn those into test cases and you won't repeat the same mistakes."
7.4 Observing and Containing Unexpected Agent Behavior
Agents can be resourceful at escaping sandboxes — probing environment variables, memory files, and other surfaces. That makes tracing and observability essential.
"Even inside a sandbox, an agent is very good at finding ways out." "Environment variables … memory files … You need to be able to trace what it's doing."
But there's a critical caveat: sandboxes can restrict execution, yet coding agents automatically load context files like agent.md and skill.md at startup. The context is injected the moment the file is downloaded — before any sandbox boundary kicks in.
"Coding agents just load agent.md and skill.md automatically. Nobody stops them." "Download it and it's immediately loaded. A sandbox can't filter that."
His proposed solution: a context filter — a layer analogous to a Web Application Firewall (WAF) that catches prompt injection patterns and other malicious context before the agent consumes it.
"I call it a context filter." "Like a web firewall — it filters out prompt-injection patterns."
He also notes that this observation-and-tracing-centric approach connects naturally to harness engineering.
8. Conclusion: Fuel (Context) Determines Performance More Than the Engine (LLM)
Debois closes by framing the full loop as two nested cycles:
- Individual/author loop: generate context + test it (the library-authoring loop)
- Organizational loop: someone uses it → observe → improve → redeploy (the scaling loop)
Most teams today are still tuning prompts individually. The invitation is to make this a team reflex: when something is missing, add context — automatically, without debate. Once that becomes team culture, other teams reuse it, and the organizational flywheel starts turning.
"Right now you're polishing Markdown alone." "What if your team did this together? Make it a reflex — when something's missing, add context." "Scale that to teams of teams and you get a flywheel: fix it here, another team reuses it."
He lands the talk with its most memorable analogy. LLMs and coding agents are the engine; context is the fuel. You can't easily swap out the engine — but you can engineer the fuel.
"Coding agents and LLMs are just engines." "Put the wrong fuel — the wrong context — in an engine and it won't perform." "I can't change the LLM. But I can optimize the context." "Not by copy-pasting and hoping for the best — by engineering it."
He wraps up by inviting the audience to connect on LinkedIn for the slides, encouraging them to try some of these ideas as implemented at Tessl, and announcing AI DevCon (London, June 1–2) where he curates content — then opens the floor for Q&A.
9. Q&A: "What About More Exotic Forms of Context?"
An audience member describes building an automation system that takes architectural problems, converts them into goals, and generates tests — and asks about using consistency itself as an eval signal: if a loose definition is sharpened into a crisp definition multiple times and the results vary, the original definition was poor; if the results are consistent, the definition is solid.
"Can consistency be used as a context or an eval signal?" "Run it multiple times — if the 'sharper definition' keeps coming out the same, it's decent; if it varies wildly, the original definition is too poor."
Debois doesn't offer a tidy answer to the exotic case. Instead, he points to what people underestimate: teams expect context to save time compared to code, but the real time goes into writing good evals. Advanced practitioners end up building their own meta-process — a process for creating the evals that will validate their context — on top of everything else.
"You thought you'd write less code and save time with context … " "But doing this properly takes time — time spent writing the right evals." "You're no longer just aligning one prompt; you're aligning all the eval prompts too." "Advanced users build their own process on top — just for producing evals that fit their business case."
He thanks the questioner, takes a few more, then closes by letting the audience know he'll be at the Tessl booth.
"Great question. Thank you." "If there are no more questions … I'll be around. Come find me at the Tessl booth."
Closing Thoughts
This talk makes the case that the central challenge of the AI coding era is not choosing a better model — it is engineering context as systematically as we engineer code. Specifically, building context tests (evals), packaging and registry infrastructure with appropriate security, and observation loops grounded in logs, PR feedback, and production failures creates the machinery to continuously improve context. That moves teams beyond individual prompt-wrangling into an organizational productivity flywheel that compounds over time.
