Harness Engineering: Leveraging Codex in an Agent-First World

This article shares the experience of an OpenAI team that developed and shipped a software product over the past five months with zero lines of hand-written code. The product has internal users and external alpha testers, and all of its code was written by Codex agents. The approach cut development time to roughly a tenth, and the engineers' role shifted from writing code to designing environments where agents can work reliably, clarifying intent, and building feedback loops.


1. A Journey Starting from an Empty Git Repository

In late August 2025, the team started from an empty Git repository. The first commit was an initial scaffold (repo structure, CI setup, formatting rules, package manager config, application framework) generated by Codex CLI using GPT-5. Even the AGENTS.md file instructing agents how to work in the repo was written by Codex. From the very beginning, no human-written code existed -- everything was shaped by agents.

Five months later, the repository had accumulated roughly one million lines of code including application logic, infrastructure, tooling, documentation, and internal developer utilities. During this period, just three engineers drove Codex to open and merge approximately 1,500 pull requests -- an average of 3.5 PRs per engineer per day. Throughout, humans contributed no code directly. This became the team's core philosophy:

"No hand-written code."


2. Redefining the Engineer's Role

Not writing code by hand meant a new kind of engineering work focused on systems, scaffolding, and leverage. Initial progress was slower than expected -- not because Codex lacked capability, but because the environment was too unclear. The engineering team's primary job became creating environments where agents could do useful work.

This meant working depth-first: breaking large goals into smaller components, instructing agents to build those components, then using them to solve more complex tasks. When something failed, the answer was almost never "try harder." The question was always:

"What capability is missing? And how can we make it understandable and executable for the agent?"

Humans interact with the system almost entirely through prompts. Engineers describe tasks, run agents, and agents open pull requests. Over time, nearly all review work was handled agent-to-agent.


3. Making the Application Readable

As code throughput increased, the bottleneck became human QA capacity. The team made app UIs, logs, and metrics directly readable by Codex. The app could boot per Git worktree so Codex could start one instance per change. Chrome DevTools Protocol was connected to the agent runtime, enabling Codex to reproduce bugs, verify fixes, and reason about UI behavior directly.
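One way to make per-worktree boot practical is deterministic port allocation, so parallel instances never collide. The sketch below is a hypothetical illustration, not the team's actual tooling; the `make dev` target and the port range are assumptions.

```shell
# Hypothetical sketch: boot one app instance per Git worktree.
# The port is derived from the worktree path, so every checkout
# gets a stable, non-colliding port.
port_for_worktree() {
  local hash
  # cksum gives a stable CRC for the path string.
  hash=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  # Map the hash into the 3000-3999 range.
  echo $((3000 + hash % 1000))
}

# Assumed workflow (illustrative only):
#   git worktree add ../fix-login origin/fix-login
#   cd ../fix-login && PORT=$(port_for_worktree "$PWD") make dev
```

With a scheme like this, an agent can spin up its own instance for each change it is verifying without coordinating with anyone else.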

The same approach was applied to observability tooling. Agents could query logs with LogQL and metrics with PromQL. With this context, prompts like "ensure service start completes within 800ms" became actionable. Single Codex runs were often observed working on a task for 6+ hours -- even while humans slept.
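To make the observability claim concrete, here is a hypothetical sketch of the kinds of queries an agent might issue. The endpoint hosts, service names, and metric name (`service_start_seconds_bucket`) are assumptions; the HTTP API paths are the standard Loki and Prometheus query endpoints.

```shell
# Placeholder endpoints -- not the team's actual infrastructure.
LOKI=http://loki.internal:3100
PROM=http://prom.internal:9090

# LogQL: recent error lines for one service.
logql='{service="app"} |= "error"'

# PromQL: p95 startup latency, to check a budget like "within 800ms".
promql='histogram_quantile(0.95, sum(rate(service_start_seconds_bucket[5m])) by (le))'

# An agent would call the standard HTTP query APIs, e.g.:
#   curl -G "$LOKI/loki/api/v1/query_range" --data-urlencode "query=$logql"
#   curl -G "$PROM/api/v1/query"            --data-urlencode "query=$promql"
```

Given query access like this, "ensure service start completes within 800ms" stops being a vague request and becomes something the agent can measure before and after its change.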


4. Turning Repository Knowledge into System Records

The biggest challenge for effectively using agents on large, complex tasks was context management. The simplest lesson learned early was:

"Give Codex a map, not a 1,000-page instruction manual."

Too many instruction files overwhelm the agent; when everything is an instruction, nothing is, and instructions begin to decay the moment they are written. The team treated AGENTS.md as a table of contents, not an encyclopedia -- roughly 100 lines serving as a map pointing to deeper sources of truth in a structured docs/ directory.
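As a concrete illustration of the "map, not manual" idea -- a hypothetical sketch, not the team's actual file -- a table-of-contents style AGENTS.md might look like:

```markdown
# AGENTS.md -- a map, not a manual

- Architecture overview: see docs/architecture.md
- Running the app and tests: see docs/dev-workflow.md
- Observability (logs, metrics, dashboards): see docs/observability.md
- Active and completed execution plans: see docs/plans/
- Known technical debt: see docs/tech-debt.md

Keep this file short. Detailed instructions live in docs/ and are
linked from here, so agents load deeper context only when they need it.
```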

Plans were treated as first-class artifacts. Small changes used lightweight ad-hoc plans, while complex tasks were recorded as execution plans with progress and decision logs checked into the repository. Active plans, completed plans, and known technical debt were all version-controlled, enabling agents to operate without depending on external context. This enabled progressive disclosure -- agents start with a small, stable entry point and learn where to look next without being overwhelmed upfront.
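One way such version-controlled plans could be laid out on disk (an assumed structure for illustration, not necessarily the team's):

```
docs/
  plans/
    active/
      streaming-uploads.md    # execution plan + decision log
    completed/
      auth-refactor.md
  tech-debt.md                # known debt, readable by agents
```

Because everything lives in the repository, an agent resuming a task can reconstruct prior decisions from the plan files alone, with no external context required.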
