Overview
This article explains, step by step, how the author combines multiple AI agents (chiefly Claude Code, o3, and Sonnet 4), as of July 2025, into a personal AI factory that automates code production, verification, and even its own improvement. The author runs several Claude Code windows, each in its own git worktree, and chains different models into loops of planning, execution, verification, and feedback. The key point is that the factory itself is designed to improve over time.
"The factory improves itself."
Core Principle -- Fix the Inputs, Not the Outputs
When problems arise in code generation, the author doesn't manually fix the output or argue with the AI. Instead, he adjusts the inputs -- plans, prompts, and agent configurations -- so the next run works correctly from the start.
"When something goes wrong, I don't hand-fix the generated code. I don't argue with Claude either. Instead, I adjust the plan, the prompt, and the agent configuration so the next run works correctly from the start."
This principle was inspired by the game Factorio. As in Factorio, the goal is to build a factory, here one in which AI agents write code, verify it, and continuously improve.
Daily Workflow -- Building the Factory
The author's primary interface is Claude Code. He also uses a local MCP (Model Context Protocol) server that runs Goose and o3. Goose is connected to an Azure OpenAI subscription for convenience.
1. Planning
- Give Claude Code a high-level task, and Claude Code calls o3 to generate a plan.
- o3 asks several clarifying questions and organizes the original request and implementation plan in a <task>-plan.md file.
"o3 is a good planner, so it asks several questions to clarify what needs to be done."
2. Execution
- Sonnet 4 reads and validates the plan, then converts it into a task list.
- Claude Code executes the plan with Sonnet 3.7 or Sonnet 4 (mainly Sonnet 4, as there's a lot of Clojure work).
- Crucially, Claude is instructed to commit at each task step. This way, if something goes wrong, you can easily revert to a previous state.
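The commit-per-step habit can be sketched roughly as follows. This is a minimal illustration in Python, not the author's setup: the function names, the `execute` callback (a stand-in for handing a task to the coding agent), and the commit message format are all assumptions. Only the `git add --all` and `git commit` commands are real.

```python
import subprocess
from typing import Callable, Sequence


def git_runner(args: Sequence[str]) -> None:
    """Run one git command in the current directory, raising if it fails."""
    subprocess.run(["git", *args], check=True)


def run_tasks_with_checkpoints(
    tasks: Sequence[str],
    execute: Callable[[str], None],
    run_git: Callable[[Sequence[str]], None] = git_runner,
) -> list[str]:
    """Execute each task, then commit, so every step is a revert point."""
    completed = []
    for index, task in enumerate(tasks, start=1):
        execute(task)  # hypothetical: hand the task to the coding agent
        run_git(["add", "--all"])
        # --allow-empty keeps one commit per task even if nothing changed.
        run_git(["commit", "--allow-empty", "-m", f"task {index}: {task}"])
        completed.append(task)
    return completed
```

Because each task ends in its own commit, a bad step can be undone with an ordinary `git revert` or `git reset` to the previous commit, without losing the earlier steps.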
"I have Claude commit at each task step. That way, if something goes wrong, I can always go back to a previous state."
3. Verification -> Feedback to Inputs
- Once code is generated, Sonnet 4 verifies it against the plan.
- Then o3 verifies the code against both the plan and the original request.
- o3 is very strict, flagging unnecessary code and things like "lint ignore flags."
- When both models verify together, they catch problems better and reduce unnecessary back-and-forth with Claude.
"o3 doesn't compromise. Claude tends to leave in unnecessary backward compatibility code to please, but o3 flags it and tells you to remove it."
- All discovered issues are reflected in the "plan template", not fixed directly in the code.
"All issues discovered by Sonnet 4 or o3 are reflected in the plan template, not fixed directly in the code."
- Using multiple git worktrees, multiple Claude Code instances can run simultaneously for parallel feature development.
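The verification and feedback steps above can be sketched as a small data flow. This is an illustrative Python model only: `PlanTemplate`, `verify_and_feed_back`, and the verifier callables (stand-ins for Sonnet 4 and o3 reviews) are hypothetical names, and the article does not describe how the plan template is actually stored.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence

# A verifier takes (code, plan) and returns a list of issue descriptions.
# In the article these would be model calls; here they are plain functions.
Verifier = Callable[[str, str], list[str]]


@dataclass
class PlanTemplate:
    """Accumulates rules that every future plan must satisfy."""

    rules: list[str] = field(default_factory=list)

    def add_rule(self, rule: str) -> bool:
        """Record a rule once; return True only if it was new."""
        if rule in self.rules:
            return False
        self.rules.append(rule)
        return True


def verify_and_feed_back(
    code: str, plan: str, verifiers: Sequence[Verifier], template: PlanTemplate
) -> list[str]:
    """Run every verifier and push each issue into the template, not the code."""
    new_rules = []
    for verify in verifiers:
        for issue in verify(code, plan):
            if template.add_rule(issue):
                new_rules.append(issue)
    return new_rules
```

The point of the design is that the generated code is never patched here. The only thing that changes is the template, so the next run starts from better inputs.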
Why Inputs Matter More Than Outputs
- Outputs (code) can be discarded at any time, but plans and prompts accumulate and evolve.
- Fixing problems at the root cause applies to all future work.
- Agents become not just code generators but self-improving colleagues.
"Outputs can be discarded, but plans and prompts accumulate. Fix at the root, and it applies to all future work."
For example, when one agent wrote code that loaded an entire CSV into memory, the author switched it to streaming and added the rule "always process CSVs with streaming" to the plan. Now the plan checker automatically flags any code that violates this rule.
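To make the CSV rule concrete, here is a minimal sketch of streaming processing. It is written in Python for illustration (the author's own work is mostly Clojure), and the function names and the `amount` column are assumptions, not taken from the article.

```python
import csv
from typing import Iterator, TextIO


def stream_rows(handle: TextIO) -> Iterator[dict[str, str]]:
    """Yield one parsed row at a time instead of reading the whole file."""
    yield from csv.DictReader(handle)


def total_amount(handle: TextIO, column: str = "amount") -> float:
    """Sum a numeric column while holding only one row in memory at a time."""
    return sum(float(row[column]) for row in stream_rows(handle))
```

The non-streaming version would call something like `list(csv.DictReader(handle))` first, which keeps every row in memory before any work starts. The generator version does the same aggregation with memory use that does not grow with file size.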
"Now my plan checker automatically flags code that doesn't process CSVs with streaming. I don't need to remember it in every PR review. The factory improves itself."
Scaling the Factory
The author is building increasingly complex workflows.
- Dedicated agents for specific tasks (placed behind MCP), such as an agent that scans all Clojure code to apply style rules.
- Internal library code is also automatically improved. For example, hand-written retry code or Thread/sleep calls are automatically replaced with the internal standard library.
- Small, specialized agents are combined into complex workflows. For example, given API docs and internal business cases as input, multiple agents collaborate to handle API integration, testing, and documentation automatically.
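As a rough illustration of what a small, single-purpose scanning step might do, the sketch below finds direct `Thread/sleep` calls in Clojure source text. It is a plain regular-expression check in Python, not the author's agent: the names are hypothetical, and the comment handling is deliberately crude (it would misread a `;` inside a string literal).

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    line_number: int
    text: str


# Matches a direct call such as (Thread/sleep 1000).
THREAD_SLEEP_CALL = re.compile(r"\(\s*Thread/sleep\b")


def find_thread_sleep_calls(source: str) -> list[Finding]:
    """Return every line that calls Thread/sleep outside a line comment."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split(";", 1)[0]  # crude: drop Clojure line comments
        if THREAD_SLEEP_CALL.search(code):
            findings.append(Finding(number, line.strip()))
    return findings
```

A real agent would go further and rewrite each finding to use the internal library. A scanner like this only supplies the list of places to look.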
"You can combine small agents into more complex workflows. You don't need to do everything yourself."
The key to making this work is not trying to get it right on the first attempt, but iteratively improving the inputs.
- Running costs are minimal even with multiple attempts, so run multiple agents simultaneously.
- Learn from failed or context-poor attempts and reflect those lessons in the next input.
- Resist the temptation to fix the output; fix the input instead.
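The "run several attempts, learn from the failures" loop above can be sketched like this. It is a minimal Python illustration under stated assumptions: `attempt` stands in for one agent run on a given prompt and `passes` for whatever verification is applied. Neither name comes from the article.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Result = tuple[str, str]  # (prompt, output)


def run_attempts(
    prompts: Sequence[str],
    attempt: Callable[[str], str],
    passes: Callable[[str], bool],
) -> tuple[list[Result], list[Result]]:
    """Run one attempt per prompt in parallel and split results by the check."""
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        outputs = list(pool.map(attempt, prompts))  # results keep prompt order
    accepted: list[Result] = []
    rejected: list[Result] = []
    for prompt, output in zip(prompts, outputs):
        (accepted if passes(output) else rejected).append((prompt, output))
    return accepted, rejected
```

The rejected list matters as much as the accepted one. Each entry pairs a prompt with the output that failed, and that pairing is the raw material for revising the next round of inputs.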
"That iteration is the factory itself. Code is disposable, but instructions and agents are the real assets."
Next Steps
The author is preparing the following to further develop the factory:
- Better automation and coordination between agents
  - Currently started manually, but the goal is to automatically manage workflows and dependencies.
- Aligning business documents with agents
  - Organizing information at higher levels of abstraction so agents can use it more effectively.
- More complex workflows
  - Tackling more advanced automation with more agents and more complex interactions.
- Maximizing token usage
  - Overcoming constraints across multiple AI providers (like Sonnet 4's token limits) and finding ways to switch seamlessly without interruption.
Conclusion
Currently, the author's AI factory is good enough to ship code while he gets a cup of coffee, but it is not yet fully automated. The core principle, however, remains unchanged.
"Constraints will change, but the core principle remains: fix the inputs, not the outputs."
Key Terms:
- AI Factory
- Input-centric improvement (plans, prompts)
- Agent combination and automation
- Self-improvement loop
- Parallel development
- Continuous feedback
- Outputs (code) are disposable; inputs are assets
In this way, the author builds a personal factory capable of self-improvement by combining AI agents, and through the principle of "fixing inputs, not outputs," creates an increasingly efficient and powerful development environment.
