This essay argues that software development is moving away from deterministic code and toward probabilistic systems. The author uses Compound Loop, a system he built himself, to illustrate the arrival of an era in which AI agents write and review code autonomously. This change is transforming developers' roles, organizational structures, and even the meaning of "launching" a product, and it is spreading fastest among AI-native companies. The conclusion is that the future of software development requires building organizations and cultures for more capable AI models that have not yet arrived, not merely for the models available today.
1. A Quiet Unease: A New Era of Software Development
Software is quietly evolving into probabilistic systems. In the past, we could write code, test it, deploy it, and feel confident that it would work. Now that confidence is breaking. Especially at leading AI companies, codebases are beginning to turn into systems people believe will "probably work," and it is no longer possible to state the exact probability of success.
This change is altering not only the way we work, but also roles, organizations, training programs, and the very nature of "product launches." Tim Davis says he came to understand this change through direct experience.
"I noticed because I built it."
2. The Compound Loop Experience: The Birth of the 24-Hour Employee
A few months ago, the author used his time after work to build a side project called Compound Loop. The system used several frontier AI models to write, review, and merge code almost autonomously. Before going to sleep, he gave the system real problems; when he woke up in the morning, he found many pull requests that had been generated overnight.
"Some pull requests were great, some were wrong, and some surfaced questions I should have asked."
At 8 a.m., instead of handling the previous day's backlog, the author decided which pieces of overnight work to keep. Meanwhile, the system continued analyzing logs and adding new pull requests. This continuous compounding quality surprised him.
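The essay does not show Compound Loop's internals, so what follows is only a hedged sketch of the overnight draft-review-triage loop it describes. Every name here is hypothetical, and the model calls are replaced by deterministic stubs so the sketch actually runs:

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    problem: str
    verdict: str = "pending"  # "approve" or "question" after review


def draft_patch(problem: str) -> PullRequest:
    """Stand-in for a frontier model drafting a patch overnight."""
    return PullRequest(problem=problem)


def review_patch(pr: PullRequest) -> PullRequest:
    """Stand-in for a second model reviewing the draft.

    Real reviews are probabilistic; this stub is deterministic so the
    sketch is testable: anything mentioning "unclear" raises a question
    instead of being approved.
    """
    pr.verdict = "question" if "unclear" in pr.problem else "approve"
    return pr


def overnight_run(problems):
    """One night of the loop: draft, review, and bucket PRs for triage."""
    merged, needs_human = [], []
    for problem in problems:
        pr = review_patch(draft_patch(problem))
        (merged if pr.verdict == "approve" else needs_human).append(pr)
    return merged, needs_human


# Morning triage: the human decides which overnight work to keep.
merged, needs_human = overnight_run([
    "retry on timeout",
    "unclear spec for pagination",
])
```

The essential shape matches the essay's description: problems go in before sleep, some results come back mergeable, and some come back as questions only a human can resolve.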
For the first time in the history of knowledge work, a person could leave the office while the work kept moving without them. The old idea of "9-9-6," meaning 9 a.m. to 9 p.m., six days a week, disappears. Instead, we become a kind of 24-hour employee. But a 24-hour employee does not mean a human works all day and night. It means many agents work with enormous parallelism. Even in 2026, most teams are bottlenecked more by coordination than by typing speed, but the future always appears first at the frontier, and in the author's view it is already here. This essay is less a description of the entire industry than a view into what is already happening inside the most AI-native teams, and a prediction about how that will affect the rest of the industry.
3. Role Changes: The Two Sides of Promotion and Fragmentation
Inside the most AI-native teams, role change is more complicated than the common idea that "everyone moves up a level." Some developers really are moving to higher levels. The best engineers are becoming more effective product managers, the best product managers are becoming systems architects, and the best architects are thinking about market shifts. They are having the best years of their careers, and their work has far more impact than before.
But that is not the whole story. Behind the optimistic version, there is downward pressure, and it is fragmenting roles. Many engineers are not becoming architects. They are becoming spec writers, reviewers, and agent managers. They spend time translating intent into prompts that machines can understand, then evaluating outputs against standards they may not fully understand themselves. Some of this work is truly important; some of it, the author argues, is little different from data entry in 2026.
The author says we need to be honest about what this change means for people.
"These fragmented roles will be lower-paid, less valued, and often career dead ends. They will become a layer of output-cleanup work that the system needs but does not reward."
The wage gap between the top 30 percent who can run agent systems effectively and the middle layer that manages their outputs will become much larger than the old gap between engineers and salespeople. The author says this gap is already widening in companies he watches closely, and he does not believe it will close by itself.
In AI infrastructure, important defensive barriers still remain in areas such as kernel performance, compiler design, and hardware abstraction. That is because highly deterministic properties are required at the lowest levels of systems engineering. But at the level of building software on top of those barriers, the center of gravity is moving strongly toward human inputs that machines cannot yet replicate. This shift is real, and it is accelerating.
4. Jevons Paradox and the Future of Code
In 1865, the economist William Stanley Jevons observed that more efficient steam engines did not reduce coal consumption; they increased it. As efficiency improved, more things became worth powering with engines. We are now experiencing the software version of the same phenomenon, and it is one of the most interesting moments in software history. As the unit cost of writing code approaches zero, we are not writing less code. We are writing far more code and deploying far more of it. The best teams are actively adapting to this change.
"Companies that believe scaling laws are infinite are building accordingly, and they will be winners."
Many of the author's friends at leading AI-native companies are already adapting rapidly. Agents open pull requests, review one another's work, and close them without a human touching the keyboard. When problems appear, real-time monitoring loops fix them quickly. Self-healing test suites rewrite themselves as the underlying code changes. Autonomous experiment loops formulate, measure, and discard hundreds of hypotheses in the time a team used to test three. Documentation updates faster as AI is merged into the process, and that AI technology itself keeps improving. We are moving from an era in which feature implementation was limited by engineers' typing speed to one limited by human creativity, management of agent systems, and the rate at which the product can absorb output.
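The essay names autonomous experiment loops without showing one. A minimal sketch of the formulate-measure-discard cycle might look like the following, where both the proposer and the measurement are hypothetical stand-ins (a real loop would call agents and run real experiments, and its measurements would be noisy rather than seeded):

```python
import random


def propose(ideas, variants=4):
    """Stand-in for an agent proposing hypothesis variants to test."""
    return [f"{idea} / variant {i}" for idea in ideas for i in range(variants)]


def measure(hypothesis, rng):
    """Stand-in for running one experiment; returns a noisy effect size."""
    return rng.gauss(0.0, 1.0)


def experiment_loop(seed_ideas, rounds=3, keep_top=3, seed=0):
    """Formulate, measure, discard: each round keeps only the
    best-scoring hypotheses and proposes variants of the survivors."""
    rng = random.Random(seed)
    ideas = list(seed_ideas)
    for _ in range(rounds):
        scored = [(h, measure(h, rng)) for h in propose(ideas)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        ideas = [h for h, _ in scored[:keep_top]]
    return ideas
```

The point the essay makes survives the simplification: the loop evaluates and discards far more hypotheses per unit time than a human team ever could, and only the survivors reach a human.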
The author believes this is a truly good time to build. The increase in output is not subtle. Teams reorganized around agents are shipping three times, five times, even ten times more than they did a year ago, and the increase is accelerating. Many founders and operators he speaks with are not complaining about noise; they are thinking about how to feed more work into tomorrow's agent fleet. Even a small increase in the output of well-directed agents compounds into an advantage over competitors that still depend on typing.
But Jevons's second lesson applies here too. It is the factor that separates teams that benefit from this change from teams that do not. When supply explodes, choice becomes everything. Abundant coal made engines more valuable, but the discipline of choosing what to burn, what to power, and what to build with the output became much more important. Cheap energy without judgment is just waste. The same applies to code.
For teams that respond well to this change, choice is not an overwhelming problem; it is a new point of leverage. Operators who can direct agent fleets toward the right problems, select what is actually valuable from the outputs, and integrate those outputs into something coherent are now doing some of the most leveraged work in software. The value of work is no longer determined by how much effort it took to produce. The effort has collapsed. Value now depends on who directs the agent fleet well, who chooses what to keep from the returned work, and who integrates it into something that compounds faster. Production is no longer the hard part. The hard parts are direction, selection, and coherence, and the best teams are trying to build those abilities as quickly as possible.
5. From Deterministic Engineering to Probabilistic Engineering
We are rapidly moving from deterministic engineering to probabilistic engineering. But our tools, education, and organizational intuitions are still built for the old paradigm. Deterministic engineering was the contract that governed most of the history of our profession. You wrote code, tested it, reviewed it, and within known boundaries you could know what it did. Errors were deterministic as well. The same input produced the same output, and bugs were reproducible enough to be found.
Probabilistic engineering is different. Frontier teams are already seeing this. Significant portions of their codebases are generated by probabilistic systems, reviewed under time pressure inside a context too large to understand fully, and integrated into entire systems that no single human designed end to end. The codebase still runs and deploys, but the confidence interval around "it works as intended" has widened. Most teams, however, have not updated their practices to reflect that. This is where the core asymmetry appears: generation has become cheap, but verification has not.
An agent can generate a plausible 500-line pull request in less than a minute. But finding a subtle bug in that pull request, such as a concurrency issue, a nuanced misunderstanding of the specification, or code that literally does what was requested but not what was actually wanted, may require a senior engineer to read carefully for more than an hour. Review scales worse than generation, and crucially, it does not scale linearly with output volume. As more of the codebase is written by agents, the context a reviewer must hold in mind to evaluate a single piece grows larger. You are no longer reviewing one pull request against a codebase you wrote yourself. You are reviewing it against a codebase mostly written by other agents, parts of which you may have forgotten you once reviewed deeply, under ever-increasing time pressure.
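The asymmetry can be made concrete with rough arithmetic. The numbers below are illustrative assumptions extrapolated from the essay's figures (a plausible PR per minute, an hour per careful review), not measurements:

```python
def reviewer_backlog(gen_per_hour, review_hours_per_pr, reviewers, hours):
    """Toy model of the generation/verification asymmetry.

    Returns the number of unreviewed PRs left after `hours`, assuming
    constant illustrative rates. All figures are assumptions for the
    sake of the example, not data from the essay.
    """
    generated = gen_per_hour * hours
    review_capacity = (reviewers / review_hours_per_pr) * hours
    return generated - min(generated, review_capacity)


# A PR per minute versus an hour per careful review: even ten full-time
# reviewers fall 400 PRs behind over a single eight-hour day
# (480 generated, 80 reviewed).
backlog = reviewer_backlog(gen_per_hour=60, review_hours_per_pr=1.0,
                           reviewers=10, hours=8)
```

The linear model understates the problem: as the essay notes, real review cost grows with the size of the agent-written codebase the reviewer must hold in mind, so the gap widens faster than this sketch suggests.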
"At some scale the system produces more than humans can reliably evaluate, and correctness becomes probabilistic rather than certain."
This is not a future problem; it is a present problem. Beyond a certain throughput, bugs appear not because reviewers are careless, but because the volume of output exceeds what human attention can meaningfully inspect. And the models that perform a significant share of review are themselves nondeterministic, so they miss many things. A codebase becomes not something you know works, but something you believe will work, and you can no longer state that probability precisely.
Concretely, this produces situations such as a race condition that passes the test suite nine times out of ten, a feature that works perfectly in staging but fails under an unexpected prompt distribution, or a migration that quietly corrupts one row in ten thousand and is discovered only three weeks later. A recent joint study by Proximal and Modular on testing frontier coding agents documented these failure patterns (https://www.modular.com/blog/how-frontier-coding-agents-built-a-video-diffusion-pipeline-on-max). The author says he has seen the same phenomenon in code written by his own multi-agent system. These failure modes usually look less like dramatic collapse and more like slow, quiet degradation. Generation increases, review quality falls, hidden defects accumulate, and trust in the system quietly erodes until a customer, an auditor, or a production incident brings the problem to the surface. By then, the technical debt is already deep.
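The "passes nine times out of ten" failure is worth seeing in miniature. The example below is a generic illustration (not code from the study or from Compound Loop) of the classic lost-update race: an unsynchronized read-modify-write that a test suite will usually, but not always, catch. To keep it reproducible, the bad interleaving is replayed by hand rather than left to thread timing:

```python
class Counter:
    """A counter with an unsynchronized read-modify-write: the kind of
    race that can pass a test suite nine runs out of ten."""

    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value       # read
        # ...another thread can be scheduled right here...
        self.value = current + 1   # write, clobbering concurrent updates


def lost_update_demo():
    """Deterministically replay the unlucky interleaving of two
    increments, instead of relying on real thread timing."""
    c = Counter()
    a = c.value        # thread A reads 0
    b = c.value        # thread B reads 0
    c.value = a + 1    # thread A writes 1
    c.value = b + 1    # thread B also writes 1: A's update is lost
    return c.value     # 1, where two increments should have given 2
```

Under real scheduling the two reads rarely interleave like this, which is exactly why the bug survives most test runs; the deterministic fix is to hold a lock (for example `threading.Lock`) across the read and the write.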
The uncomfortable truth is that we do not yet have the right tools for this problem. Culture can help: smaller merges, stricter gates, ruthless skepticism toward polished outputs, visibility, and rollback discipline all matter. But culture does not scale beyond a certain team size, and our current systems for evaluating probabilistic code are primitive compared with what we need. The author hopes someone builds the right tools for this problem, and says whoever does will define the operating system of serious software development for the next decade. The new CI/CD is not yet a tool. For now, it is a culture of ruthless skepticism and an honest admission that we are building its replacement in real time.
6. Different Speeds Across Industries
The transition from deterministic engineering to probabilistic engineering will not happen uniformly. Technology diffusion takes time, legal and regulatory frameworks always lag behind technical progress, and this shift will be stratified by industry and risk profile. Anyone deciding how to build must understand that stratification.
The deterministic layer includes highly regulated, high-risk areas such as avionics, medical devices, financial trading infrastructure, nuclear control systems, and the core of payment networks. These areas will remain deeply deterministic for a long time, and they should. In a fly-by-wire system, the cost of a quiet correctness error is not a customer complaint; it is life. These areas will adopt agent assistance carefully, behind formal verification, extensive simulation, and deliberately slow human approval. This is not a failure of imagination. It is the correct reading of what the risk requires.
The probabilistic layer includes consumer software, internal tools, marketing systems, most SaaS, most content infrastructure, and most experimental or early-stage product work. Probabilistic engineering is already active here and will accelerate quickly. The cost of a bug is rollback, apology, and a hotfix. In exchange, teams in this layer get an iteration speed that the deterministic world cannot structurally match. A probabilistic team willing to ship, measure, and fix can learn an order of magnitude more per quarter than deterministic competitors.
The convergence zone is what the author calls the interesting middle of the future, and it is where competition over the next decade will happen. As models become smarter, the systems around them improve, and iteration cycles approach real time, the boundary of what is safe to handle probabilistically will keep moving. Areas that look deterministic today, such as parts of insurance, parts of healthcare, and parts of enterprise infrastructure, will find probabilistic methods seeping in from below. This will happen slowly, then suddenly. Meanwhile, the frontier of probabilistic engineering will begin rebuilding deterministic guardrails: formal checks, verified core paths, and hybrid systems where probabilistic generation is bounded by deterministic verification.
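The hybrid pattern the essay describes, probabilistic generation bounded by deterministic verification, has a simple skeleton. This is a generic sketch with hypothetical names, not a system from the essay: `generate` stands in for a model proposing candidates, and `verify` is a deterministic gate (tests, type checks, a formal property) that decides what is allowed through.

```python
def generate_with_guardrail(generate, verify, max_attempts=5):
    """Accept a probabilistically generated candidate only if it passes
    a deterministic check; otherwise retry, then fail loudly."""
    for attempt in range(max_attempts):
        candidate = generate(attempt)
        ok, _reason = verify(candidate)
        if ok:
            return candidate
    raise RuntimeError("no candidate passed deterministic verification")


# Toy usage: "generation" proposes sorting functions (the first attempt
# is deliberately wrong), and verification checks a deterministic
# property on fixed inputs.
def gen(attempt):
    return (lambda xs: xs) if attempt == 0 else (lambda xs: sorted(xs))


def verify(fn):
    cases = [[3, 1, 2], [5, 4]]
    return all(fn(xs) == sorted(xs) for xs in cases), "sorted-output property"


sorter = generate_with_guardrail(gen, verify)
```

The design choice worth noting is that the guardrail, not the generator, defines correctness: the probabilistic component can be arbitrarily unreliable, and the system's guarantee degrades only to "may fail to produce an answer," never to "produces a wrong one the gate could have caught."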
The winners of the next decade will know which layer they are in, resist the temptation to pretend they are in another layer, and decide very precisely where the boundary between the two should sit inside their own stack.
7. Agent Fleets: A New Way of Working
The author says he has thought a lot about the right metaphor for this shift, and he does not think "factory transition" is the right one. Factory workers were the system being automated. We are not. In his view, the best metaphor is an agent fleet. But he also says the term should be used carefully. "Fleet" implies order, hierarchy, and reliability, while reality does not yet deserve those associations. What most operators actually run is closer to a fragile mass of contractors than a well-trained navy. Agents have uneven capabilities, behave probabilistically, are sometimes confidently wrong, and are often expensive to run at scale. Orchestration layers fail, context windows explode, and inference costs show up as bills you would rather not show the board.
Even with those caveats stated honestly, the author still thinks the agent-fleet concept is useful. A fleet has composition: different agents for different tasks. It has coordination: handoffs, dependencies, and escalation paths. It also has a command structure: someone decides the mission, sets the rules of engagement, and reviews the returned results. And, crucially, a fleet has a watch rotation. It does not stop when the commander sleeps. It continues operating within the given orders and reports what it found in the morning.
A good fleet is not defined by how much it produces. It is defined by how well its output is maintained. From this perspective, our workday takes on a new shape. In the morning, we triage and merge. In the middle of the day, we do high-impact human work: customer conversations, strategy, product decisions, and writing specifications for overnight work. In the afternoon, when the first agents begin reporting back, we review and reassign. Then, at the end of the day, we do something earlier generations of knowledge workers never did: we hand work off. We pass specifications for tasks we want tried overnight to the agent fleet, dispatch the agents, and accept that some returned results will be wrong and some will be excellent, and that only we can tell the difference. Then the workday is done. The agents do not sleep. That is the point. If review discipline holds, we wake up ahead of where we stopped the previous day.
8. Build for Models That Do Not Exist Yet
One point the author has repeated for the past few years, and one he believes many leaders at large companies still miss, is this: the model we use today is the least intelligent model we will ever use.
He says this carefully, because capability growth may not always be smooth. Cost, latency, reliability, and scaling limits can all matter. But given what he sees in the infrastructure layer, the direction is clear. Within the next six to twelve months, frontier capabilities will far exceed today's level, and the gap between the best model available now and the best model available then may be larger than the gap between today and a year ago. That earlier gap was already substantial. Scaling laws continue to matter.
This has a strategic implication that most leadership teams have not fully understood. You are not building organizational capacity for the models you have today. You are building capacity for models you do not yet have. The specification-writing habits you are learning, the review culture you are building, the observability you are connecting, the agent fleets you are learning to command, and the training rituals you are experimenting with to preserve junior skills are not scaffolding for 2026 capabilities. They are scaffolding for 2027 and 2028. Companies that build these scaffolds now, before the next capability jump arrives, will be able to use that jump as leverage. Companies that wait for the tools to mature before reorganizing will spend the first year of the next capability era learning what the leaders already know, while the leaders compound.
This is what separates organizations that stay relevant from those that do not. It requires building systems for the models you will have, not the models you have now, and investing aggressively in specification, review, and operational discipline beyond what current models seem to require. The reason is simple: today's model is the weakest model you will use. Teams that understand this early will move ahead. Teams that do not will discover eighteen months later that competitors who did not look obviously better a year earlier have quietly passed them. Irrelevance in this era does not arrive suddenly. It arrives as a gradual inability to keep pace with teams that, a year earlier, did not appear noticeably better than you.
9. The Muscles We Will Lose
As the author mentioned in a previous essay, AI will either stratify society clearly or democratize it dramatically. Humans are beautifully and relentlessly efficient at optimizing for the path of least resistance. Whenever possible, we choose the option that minimizes the effort required, whether that effort is physical, cognitive, or emotional. In the context of this essay, that leads to a simple idea: if you never build directly, you will lose the ability to evaluate what is being built.
This is not hypothetical; it is already happening among junior engineers who begin their work with AI. They ship quickly, produce clean code, and can explain what the code does in general terms. But when a model fails in an unexpected way, they often cannot find the bug, because they have never personally developed an internal model of the system through countless 2 a.m. struggles with stack traces.
"Taste is not learned by clicking approve on a polished draft. Judgment is not built by accepting a plausible machine answer in five seconds; it is built by spending an afternoon wrestling with a hard problem. Craft is not learned by reviewing another agent's work."
Those skills are formed through exactly the friction that agents so conveniently remove.
This creates a training crisis that most organizations have not yet addressed. The apprenticeship model of software engineering, where juniors build small things, seniors review them, and juniors absorb taste through senior feedback, breaks when juniors ship through agents and seniors review agent output rather than human output. Where will the next generation of craft come from? How do we train taste without repetitive practice? What replaces mentoring on work the mentee did not write in the first place? This is an uncomfortable extension of the argument. In most traditional organizations the author speaks with, the current generation of senior engineers may be the last group fully trained in the old way.
Everyone who follows them is learning in an environment where machines that did not exist a few years ago do much of the hard work. That does not necessarily mean they will be worse, but it does mean they will be different. When forcing old-style hard training is no longer commercially rational, the burden falls on all of us to figure out what the new hard mode should be. Teams that treat agents purely as accelerators without redesigning how they develop people may discover, five or ten years from now, a generation of operators who can command a fleet but do not understand the blueprint of the ship. For readers who want to preserve their own skills, the balanced response is a contrarian one: sometimes do the work without the fleet. Not always, and not most of the time, but intentionally and regularly, on important work, in the hard way. If you keep muscles that most of your peers do not, it may make a tremendous difference ten years from now.
10. The Uncomfortable Truth: The Future May Be Messy
This essay intentionally does not end with an optimistic wrap-up. As with every major change, refusing to look at it does not keep it from arriving. Work has already changed forever, and it is evolving and accelerating with AI. This will let us reclaim the daytime for work that truly requires humans, while machines reclaim the night for the work that was always a grind.
The next few years will be messy. Predictable scenarios include a class of employees exhausted by the review burden they volunteered for, a layer of fragmented roles that systems need but do not reward, a generation of juniors who never develop the skills that today's seniors use to judge output, teams that confuse output volume with work quality until an incident reveals the gap, and a widening divide between organizations that build operating capacity for the next model and organizations that do not. All of these are possible, and some are already happening.
At minimum, the core lesson is this: build the organization for the model that has not yet arrived, so it does not surprise you when it gets here. Sometimes build hard things yourself so you do not forget how. Send the agent fleet out at night and sleep comfortably knowing work is underway, but remain awake to the possibility that some of what returns will be wrong in ways you are no longer trained to detect.
The 24-hour employee is not a promise. It is a reorganization, and an investment in the future of probabilistic engineering. Whether that investment succeeds depends on whether the humans in the system are sharp, honest, and well trained enough to be worth keeping in the system in the first place. It also depends on whether the organization around those humans was built not for today's models, but for models that have not yet been released. The investment can win, but it has not won yet.
Conclusion
The future of software development is becoming more probabilistic through collaboration with unpredictable AI agents. This change is fundamentally shaking up work practices, roles, and organizational structures, and AI-native companies are already at the front of it. Given the speed of future AI model development, it is crucial not to remain limited to today's tools and habits, but to build systems and capabilities for future models in advance. Otherwise, organizations risk falling behind competitors or losing core human abilities such as taste and judgment. Ultimately, the human role will concentrate on directing, selecting, and verifying AI-generated outputs, and new cultures and training methods for that role are urgently needed.
