Asher Cohen

The Orchestration Tax

Why scaling parallel work with AI and teams increases coordination costs and turns attention into the real bottleneck

The concept of the orchestration tax resonates deeply with me because it puts words to something I've increasingly experienced in modern software development.

AI tools and agents have made it remarkably easy to generate work, start parallel streams of execution, and keep dozens of tasks moving simultaneously. On the surface, it feels like productivity has been multiplied. Yet while work can be delegated, judgment cannot.

The Bottleneck Moved, It Didn't Disappear

No matter how many agents are running, there is still only one person responsible for understanding the system, validating decisions, resolving conflicts, and maintaining a coherent mental model of the product.

The bottleneck has not disappeared; it has simply moved.

Instead of spending most of our time writing code, we spend more of it reviewing, evaluating, coordinating, and integrating. The volume of output increases dramatically, but the capacity to absorb and verify that output remains largely unchanged.

What makes this especially relevant to the work I do is that software engineering has always been as much about decision-making as implementation. Architecture, trade-offs, debugging complex systems, and understanding how changes interact across a codebase are fundamentally cognitive activities. They do not parallelize well.

In fact, introducing more concurrent work often creates additional overhead because every task competes for the same scarce resource: attention.

This Isn't New: Teams Have Lived It for Years

This observation extends beyond AI and mirrors a problem that many software teams have struggled with for years during sprint planning and scope definition.

Teams often assume that breaking work into more tickets and assigning those tickets to more people will automatically accelerate delivery. In practice, the opposite frequently happens. As parallel work increases, so does the need for alignment, coordination, communication, integration, and review.

Individual tasks may progress independently, but the responsibility for ensuring that everything still forms a coherent product does not disappear.

Anyone who has worked on a large initiative has likely seen this firsthand. A sprint can appear healthy because every engineer is busy and every ticket is moving, yet the project itself stalls because critical decisions remain unresolved or because integration becomes the dominant source of effort.

The hidden work of synchronization grows alongside the visible work of implementation. More activity creates more interfaces between people, more dependencies, more assumptions, and more opportunities for misunderstandings. The result is often a team that feels highly productive while actually accumulating friction.
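A rough way to see why synchronization can outgrow implementation: assume, as a deliberate simplification, that any two parallel workstreams might need to be aligned with each other. Real dependency graphs are usually sparser, and the function name below is purely illustrative, but the sketch shows how quickly the number of potential interfaces grows compared with the number of streams.

```python
def potential_interfaces(streams: int) -> int:
    """Count distinct pairs among `streams` parallel workstreams.

    Assumption for illustration only: every pair is a possible
    point of coordination, giving streams * (streams - 1) / 2.
    """
    if streams < 0:
        raise ValueError("streams must be non-negative")
    return streams * (streams - 1) // 2


# Doubling the streams roughly quadruples the possible interfaces.
for n in (2, 4, 8, 16):
    print(f"{n:>2} streams -> {potential_interfaces(n):>3} potential interfaces")
```

Under this assumption, going from 4 to 8 streams doubles the visible work but raises the possible interfaces from 6 to 28.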

AI Agents Recreate the Same Dynamic

AI agents introduce a remarkably similar dynamic. Instead of coordinating people, we coordinate machine-generated work. The mechanics are different, but the underlying constraint remains the same.

Whether managing developers or agents, someone must maintain the architectural vision, resolve ambiguity, review outcomes, and ensure that the resulting system remains coherent. That responsibility is difficult to distribute and impossible to automate completely.

This helps explain a paradox many engineers are beginning to experience. We can feel more productive than ever while simultaneously feeling more mentally exhausted.

The exhaustion does not come from producing work; it comes from constantly switching contexts, reloading information into working memory, and trying to maintain confidence in systems that are changing faster than we can fully comprehend.

The danger is that eventually the review process becomes superficial. We start accepting outputs because we are tired of evaluating them, not because we truly understand them. That is where both technical debt and cognitive debt begin to accumulate.

Attention Is a System Constraint

The most valuable insight is that attention should be treated like any other constrained resource in a system.

Just as we would design around database bottlenecks, network limitations, or CPU constraints, we need to design around the limits of human judgment. More agents are not automatically better. The right number of agents is determined by how much work can be reviewed thoughtfully and responsibly.

Beyond that point, additional parallelism simply creates queues and increases the orchestration burden.
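The queueing effect can be sketched with a toy model. Everything here is an assumption made for illustration: agents produce a fixed number of reviewable items per day, one reviewer can thoughtfully review a fixed number per day, and nothing is accepted unreviewed. The function and parameter names are hypothetical.

```python
def review_backlog(agents: int, items_per_agent_per_day: float,
                   reviews_per_day: float, days: int) -> list[float]:
    """Return the unreviewed backlog at the end of each day.

    Toy model: output arrives at a constant rate and a single
    reviewer clears at most `reviews_per_day` items.
    """
    backlog = 0.0
    history = []
    for _ in range(days):
        backlog += agents * items_per_agent_per_day  # new output arrives
        backlog -= min(backlog, reviews_per_day)     # reviewer clears what they can
        history.append(backlog)
    return history


# Review capacity: 8 items/day. Each agent produces 2 items/day.
print(review_backlog(agents=3, items_per_agent_per_day=2, reviews_per_day=8, days=5))
# [0.0, 0.0, 0.0, 0.0, 0.0]
print(review_backlog(agents=6, items_per_agent_per_day=2, reviews_per_day=8, days=5))
# [4.0, 8.0, 12.0, 16.0, 20.0]
```

In this sketch, three agents stay within review capacity and the backlog stays at zero. Six agents do not get more work reviewed per day; they only add four unreviewed items to the queue every day.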

This same principle should influence how teams define scope and plan work. Effective planning is not about maximizing the number of concurrent tasks; it is about maximizing the amount of valuable work that can move through the system without overwhelming its coordination and decision-making capacity.

Sometimes reducing parallelization improves throughput because it lowers integration costs and preserves clarity. The goal is not to keep every contributor busy at all times. The goal is to deliver outcomes efficiently while maintaining quality and understanding.
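A small worked example shows how fewer streams can deliver more. It assumes, purely for illustration, that each stream contributes a fixed amount of work per sprint and that every pair of concurrent streams costs a fixed amount of integration effort. The numbers and names are hypothetical, not measurements.

```python
def net_throughput(streams: int, work_per_stream: float, cost_per_pair: float) -> float:
    """Gross output minus pairwise integration overhead (toy model)."""
    gross = streams * work_per_stream
    overhead = cost_per_pair * streams * (streams - 1) / 2
    return gross - overhead


# Assumed values: 10 units of work per stream, 2 units of cost per pair.
for n in range(1, 9):
    print(f"{n} streams -> net {net_throughput(n, 10, 2):.0f}")

# Stream count with the highest net output in this toy model.
best = max(range(1, 9), key=lambda n: net_throughput(n, 10, 2))
print("best:", best)
```

With these assumed values, net output peaks at five or six streams (30 units) and falls to 24 at eight, so cutting from eight streams to five raises throughput even though fewer people are busy.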

Organizational Implications

Treating attention as a constrained resource has implications for organizations as well as for individual contributors.

Companies should be careful not to mistake visible activity for actual throughput. A dashboard full of running agents can create the illusion of massive productivity while hiding the growing cost of coordination, review, and context management. Similarly, a sprint board full of active tickets can create the illusion of progress while concealing mounting integration costs and decision bottlenecks.

If organizations want sustainable gains from AI, they need to recognize orchestration as real work rather than invisible overhead. Time spent reviewing, validating, understanding, and maintaining a mental model of a system must be protected rather than compressed.

Likewise, workers need protection from the orchestration tax itself. As AI increases the volume of output that can be generated, there will be growing pressure to supervise more agents, review more changes, and manage more concurrent streams of work.

Without clear boundaries, this can turn into a continuous cognitive load that quietly degrades both quality and well-being. Protecting focused time, limiting unnecessary parallelism, batching reviews, and reserving attention for decisions that genuinely require human judgment are not productivity hacks; they are safeguards against overwhelming the one resource that cannot be scaled on demand.

Conclusion

Ultimately, the real challenge is not learning how to run more agents. It is learning how to preserve the quality of human judgment in a world where machine-generated output is abundant.

The engineers and organizations that succeed will be those that recognize attention as a critical production resource and design their systems, workflows, and expectations accordingly.

#ai #softwareengineering #productivity #architecture #leadership