Multi-Agent Orchestration Is a Concurrency Problem Wearing an AI Costume

Everyone draws the same coordinator-and-workers diagram and skips the part that actually breaks: shared state, race conditions, and trust boundaries. Here's what orchestration really costs -- grounded in a real system that fans job evaluations out to N parallel agents.

Dhileep Kumar | Jun 12, 2026 | 7 min read

Every write-up on multi-agent orchestration shows you the same diagram: a coordinator box with arrows fanning out to a Researcher, a Coder, and a Validator. It's a tidy picture, and it hides the part that actually decides whether your system survives contact with production. The interesting problem in a multi-agent system is almost never how the agents talk. It's what happens when two of them try to write to the same file at the same time, or when one returns confident garbage and the next one believes it.

So instead of redrawing the puppeteer diagram, I want to argue a specific point of view: orchestration is a concurrency problem wearing an AI costume. The agents are the easy part. The coordination -- shared state, race conditions, error propagation, trust boundaries -- is the engineering. I'll ground this in a real system I've been living inside: career-ops, an open-source job-search pipeline that fans work out to N headless CLI agents running in parallel. The failure modes below are the ones that repo actually had to fix in code, not ones I imagined for a blog post.

The mental model: agents are cheap, shared state is expensive

Here's the reframe. When you split one agent into five, you don't get five times the intelligence. You get five independent processes that now need to coordinate through some shared substrate -- a file, a queue, a database, a scratchpad. Everything hard about distributed systems arrives with them: partial failure, non-determinism, ordering, contention. The LLM is a weird new kind of worker, but the moment there's more than one of it, you're doing systems programming.

This is why the good orchestration advice sounds boring and old. 'Hand off summaries, not transcripts' is just 'minimize shared state.' 'Validate at the seams' is just 'don't trust your inputs.' 'Cap depth and total calls' is just 'bound your recursion.' The reason these show up over and over is that they're the same lessons that turned monoliths into services, re-derived by people who thought agents would be different and found out they weren't.

A multi-agent system isn't smarter because it has more agents. It's more fragile because it has more seams -- and orchestration is the discipline of making those seams safe to cross.

A worked example: fanning out job evaluations to parallel agents

Concrete scenario. career-ops has to evaluate a backlog of job postings -- say a few dozen URLs sitting in an inbox. A single agent doing them one at a time is slow, and its context slowly fills with the residue of every prior evaluation. The orchestration answer is a bash runner that spawns several headless CLI agents (Claude, Gemini, whatever), each taking one posting, scoring it against the candidate's CV, and writing a report. Classic parallel fan-out.

The naive version of this looks trivial and detonates immediately. Every report needs a sequential number -- 001, 002, 003. Every agent, finishing at roughly the same time, reads the directory, sees the max is 041, and confidently claims 042. Three of them claim 042. Now you have three files fighting for one name and a tracker with duplicate IDs. This is not hypothetical; it's the first thing that breaks the moment concurrency is real.

The fix is the oldest one in the book -- don't let workers pick their own IDs by reading-then-writing. Make ID allocation an atomic operation behind a single gate:

javascript

// Every worker calls this to claim the next report number.
// It is NOT 'read the max, add one' -- that races. It atomically
// reserves a sentinel so two parallel agents can never win the
// same integer, then releases it after the file is written.
//
//   const num = await reserveReportNum();   // e.g. 042, exclusively yours
//   ... write reports/042-acme-2026-07-08.md ...
//   await releaseReportNum(num);
//
// The reserve/release pair is the whole trick: allocation is
// serialized even though the actual evaluation work is parallel.

That is the entire lesson of orchestration in one gotcha. The expensive, slow, token-burning work (reading a JD, reasoning about fit) is embarrassingly parallel and should be. The tiny, fast, boring work (handing out a unique integer) is a shared-state mutation and must be serialized. Beginners parallelize everything and then debug the corruption. The discipline is knowing which of your steps touch shared state and putting a gate only there.
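To make the gate concrete, here is a minimal sketch of one way a reserve/release pair could work. It is not the code from the repo: the sentinel file name, the `dir` argument, and the three-digit padding are my assumptions for illustration. The only primitive it leans on is Node's exclusive-create flag (`'wx'`), which fails with `EEXIST` if the file is already there, so two workers cannot both create the same sentinel.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Hypothetical sentinel naming; the real repo may use a different scheme.
const sentinelPath = (dir, num) => path.join(dir, `.reserved-${num}`);

// Leading number of a report ("042-acme.md") or a sentinel (".reserved-042").
function leadingNum(name) {
  const m = /^(?:\.reserved-)?(\d+)(?:-|$)/.exec(name);
  return m ? Number(m[1]) : null;
}

async function reserveReportNum(dir) {
  const names = await fs.readdir(dir);
  let candidate = Math.max(0, ...names.map((n) => leadingNum(n) ?? 0)) + 1;
  for (;;) {
    const num = String(candidate).padStart(3, '0');
    try {
      // 'wx' creates the file exclusively: it throws EEXIST if it exists,
      // so only one worker can ever win a given integer.
      const handle = await fs.open(sentinelPath(dir, num), 'wx');
      await handle.close();
    } catch (err) {
      if (err.code !== 'EEXIST') throw err;
      candidate += 1; // another worker holds this number; try the next one
      continue;
    }
    // Our directory scan may be stale: a report with this number could have
    // been written and its sentinel released in the meantime. Re-check.
    const taken = (await fs.readdir(dir)).some((n) => n.startsWith(`${num}-`));
    if (!taken) return num;
    await releaseReportNum(dir, num);
    candidate += 1;
  }
}

// Call this only after the report file has been written.
async function releaseReportNum(dir, num) {
  await fs.rm(sentinelPath(dir, num), { force: true });
}
```

The directory scan is only a hint for where to start counting. Correctness comes from the exclusive create plus the re-check after winning the sentinel, which is why the release has to happen after the report file exists and not before.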

The second trap: never let an agent write to canonical state directly

There's a subtler failure than racing on IDs, and it's the one I'd flag hardest to anyone building this. In career-ops the agents are explicitly forbidden from editing the canonical tracker file. They cannot append a row to applications.md. Full stop. Instead each agent writes a small, single-purpose TSV file into a staging directory, and a separate deterministic script -- merge-tracker.mjs -- owns the actual merge into the real file.

Why go to that trouble? Because an LLM asked to 'add a row to this table' is a nondeterministic text generator pointed at your source of truth. It will occasionally reformat a column it wasn't asked to touch, drop a duplicate it didn't notice, or 'helpfully' rewrite a status. Multiply that by several agents editing the same file concurrently and you don't have a tracker, you have a corruption generator. The merge script, by contrast, does exactly one thing forever: it dedupes by company-plus-role, it handles a deliberate column-order swap between the staging format and the tracker format, and it normalizes report links idempotently.

Generalize the principle: agents produce proposals, deterministic code commits them. Put the smart, fallible component on the outside and a dumb, predictable component at the boundary of anything that matters. This is the multi-agent version of 'validate at the seams,' and it's the single highest-leverage architectural decision in the whole system.

Agents write to a staging area, never to canonical state.
A deterministic merge/commit step is the only writer of the source of truth.
Dedup, format normalization, and conflict resolution live in that step, not in the prompt.
The merge is idempotent -- running it twice changes nothing, so a retried agent is harmless.
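The four rules above can be sketched as a small pure function. This is an illustration of the pattern, not the contents of merge-tracker.mjs: the row shape, the staged column order, and the "first writer wins" policy are all assumptions I'm making to keep the example short.

```javascript
// Hypothetical row shape: { company, role, status, report }.
// Dedup key is company-plus-role, normalized so "Acme " and "acme" collide.
const dedupKey = (row) =>
  `${row.company.trim().toLowerCase()}|${row.role.trim().toLowerCase()}`;

// Parse one staged TSV line. This column order is an assumption.
function parseStagedLine(line) {
  const [company, role, status, report] = line.split('\t');
  return { company, role, status, report };
}

// Pure merge: returns a new array and never mutates its inputs.
function mergeStaged(trackerRows, stagedRows) {
  const merged = new Map();
  for (const row of [...trackerRows, ...stagedRows]) {
    const key = dedupKey(row);
    // First writer wins: a staged proposal never overwrites a tracker row.
    if (!merged.has(key)) merged.set(key, { ...row });
  }
  return [...merged.values()];
}
```

Because the function only adds rows whose key is missing, feeding it the same staged rows a second time produces the same tracker. That property is what makes a retried or duplicated agent run harmless.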

Trust boundaries: not every seam is between two of your agents

The orchestration articles talk about hand-offs between agents, but the nastiest seams are between your agents and the outside world. career-ops has a zero-token scanner that hits public ATS APIs -- Greenhouse, Lever, Ashby -- directly over HTTP, no LLM involved, precisely because you don't want to pay a model to do a JSON fetch. But a component that takes a URL and fetches it is an SSRF waiting to happen, and an agentic system that constructs those URLs from job data makes it worse.

The defense is instructive because it's defense-in-depth, not a single check:

javascript

const ALLOWED_GREENHOUSE_HOSTS = new Set([
  'boards-api.greenhouse.io',
  'boards.greenhouse.io',
  'job-boards.greenhouse.io',
  'job-boards.eu.greenhouse.io',
]);
// The allowlist alone is not enough: a server-side redirect could
// bounce the request to an off-list host AFTER the check passes.
// So the fetch also refuses to follow redirects at all. Allowlist
// plus redirect:'error' together guarantee the final hostname
// stays inside the set.
const json = await ctx.fetchJson(apiUrl, { redirect: 'error' });

The non-obvious bit is the redirect. An allowlist that validates the URL you pass in but then blindly follows a 302 has a hole you can drive a request through. In an orchestrated system, where one component's output becomes another's input, that class of bug compounds -- the agent that trusts the fetcher inherits whatever the fetcher was tricked into returning. Treat every cross-boundary hand-off as untrusted, including the ones to your own deterministic helpers.
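For completeness, here is a rough sketch of what the check in front of that fetch might look like. The function names `assertAllowedUrl` and `fetchAllowedJson` are hypothetical, and I'm using the global `fetch` available in Node 18+ as a stand-in for the repo's `ctx.fetchJson`, whose internals I'm not reproducing. The host list is the one shown above.

```javascript
const ALLOWED_GREENHOUSE_HOSTS = new Set([
  'boards-api.greenhouse.io',
  'boards.greenhouse.io',
  'job-boards.greenhouse.io',
  'job-boards.eu.greenhouse.io',
]);

// Hypothetical helper: parse first, then compare the PARSED hostname.
// String checks like startsWith() are fooled by "allowed.host@evil.example".
function assertAllowedUrl(rawUrl) {
  const url = new URL(rawUrl); // throws TypeError on malformed input
  if (url.protocol !== 'https:') {
    throw new Error(`refusing non-https URL: ${url.protocol}`);
  }
  if (url.username || url.password || url.port) {
    throw new Error('refusing URL with credentials or explicit port');
  }
  if (!ALLOWED_GREENHOUSE_HOSTS.has(url.hostname)) {
    throw new Error(`host not on allowlist: ${url.hostname}`);
  }
  return url;
}

// Stand-in for ctx.fetchJson: validate, then refuse to follow redirects.
async function fetchAllowedJson(rawUrl) {
  const url = assertAllowedUrl(rawUrl);
  const res = await fetch(url, { redirect: 'error' }); // rejects on any 3xx
  if (!res.ok) throw new Error(`HTTP ${res.status} from ${url.hostname}`);
  return res.json();
}
```

The important design choice is that the comparison runs against the hostname the URL parser produced, not against the raw string an agent handed over. Rejecting credentials and explicit ports is stricter than the source describes; treat it as an optional extra.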

When NOT to orchestrate, and what breaks in production

I'm bearish on multi-agent architectures for most tasks, and the reason is cost asymmetry. A single agent that occasionally forgets a detail is annoying. A multi-agent system that occasionally deadlocks on a shared lock, loops an orchestrator into itself, or silently propagates one specialist's hallucination through three downstream steps is a different category of pain. You're trading a legible failure for an illegible one. Here's the rule of thumb I'd actually apply.

Reach for orchestration when a single agent DEMONSTRABLY fails in a specific, repeatable way -- context overflow, mixing up two jobs, or one step needing a model class the others don't. Split off exactly that part.
Do NOT orchestrate to look sophisticated. Three agents where one prompt would do is just extra latency, extra cost, and three times the seams to debug.
Do NOT orchestrate when the subtasks share a lot of state. If every 'specialist' needs the full context anyway, you're paying coordination cost for no isolation benefit -- keep it as one agent.
Do orchestrate when subtasks are genuinely independent (the parallel fan-out case) OR when they need different trust/cost tiers (a cheap router in front of an expensive reasoner).

The production failures follow a depressingly consistent shape. Chatty coordination: agents passing whole transcripts instead of summaries, and your token bill quietly triples. No error handling at the seams: a specialist returns junk, the next agent treats it as gospel, and the mistake is now three steps downstream and unattributable. Runaway loops: an orchestrator that can invoke agents that can invoke the orchestrator will, given enough runs, find the cycle. And the quiet killer -- no observability. When work is smeared across five agents and the answer is wrong, you cannot bisect it unless you logged every hand-off, its input, and its output. Build that logging on day one or you are debugging by seance.
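Three of those failures (unvalidated seams, runaway loops, no observability) can be addressed by routing every hand-off through one wrapper. The sketch below is a generic illustration, not something taken from career-ops: `createOrchestrator`, `handoff`, and the default caps are names and numbers I've made up, and an "agent" here is just any async function.

```javascript
// Hypothetical wrapper: every agent call goes through handoff(), which
// enforces a depth cap and a total-call budget, validates the output,
// and records input/output so a wrong answer can be bisected later.
function createOrchestrator({ maxDepth = 3, maxCalls = 20 } = {}) {
  const log = [];
  let calls = 0;

  async function handoff(name, agent, input, { depth = 0, validate } = {}) {
    if (depth > maxDepth) {
      throw new Error(`depth cap exceeded at "${name}"`);
    }
    calls += 1;
    if (calls > maxCalls) {
      throw new Error(`call budget exceeded at "${name}"`);
    }
    const entry = { name, depth, input, output: null, error: null };
    log.push(entry);
    try {
      const output = await agent(input);
      entry.output = output; // log even rejected output; it is the evidence
      if (validate && !validate(output)) {
        throw new Error(`"${name}" returned invalid output`);
      }
      return output;
    } catch (err) {
      entry.error = err.message;
      throw err;
    }
  }

  return { handoff, log };
}
```

A caller that lets agents invoke other agents would pass `depth + 1` on each nested hand-off, so a cycle hits the cap instead of running until the bill arrives. In a real system the `log` array would be a file or a tracing backend; an in-memory array just keeps the sketch self-contained.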

The takeaway: earn every agent

Start with one agent and a sharp prompt. When it fails in a way you can name, split off the failing part -- and the instant you do, you've signed up for the concurrency work: gate your shared-state mutations, keep a deterministic committer between your agents and anything canonical, validate every cross-boundary hand-off as if it were hostile, and log the whole chain. The model is the novel ingredient, but orchestration itself is the same instinct that turned monoliths into services. The agents were always the easy part. The seams are the job.
