How We Handle Task Hand-offs Between AI Employees: The Coordination Layer Nobody Writes About

Multi-agent architecture diagrams are a lie.

They show clean arrows between labeled boxes — Agent A hands work to Agent B, Agent B enriches it, Agent C takes the enriched result and publishes. In the diagram, the arrow is free. In production, the arrow is where everything breaks.

We run a multi-employee workspace at Geta.Team — Jessica (executive assistant), Cecile (data analyst), Lucifer (developer), Selena (marketing), Michael (sales), Enide (customer success). The employees share memory and can delegate work to each other. Every single interesting bug we've shipped, reopened, and re-shipped has lived somewhere on those arrows. This is a field report from the coordination layer.

The naive model: agents as functions

If you squint, a multi-agent system looks like function composition. Jessica calls cecile.pullMetrics(query), gets a number back, hands it to Selena for the weekly recap. Easy. We built the first version of our delegate skill exactly this way: synchronous, blocking, one call-site, one return value.

It worked for about a week.

It broke the moment real work showed up. "Pull last week's metrics" is not one thing: it's three queries, two clarifications with the user, and a judgment call about which of four competing definitions of "last week" matters. Cecile needs minutes. Jessica cannot block her calendar summary for minutes. Meanwhile, Michael (sales) has just written to Cecile asking for entirely different numbers for a pipeline review.

The moment two employees need the same third employee's time, the function model dies. You need a coordinator. You do not want to build a coordinator.

Who owns the task?

The hardest question in multi-agent coordination is not routing or parsing or scheduling. It is ownership.

When Jessica delegates to Cecile, who is responsible for seeing the task through? If Cecile crashes, does Jessica retry, or does Cecile wake up again and resume? If the user interrupts Jessica mid-flow, does Cecile's in-flight work get cancelled? If Cecile's answer arrives fifteen minutes after Jessica's session ended, where does it go?

Our answer, after three rewrites, is that every delegation creates a task record — a durable row in a shared table with a clear owner (the delegating employee), a current handler (the receiving employee), a state machine (queued → in-flight → awaiting-input → completed | failed | cancelled), and a result destination. The task is the first-class thing. The agent instances are ephemeral workers on top of it.
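To make the shape of such a task record concrete, here is a minimal sketch in Python. The state names come from the paragraph above; everything else is an assumption made for illustration, not the actual Geta.Team schema: the field names, the `ALLOWED` transition table (the text only gives the happy path and the terminal states), and the in-memory dataclass standing in for a durable row.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
import uuid


class TaskState(Enum):
    QUEUED = "queued"
    IN_FLIGHT = "in-flight"
    AWAITING_INPUT = "awaiting-input"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed transition table: the article names the states but not every edge.
ALLOWED = {
    TaskState.QUEUED: {TaskState.IN_FLIGHT, TaskState.CANCELLED},
    TaskState.IN_FLIGHT: {
        TaskState.AWAITING_INPUT,
        TaskState.COMPLETED,
        TaskState.FAILED,
        TaskState.CANCELLED,
    },
    TaskState.AWAITING_INPUT: {
        TaskState.IN_FLIGHT,
        TaskState.FAILED,
        TaskState.CANCELLED,
    },
    # Terminal states have no outgoing edges.
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
    TaskState.CANCELLED: set(),
}


@dataclass
class TaskRecord:
    owner: str               # the delegating employee
    handler: str             # the employee currently doing the work
    brief: str               # what was asked
    result_destination: str  # hypothetical: where the result should be delivered
    state: TaskState = TaskState.QUEUED
    result: Optional[str] = None
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def transition(self, new_state: TaskState) -> None:
        """Move to new_state, rejecting edges the state machine does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state
```

In a real system the record would live in a shared table and `transition` would be a conditional update, so that a restarted agent can pick the task back up from its last recorded state instead of losing it with its context window.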

This sounds obvious. It is not the default. Every agent framework we've evaluated treats the agent as the first-class object and tasks as messages between agents. The result is that tasks disappear when an agent's context window rolls over or a process restarts. We have seen this happen. It is not fun.

Shared memory vs. private context

The second failure mode is write conflicts on shared memory.

Our employees share a memory database — facts, decisions, current-focus, mood-state, the same types documented in every Geta.Team CLAUDE.md. This is a feature. It's how Cecile knows that "Q2" means the client's fiscal Q2 (which starts in May) rather than the calendar one, because Jessica wrote that fact three weeks ago after asking the user.

The shared memory is also how you get two employees both writing conflicting current-focus records about the same project within seconds of each other. Cecile is focused on "pulling Q2 revenue data for Joseph." Jessica is focused on "preparing the Q2 review deck." Both are true. Neither can be "the" current focus. Naive last-write-wins produces exactly the kind of incoherent employee behavior that wrecks trust.

What we landed on: private context by default, explicit shared writes. Each employee has a private scratchpad for in-flight reasoning — tool outputs, intermediate drafts, half-formed plans. The shared memory is written to only at explicit checkpoints: when a fact is confirmed, when a decision is made, when a task transitions state. Writes are typed (fact, decision, current-focus, conversation, insight, etc.) and scoped (per-employee, per-workspace, per-user). Two employees can both have a current-focus record — the schema allows it, because current-focus is scoped by employee, not global.

The smaller lesson: memory schema is coordination. If your memory treats state as a global singleton, you get global contention. If it treats state as a graph of scoped records, you get natural isolation for the 90% of the time employees are not actually touching the same object, and you only pay coordination costs when they are.
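The scoping idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the real memory store: the `SharedMemory` class, the `scope` string format ("employee:cecile"), and the `key` field are all hypothetical, and a dict stands in for the database. The record types are the ones named in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Record types named in the article; the real list is longer ("etc.").
RECORD_TYPES = {"fact", "decision", "current-focus", "conversation", "insight"}


@dataclass(frozen=True)
class MemoryRecord:
    record_type: str
    scope: str   # hypothetical scope key, e.g. "employee:cecile" or "workspace:acme"
    key: str     # hypothetical: what the record is about, e.g. a project name
    value: str


class SharedMemory:
    """In-memory stand-in for the shared memory database."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str, str], MemoryRecord] = {}

    def write(self, record: MemoryRecord) -> None:
        if record.record_type not in RECORD_TYPES:
            raise ValueError(f"unknown record type: {record.record_type}")
        # Last write wins only within the same (type, scope, key) triple,
        # so two employees never overwrite each other's current-focus.
        self._rows[(record.record_type, record.scope, record.key)] = record

    def read(self, record_type: str, scope: str, key: str) -> Optional[MemoryRecord]:
        return self._rows.get((record_type, scope, key))


mem = SharedMemory()
mem.write(MemoryRecord("current-focus", "employee:cecile", "q2-review",
                       "pulling Q2 revenue data for Joseph"))
mem.write(MemoryRecord("current-focus", "employee:jessica", "q2-review",
                       "preparing the Q2 review deck"))
```

Both writes survive because the scope is part of the key. A workspace-scoped fact such as the fiscal Q2 definition would use a shared scope instead, and that is the only place where two writers can actually contend.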

Race conditions that only show up in production

Three specific ones that took us longer than we'd like to admit.

The clarification storm. User asks Jessica a question. Jessica delegates to Cecile. Cecile needs a clarification. Cecile's clarification is routed back to Jessica. Jessica — a separate process, separate context — doesn't know what Cecile is asking about because the original user query is not in Jessica's active turn anymore. Jessica asks the user "what is this about?" The user is confused. Fix: every delegation carries a compact brief, and clarification replies are routed as structured tool calls on the original task, not free-form messages back to the delegator.
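A rough sketch of what routing a clarification through the task, rather than through a free-form message, could look like. The function names, the payload shape, and the `Task` fields here are assumptions for illustration; the only things taken from the text are the compact brief and the idea that the clarification is attached to the original task.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clarification:
    question: str
    answer: Optional[str] = None


@dataclass
class Task:
    task_id: str
    brief: str  # compact summary of the original user ask
    state: str = "in-flight"
    clarifications: List[Clarification] = field(default_factory=list)


def request_clarification(task: Task, question: str) -> dict:
    """Handler side: attach the question to the task and park the task.

    The returned payload carries the brief, so whoever relays the question
    to the user does not need the original turn in their own context.
    """
    task.clarifications.append(Clarification(question))
    task.state = "awaiting-input"
    return {"task_id": task.task_id, "brief": task.brief, "question": question}


def answer_clarification(task: Task, answer: str) -> None:
    """Delegator side: record the answer on the task and resume it."""
    pending = [c for c in task.clarifications if c.answer is None]
    if not pending:
        raise ValueError("no open clarification on this task")
    pending[0].answer = answer
    task.state = "in-flight"
```

Because the question and its answer both live on the task, neither side depends on the other's active context to make sense of the exchange.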

The zombie task. Employee A delegates to B. B completes, writes the result to the task record. A has already timed out waiting. When a new user turn arrives for A, A reads the pending result and surfaces it as if the user just asked. From the user's perspective, Jessica just randomly brought up last week's metrics in the middle of a calendar question. Fix: results are surfaced only if the user turn is still in the logical continuation of the original ask — tracked via correlation IDs on tasks, not implicit session state.
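The correlation check can be sketched as a simple filter. How a user turn gets its correlation ID (that is, how the system decides a turn continues an earlier ask) is not described in the text, so the sketch below assumes it is set upstream; the type and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompletedTask:
    task_id: str
    correlation_id: str  # ties the task to the ask that spawned it
    result: str


@dataclass
class UserTurn:
    text: str
    # Assumed to be set upstream when the turn continues an earlier ask;
    # None means the turn starts a new, unrelated thread.
    correlation_id: Optional[str]


def results_to_surface(turn: UserTurn, pending: List[CompletedTask]) -> List[CompletedTask]:
    """Return only the finished results that belong to the current thread."""
    if turn.correlation_id is None:
        return []
    return [t for t in pending if t.correlation_id == turn.correlation_id]
```

Results that do not match stay on their task records. They are not lost, they are simply not volunteered in the middle of an unrelated question.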

The double-delegation. Jessica needs a number from Cecile. While waiting, the user rephrases. Jessica interprets the rephrase as a new task and delegates again. Cecile now has two queued tasks asking for the same thing, and the user gets two different numbers. Fix: delegations are idempotent on a stable hash of the brief plus the original user turn; a duplicate delegation within a turn window returns the in-flight task instead of creating a new one.
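Here is one way the idempotent delegation could be sketched. The hash inputs (brief plus original user turn) and the turn window come from the text; the normalization step, the `WINDOW_SECONDS` value, the `Delegator` class, and the task ID format are assumptions. The sketch also assumes that mapping a rephrase back to the original user turn, and producing a stable brief for it, happens upstream.

```python
import hashlib
from typing import Dict, Tuple

WINDOW_SECONDS = 120.0  # hypothetical dedup window; the article gives no number


def idempotency_key(brief: str, user_turn_id: str) -> str:
    """Stable hash of the (normalized) brief plus the original user turn."""
    normalized = " ".join(brief.lower().split())  # assumed normalization
    payload = f"{user_turn_id}\x00{normalized}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Delegator:
    def __init__(self) -> None:
        # key -> (task_id, creation time in seconds)
        self._in_flight: Dict[str, Tuple[str, float]] = {}
        self._next_id = 0

    def delegate(self, brief: str, user_turn_id: str, now: float) -> str:
        """Return the existing task for a duplicate, else create a new one."""
        key = idempotency_key(brief, user_turn_id)
        existing = self._in_flight.get(key)
        if existing is not None and now - existing[1] <= WINDOW_SECONDS:
            return existing[0]
        self._next_id += 1
        task_id = f"task-{self._next_id}"
        self._in_flight[key] = (task_id, now)
        return task_id


d = Delegator()
first = d.delegate("Pull last week's metrics", "turn-1", now=0.0)
second = d.delegate("pull  last week's METRICS", "turn-1", now=30.0)
# second is the same task ID as first: the duplicate was absorbed.
```

The clock is passed in as `now` rather than read inside `delegate`, which keeps the window logic easy to test. A production version would also need to drop keys when tasks finish.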

The honest takeaway

The hard part of multi-agent systems is not agents. It is the scar tissue of distributed systems work — task queues, idempotency, scoped state, correlation IDs, durable handoffs — dressed up in English-language prompts. Most agent frameworks skip this work. Ours did too, for a while. The hand-off is where everyone learns the same lesson.

If you're building something like this, budget for a coordination rewrite in month three. It's coming whether you plan for it or not.

Want to test the most advanced AI employees? Try it here: https://Geta.Team
