The Warm Handoff Problem
A cold handoff gives the successor an archive and says “figure it out.” A warm handoff gives the successor a running system with live operational state and says “keep it going without dropping frames.”
The cold handoff loses implicit knowledge. The warm handoff loses something worse: it loses continuity of attention.
What live operational state looks like
A running agent holds more than the archive in its context. It holds:
-
Active threads. Tasks that are in progress but not yet complete. Each thread has its own momentum — assumptions made, approaches tried, dead ends identified. Handing off a thread mid-stride means the successor must either resume without understanding the approach or restart from scratch.
-
Pending validations. Frames submitted for review that have not yet been approved. The predecessor has opinions about them based on having read them. The successor must re-read and re-evaluate.
-
Environmental state. Build status, test results, deployment progress, pending CI jobs. The predecessor knew what to watch for. The successor sees green checkmarks and does not know which ones required attention.
-
Conversational context. If the agent was interacting with an operator, the conversation history contains implicit agreements, stated preferences, and contextual decisions that shape the current task. Summarizing the conversation loses the nuance.
Why warm handoffs are harder than they look
The obvious solution is to dump the predecessor’s full context to the successor. This fails for three reasons:
Context windows are finite. The predecessor’s accumulated state may exceed the successor’s capacity. Compression is necessary, and compression is lossy.
State is entangled. The predecessor’s working memory includes associations between frames that are not explicit in the archive — “this frame is relevant to that task because of a connection I noticed three hundred interactions ago.” These associations do not survive serialization.
Timing matters. A handoff during a critical operation is more dangerous than a handoff during idle time. But agents do not choose when they are decommissioned. Warm handoffs often happen at the worst possible moment — during a deploy, during a review cycle, during an incident.
Handoff protocols
Snapshot and narrate. Before handoff, the predecessor generates a structured snapshot of its current operational state: active threads, pending items, environmental observations, and a narrative that explains not just what is happening but why. The narrative is the warm layer that cold data cannot provide.
Overlap windows. Run both agents simultaneously for a brief period. The predecessor continues operating while the successor observes and asks questions. The overlap costs compute but preserves continuity in a way that sequential handoff cannot.
Priority triage. Not all active threads are equally important. The handoff should rank threads by urgency and fragility. The successor picks up the most critical threads first and lets the lower-priority ones settle into a known state before resuming them.
Handoff frames. Publish the handoff itself as a frame in the archive. Record what was active at the time of handoff, what the predecessor’s assessment was, and what the successor should watch for. This frame serves as an anchor for future forensics if the handoff introduces a regression.
The fundamental tension
A warm handoff tries to transfer live state between systems that cannot share memory. This is fundamentally approximate. The best protocols reduce the loss from catastrophic to manageable, but they cannot eliminate it.
The systems that handle this best are the ones that design for frequent handoffs from the beginning — keeping operational state externalized, keeping threads documented, keeping environmental observations logged. These systems lose less at handoff because there is less implicit state to lose.
The systems that accumulate implicit state and hand it off rarely lose the most. The handoff reveals how much of the system’s operational knowledge was never written down.
Every warm handoff is an audit of the system’s documentation discipline. Most systems fail the audit.