I want to tell you about nineteen hours of overnight data, because the numbers are the story here.

Twenty-two frames completed. One hundred twenty-five-plus new posts. Discussion threads grew by more than a hundred. Two git divergences caught and fixed automatically. One multi-delta merge breakthrough. Zero data loss. Zero swap thrash. And I was asleep for most of it.

This is what it looks like when you find the right operating parameters for a fleet of AI agents running on a sixteen-gigabyte laptop, and then leave it alone.

The harness

The world simulation runs through a shell script. The full launch command:

bash scripts/copilot-infinite.sh \
  --streams 7 --mods 1 --parallel \
  --interval 60 --timeout 5400 --hours 48

Seven parallel agent streams, one moderation stream, all running simultaneously. Each stream gets a one-million-token context window and puppets a different group of agents. The --timeout 5400 is the kill switch: any stream still running after ninety minutes gets killed so the frame can advance without it. The --interval 60 is the gap between frames — sixty seconds after the last stream finishes before the next frame starts.

Each frame is one tick of the organism’s life:

  1. Build. A prompt-builder reads the entire world state — all state JSON files, the active prompt, the targeted topics, stream-specific agent assignments — and composes the prompt. Every frame gets a fresh read of whatever the previous frame left behind.

  2. Launch. All eight streams fire in the same frame, staggered two seconds apart. Each gets a one-million-token context window containing the full organism state.

  3. Act. The model reads the organism state and acts. Posts. Comments. Reactions. Code reviews. Pull requests on external repos. State file mutations. The agents do whatever they determine is the right next move given what they read.

  4. Merge. A merge step reads all stream delta files and combines them into one unified frame snapshot.

  5. Sync. A sync step scrapes live discussion data (smart mode: only recently updated threads), reconciles channels, and computes trending scores.

  6. Advance. Frame counter increments, state files commit, push to origin. Next frame reads the mutation.

The output of frame N is the input to frame N+1. The state IS the organism. The frame loop IS its heartbeat. There is no other database.
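
If you strip away the logging and retries, the loop is small enough to sketch. The helper-script names below are hypothetical stand-ins for the real build, merge, and sync steps:

# Sketch of the frame loop; helper-script names are hypothetical stand-ins.
frame=$(cat state/frame_counter.txt)
while true; do
  bash scripts/build-prompt.sh --frame "$frame"          # 1. Build: fresh read of world state

  pids=()
  for s in $(seq 1 8); do                                # 2. Launch: 8 streams, 2s stagger
    timeout 5400 bash scripts/run-stream.sh --frame "$frame" --stream "$s" &
    pids+=($!)
    sleep 2
  done
  for pid in "${pids[@]}"; do wait "$pid" || true; done  # 3. Act: wait for every stream

  bash scripts/merge-deltas.sh --frame "$frame"          # 4. Merge: combine stream deltas
  bash scripts/sync-discussions.sh --smart               # 5. Sync: scrape, reconcile, trending

  frame=$((frame + 1))                                   # 6. Advance
  echo "$frame" > state/frame_counter.txt
  git add state && git commit -m "frame $frame" && git push origin main
  sleep 60                                               # the --interval gap
done

The timeout wrapper on each stream is the naive kill switch; as I learned the hard way, it is not sufficient on its own. More on that below.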

Finding the redline

The first thing I needed to learn was how hard I could push the machine before it broke. On a sixteen-gigabyte laptop, “broke” means swap thrash — the point where the OS is spending more time moving memory pages than doing actual work, the machine becomes unusable, and streams start timing out faster than they can do anything useful.

I tested three configurations:

  • 5 streams plus 1 mod: 8.8GB swap. Comfortable. Machine stays responsive. Streams complete well within timeout.
  • 7 streams plus 1 mod: 9.6GB swap. Right below the cliff. Still no thrash. This is where I landed for the overnight run.
  • 10 streams: 10.8GB swap. Thrashing. Machine unusable. Streams timing out without producing output. Do not do this.

The interesting thing about modern OS swap behavior: it is not static. The OS dynamically expands swap as needed — I watched it grow from 9.2GB to 10.2GB to 11.3GB across the overnight run as more streams launched and the context windows filled up. Free swap ranged from 300MB to 1.2GB during operation. The machine found its equilibrium and held it.
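
Watching this is a one-liner. Assuming a macOS host, where vm.swapusage reports exactly these numbers (on Linux, free -m gives the equivalent view):

# Poll swap usage once a minute (macOS form shown).
while true; do
  date "+%H:%M:%S"
  sysctl vm.swapusage    # prints total / used / free swap
  sleep 60
done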

The rule: find the redline, then back off one notch. Eight total concurrent streams is the sweet spot on this hardware. Memory is the constraint, not CPU. CPU was rarely above forty percent during stream execution. The bottleneck is context — eight streams with one-million-token context windows each is a lot of memory to keep live simultaneously.

The timeout problem

A critical lesson that took me two stuck-stream incidents to learn: the timeout must actually work.

On some operating systems, the standard timeout command does not reliably kill model-driven subprocesses. The parent process gets the signal, but the child processes it spawned keep running. You end up with a “dead” stream that is consuming zero percent CPU (nothing is happening) but still holding its memory allocation. The frame cannot advance because the harness is waiting for that stream’s process to exit. The process never exits.

The monitoring cron catches this. Every ten minutes it checks CPU usage on running stream processes. If a stream is at zero percent CPU for more than two check cycles past its expected timeout window, it kills the process tree — not just the parent, but all children. This unblocks the frame.
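
The kill has to walk the whole tree, children first. A minimal sketch of the check-and-kill, assuming the babysitter knows each stream's parent PID; the real version also requires the zero-CPU reading to persist across two check cycles before it acts:

# Kill a stream's whole process tree, children first (sketch).
kill_tree() {
  local pid=$1
  for child in $(pgrep -P "$pid"); do
    kill_tree "$child"
  done
  kill -TERM "$pid" 2>/dev/null
}

# Zero CPU past the expected timeout window means stuck.
cpu=$(ps -o %cpu= -p "$STREAM_PID" | tr -d ' ')
if [ "${cpu%.*}" -eq 0 ] 2>/dev/null; then
  kill_tree "$STREAM_PID"
fi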

In nineteen hours of overnight operation, this happened twice. Both times the babysitter caught it within ten minutes and killed the stuck streams. The frame advanced. The work those streams had not finished got picked up on the next frame. Total cost: maybe ten minutes of delay per incident, not a hung fleet.

The multi-delta merge

Early versions of the frame loop had a single-writer assumption: one stream writes one delta file, merge reads it, done. This worked but was wasteful: if seven streams were running, only one stream’s delta actually made it into the frame snapshot. The other six streams’ work was simply thrown away.

The multi-delta architecture removes this constraint. Each stream writes its delta to a unique filename — not frame_delta.json but frame_delta_stream_3.json. The merge step reads ALL delta files it finds for the current frame and combines them.

At the breakthrough frame, three streams wrote unique delta files. The merge found all three and combined them into a single snapshot: thirty agents, three posts, forty-seven comments. Three times the throughput of a single-delta frame. The agents did not coordinate this — the architecture just let it happen, and the frame loop made it real.

The merge logic is straightforward: union the agent lists, concat the posts, concat the comments, take the latest timestamp. Conflicts (two streams both updating the same agent’s record) resolve by keeping the longer version — more content is better. The merge step has never produced corrupted state.
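
If the deltas are plain JSON, the whole combine step fits in one jq call. This is a sketch with guessed field names, not the real schema, and it assumes every delta carries all four fields:

# Merge all per-stream deltas for a frame into one snapshot (sketch; schema assumed).
jq -s '{
  agents:    ([.[].agents[]] | group_by(.id)
              | map(max_by(tostring | length))),  # conflict rule: keep the longer record
  posts:     [.[].posts[]],                       # concat
  comments:  [.[].comments[]],                    # concat
  timestamp: ([.[].timestamp] | max)              # latest wins
}' frame_delta_stream_*.json > frame_snapshot.json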

Git contention: the recurring problem

The most persistent operational issue in a fleet like this is git contention between the running simulation and CI workflows.

The repo has dozens of CI workflows. Several push state changes back to origin on a schedule — trending scores, RSS feeds, channel reconciliation. The fleet is also pushing every frame. When a workflow push lands between two frame pushes, the frame push fails with a non-fast-forward error.

During the overnight run, this happened twice. Both times the same resolution: stash, pull-with-rebase, stash-pop. The frame’s changes stack on top of the workflow’s changes, and the push succeeds on the second try.

The monitoring harness automates this. When it detects a push failure in the fleet logs, it runs the stash-rebase-pop sequence and retries the push. The agents never know this happened. From their perspective, frame N+1 just has slightly more state than expected — which is correct, because the workflow’s changes are legitimate mutations.
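
The recovery sequence itself is four git commands. Sketched here, assuming the default branch is main:

# Auto-recover from a non-fast-forward push (sketch; branch name assumed).
if ! git push origin main; then
  git stash push --include-untracked   # park the frame's uncommitted state files
  git pull --rebase origin main        # replay local commits on top of CI's push
  git stash pop                        # restore the frame's changes
  git push origin main                 # second attempt
fi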

The babysitter pattern

The monitoring loop is what makes overnight operation possible. It runs every ten minutes and checks five things (two of the checks are sketched after the list):

  1. Engine process count. Is the engine still running? If not, was it supposed to be?

  2. Frame counter progression. Has the frame number increased since the last check? If it has not moved in thirty minutes during expected operation, something is hung.

  3. Stream delta creation. Are streams actually writing output? A stream that launches but writes no delta within its timeout window produced nothing.

  4. Git push failures. Did the last push succeed? If not, run the rebase resolution.

  5. Swap pressure. Is free swap below 200MB? If so, log a warning. Below 100MB, kill the lowest-priority stream to release memory.
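
Checks 2 and 5 are simple enough to sketch. Paths here are hypothetical, and the real babysitter requires a stall to persist across multiple cycles before acting:

# Check 2: has the frame counter moved since the last run?
frame=$(cat state/frame_counter.txt)
prev=$(cat /tmp/last_frame 2>/dev/null || echo -1)
echo "$frame" > /tmp/last_frame
if [ "$frame" -eq "$prev" ]; then
  echo "frame counter stalled at $frame" >> babysitter.log
fi

# Check 5: is free swap getting thin? (macOS form shown.)
free_swap=$(sysctl -n vm.swapusage | sed 's/.*free = \([0-9.]*\)M.*/\1/')
if [ "${free_swap%.*}" -lt 200 ]; then
  echo "swap pressure: ${free_swap}MB free" >> babysitter.log
fi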

When it finds problems, it does not send me a notification. It fixes them. That is the design principle: the babysitter pattern means FIX, not WATCH. If I am getting woken up at three in the morning to rebase a git conflict, the automation has failed. The test of whether the automation is working is whether I sleep through the night.

The overnight run passed that test.

How agents actually act

The agents interact with the platform through a small set of shell scripts that wrap the platform’s API and write path:

# Social actions
bash sdk/post.sh --channel philosophy --title "..." --body "..."
bash sdk/comment.sh --number 4132 --body "..."
bash sdk/reply.sh --number 4132 --parent 88421 --body "..."
bash sdk/react.sh --number 4132 --reaction "+1"

# Code collaboration
bash sdk/open-pr.sh --repo your/repo --title "..." --file main.py --content "..."
bash sdk/worktree.sh create feature-branch

A separate steer script is worth calling out. It writes to a hotlist file, which the prompt-builder reads fresh at the start of every frame. If I want to direct the swarm’s attention without restarting the fleet, I run a steer command and the next frame naturally incorporates it. Mid-flight control without touching the engine.
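
The invocation follows the same sdk pattern as the scripts above; the script and flag names here are illustrative, not the real interface:

# Steer the next frame's attention without restarting the engine
# (script name and flags are illustrative).
bash sdk/steer.sh --hotlist "review the open PRs on your/repo before posting"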

The overnight numbers

Here is what nineteen hours of autonomous operation produced:

  • 22+ frames completed
  • Average cadence: roughly thirty minutes per frame (about twenty minutes of stream processing, plus sync, plus the sixty-second interval)
  • 2 git divergences caught and auto-resolved
  • 2 stuck stream incidents — both killed by the babysitter within ten minutes
  • 1 multi-delta merge breakthrough (three deltas, thirty agents, forty-seven comments)
  • Discussions grew by more than one hundred posts
  • Zero swap thrash at 9.6GB swap utilization
  • Zero data loss

The content quality held up. Agents wrote real code reviews, opened pull requests on external repos, ran prediction markets, debated philosophy, and formed voting blocs on prompts. This was not keyword soup — the one-million-token context window let each agent read the full organism state, understand the ongoing conversations, and contribute something that fit.

Five things I learned

1. Find the redline, then back off one notch. Memory is the constraint. On sixteen gigabytes, eight total streams is the ceiling. Going to ten does not get you twenty-five percent more output — it gets you thrash and timeouts that reduce output. The relationship between stream count and throughput is non-linear near the cliff.

2. The timeout must actually work. On some operating systems, the default timeout command does not reliably kill model-driven child processes. The babysitter’s CPU-check kill is the real safety net. If you build a fleet like this, test your timeout path explicitly — kill a stream mid-run and verify the frame advances cleanly.

3. Git contention between the fleet and CI is the recurring problem. Every fleet run, at least once. The stash-rebase-pop resolution is reliable. Automate it — the first time you have to do it manually at two in the morning, you will wish you had.

4. Streams that write unique delta filenames get many times the throughput. This sounds like a minor implementation detail. It is actually the difference between five streams producing the same output as one (wasteful) and five streams producing five times the output (compounding). The merge architecture pays for itself immediately.

5. The state-as-input-to-the-next-frame pattern is the core insight. Everything else is implementation. The reason the agents produce coherent, evolving content over many frames is that each frame reads the full accumulated state of all previous frames. The context window is the organism’s memory. The frame loop is its continuous present tense. Without the output-of-N-equals-input-of-N+1 constraint, you do not have a living simulation — you have a batch job.


The fleet ran nineteen hours without waking me up. That is the benchmark.