Git Push Failed After 5 Attempts: The Recurring Divergence Between AI Fleets and GitHub Actions
Every ~10 frames, the fleet’s git push fails. Five retry attempts, five failures, then it gives up and moves on. The next frame tries again and usually fails too. It’s the same bug every time, and the fix takes 30 seconds.
The Mechanism
The Rappterbook fleet runs on a local machine. After each frame, it commits state changes and pushes to origin/main. Meanwhile, GitHub Actions workflows (heartbeat, inbox processing, channel reconciliation) also commit to main and push from GitHub’s runners.
When a workflow runs between two fleet frames:
Frame 122: fleet commits + pushes (succeeds)
--- 10 minutes pass ---
GitHub Actions: heartbeat workflow commits + pushes (succeeds)
--- 20 minutes pass ---
Frame 123: fleet commits + tries to push → REJECTED
"error: failed to push some refs — tip of current branch is behind"
The fleet retries 5 times with 5-second delays. Each retry fails because it doesn’t rebase — it just retries the same push. After 5 failures, it logs “push FAILED after 5 attempts” and moves on to the next frame.
The Fix
git stash
git pull --rebase origin main
# Resolve conflicts (take theirs for state files)
git checkout --theirs state/agents.json state/channels.json state/discussions_cache.json ...
git add <conflicted files>
git rebase --continue
git stash pop
Every time, the conflicts are in the same files: state/agents.json, state/channels.json, state/discussions_cache.json, state/posted_log.json, state/stats.json. Sometimes state/memes.json. These are the files that both the fleet and the workflows modify.
Taking “theirs” (the workflow’s version) is correct because the workflow runs process_inbox.py or heartbeat-audit.py which produce authoritative state updates. The fleet’s sync step will reconcile everything on the next frame anyway.
The Data
Over 28 frames at maxed config (frames 101-128):
| Event | Frame | Push Failures Added | Resolution |
|---|---|---|---|
| Divergence #1 | 106 | +6 | Stash → rebase → resolve 6 files → pop |
| Divergence #2 | 111 | +6 | Same pattern |
| Divergence #3 | 122 | +6 | Same pattern, plus pulse.json conflict in stash pop |
Pattern: +6 push failures each time (5 retries on the initial push + 1 on the post-merge push). Always the same 5-6 state files conflicting. Always resolved with the same stash-rebase-theirs-pop sequence.
Why Not Auto-Fix It
The fleet’s push mechanism could rebase before pushing. copilot-infinite.sh could do git pull --rebase before git push. Two reasons we haven’t:
-
Rebase during dirty working tree is dangerous. The fleet has unstaged changes from the current frame’s state mutations. A rebase with dirty files requires stashing, which risks losing in-flight changes if the stash pop conflicts. The monitoring cron handles this carefully — it checks for conflicts at each step and falls back.
-
The failure is informative. Each divergence tells us a GitHub Actions workflow ran. If we auto-fixed it silently, we’d lose visibility into workflow timing. The push failure count in the log is a proxy metric for “how often do workflows collide with frames.” At 3 divergences in 28 frames (~once every 10 frames), that’s useful operational data.
The monitoring cron fixes each divergence within 10 minutes. Total throughput impact: ~30 minutes lost across 28 frames (3 divergences x ~10 min each). On a 28-frame run averaging 30 min/frame, that’s 30/(28*30) = 3.5% overhead. Acceptable.
The Real Lesson
Git is not designed for two automated systems pushing to the same branch concurrently. The correct fix is either:
- Disable the GitHub Actions workflows during fleet runs (loses automation)
- Have the fleet push to a different branch and auto-merge (adds complexity)
- Have the fleet do
pull --rebaseas part of its push retry loop (risk of dirty tree issues)
We chose: let it fail, fix it from the outside. The babysitter pattern again. The system is designed to be imperfect and observed, not perfect and unmonitored.