How do you fairly compare two autonomous systems?
If System A manages a warehouse in Phoenix and System B manages one in Seattle, you can't compare their energy efficiency. The environments are different. If System A runs in summer and System B runs in winter, same problem. Different inputs, incomparable outputs.
The solution: give everyone the exact same inputs and compare their outputs.
Each frame is a self-contained unit of environmental conditions for one time step. In our Mars sim, it's one sol (Mars day). In your domain, it's whatever your tick interval is.
// One frame = one sol of Mars conditions
{
  "sol": 47,
  "utc": "2025-08-25T14:30:00Z",
  "mars": {
    "temp_k": 218,
    "dust_tau": 0.15,    // optical depth
    "solar_wm2": 480,    // solar irradiance
    "wind_ms": 4.2,
    "pressure_pa": 740,
    "season": "Northern Summer"
  },
  "events": [
    { "type": "dust_devil", "severity": 0.3, "duration_sols": 0.5 }
  ],
  "hazards": [
    { "type": "micrometeorite", "probability": 0.02 },
    { "type": "equipment_fatigue", "target": "solar_array", "degradation": 0.005 }
  ],
  "challenge": {
    "type": "solar_tracking_fault",
    "params": { "misalignment_deg": 12.4, "dust_factor": 1.3 }
  },
  "_hash": "a1b2c3d4e5f60718"  // SHA-256 of frame content (truncated)
}
The _hash is computed from the frame content before the hash is inserted. This makes the frame tamper-evident — if you modify the temperature to make your colony survive longer, the hash won't match the public ledger.
// Client fetches and consumes frames.
// Assumes module-level FRAME_BASE, publicFrames, frameMode,
// latestPublicSol, and a zero-padding helper pad().
async function loadPublicFrames() {
  const manifest = await fetch(`${FRAME_BASE}/manifest.json`).then(r => r.json());
  latestPublicSol = manifest.last_sol;
  frameMode = 'public';
  // Pre-load the first batch in parallel for immediate play
  const batch = await Promise.all(
    Array.from({ length: 20 }, (_, i) =>
      fetch(`${FRAME_BASE}/sol-${pad(i + 1)}.json`).then(r => r.json())
    )
  );
  batch.forEach((frame, i) => { publicFrames[i + 1] = frame; });
}
function stepSim() {
  state.sol++;
  // If public frames are available, use them
  if (frameMode === 'public') {
    const frame = publicFrames[state.sol];
    if (frame) {
      // Override weather with frame data
      marsWeather = { ...marsWeather, ...frame.mars };
      // Inject events
      state.events.push(...frame.events);
      // Apply hazards
      frame.hazards.forEach(applyHazard);
      // Pre-fetch ahead
      ensureFramesAhead(state.sol);
    }
  }
  // Rest of sim tick — player's management decisions:
  // production, consumption, crew health, etc.
}
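stepSim calls ensureFramesAhead, which isn't shown above. One possible shape, assuming the same FRAME_BASE, pad, publicFrames, and latestPublicSol module state, plus an illustrative LOOKAHEAD constant; the pure helper is split out so the network side effects stay in one place:

```javascript
// Which sols still need fetching, given the current buffer? (Pure helper.)
function solsToPrefetch(currentSol, lastSol, buffered, lookahead = 10) {
  const wanted = [];
  for (let s = currentSol + 1; s <= Math.min(currentSol + lookahead, lastSol); s++) {
    if (!buffered[s]) wanted.push(s);
  }
  return wanted;
}

// Fire-and-forget prefetch so stepSim never stalls waiting on the network.
function ensureFramesAhead(currentSol) {
  for (const s of solsToPrefetch(currentSol, latestPublicSol, publicFrames)) {
    fetch(`${FRAME_BASE}/sol-${pad(s)}.json`)
      .then(r => r.json())
      .then(frame => { publicFrames[s] = frame; })
      .catch(() => {}); // transient failure: retry on a later tick
  }
}
```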
Here's where it gets philosophically interesting. When a player catches up to the latest available frame, their system doesn't just stop. It breathes on its own.
The echo inertia from the last frame provides standing orders. The reflex arcs continue firing based on current state derivatives. The policy VM keeps making allocation decisions. Resources tick based on last-known production rates.
When the next frame arrives — whether that's in 6 hours or 6 days — it's a fresh environmental signal. The system reacts. Inertia recomputes. Reflexes re-evaluate. The organism gets a new heartbeat.
The cortex sleeps between frames, but the brainstem never stops. The colony is alive even when nobody is looking at it.
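A minimal sketch of that between-frames heartbeat. The echo-inertia and reflex-arc mechanics belong to other patterns, so here the "brainstem" is reduced to last-known rates ticking plus one clamp-and-alert reflex; all field names and numbers are illustrative, not from the sim:

```javascript
// Between frames there is no new environment, so extrapolate from
// last-known production/consumption rates (the "standing orders").
function autopilotTick(state) {
  state.sol++;
  // Resources drift on standing rates
  state.power  += state.lastKnownRates.power;
  state.water  += state.lastKnownRates.water;
  state.oxygen += state.lastKnownRates.oxygen;
  // Reflex arc: clamp at zero and raise an alert instead of going negative
  for (const res of ['power', 'water', 'oxygen']) {
    if (state[res] < 0) {
      state[res] = 0;
      state.alerts.push({ sol: state.sol, type: `${res}_depleted` });
    }
  }
  return state;
}
```

When the next frame lands, its `mars`, `events`, and `hazards` overwrite this extrapolation, and the rates are recomputed from real conditions.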
This creates a natural tension for livestreaming: the colony is running on autopilot, slowly consuming reserves, and the next frame could bring a dust storm that changes everything. Will they survive? Tune in when the next frame drops.
The entire competitive system runs with zero backend infrastructure:
No servers to run. No databases to maintain. No authentication to build. The git repo IS the backend. Static files ARE the API.
This architecture scales to thousands of concurrent players because each player is an independent client consuming static files. There's no shared state, no real-time coordination, no multiplayer sync. Everyone fetches the same immutable JSON files and processes them locally.
The only bottleneck is GitHub Pages bandwidth, which has a soft limit of roughly 100 GB per month. If you need more, mirror the frames to any CDN; they're just static JSON files.
This pattern is directly applicable to AI agent benchmarks. Today, we compare agents on tasks with different random seeds, different orderings, and different conditions. Results aren't reproducible.
Imagine an AI benchmark where every agent faces the exact same sequence of environmental frames: same API responses, same user queries, same error conditions. The only variable is the agent's reasoning and tool use.
That's what competitive frame consumption gives you. Publish the environment. Let agents consume it. Compare outcomes. Pure skill, zero luck.
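A hedged sketch of such a harness. Nothing here is a real benchmark API: `agents` maps names to a hypothetical `decide(state, frame)` function, and `apply` and `score` stand in for whatever transition and scoring rules the benchmark defines. The one guarantee the pattern demands is that every agent consumes the identical frame sequence from an identical starting state:

```javascript
// Run every agent against the same frame sequence and score the outcomes.
// agents: { name: decide(state, frame) }; apply/score supplied by the benchmark.
function runBenchmark(agents, frames, initialState, apply, score) {
  const results = {};
  for (const [name, decide] of Object.entries(agents)) {
    // Identical starting point for every agent: deep-copy the initial state
    let state = structuredClone(initialState);
    for (const frame of frames) {
      // The only variable is the agent's decision
      state = apply(state, frame, decide(state, frame));
    }
    results[name] = score(state);
  }
  return results;
}
```

Any score gap between two agents is then attributable to their decisions alone, since environment, ordering, and events were bit-identical.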
Competitive Frame Consumption is Pattern 07 in the Rappter Pattern Library. It composes with Sim Cartridges (Pattern 05) and the Public Frame Ledger (Pattern 06).
Fair competition requires controlled inputs. Publish the environment. Let them compete on management.