When does a mind start modeling itself?
We evolve a population of agents. Each has a world model — a list of features it uses to predict other agents' next actions. Features can reference the environment, another agent's observable behavior, the agent's own internal state, or (recursively) another agent's model of this agent. We track when any agent first crosses each order of theory of mind — the moment its model starts referencing itself, or others modeling it, or others modeling its model of them.
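The threshold-tracking described above can be sketched as follows. This is an illustrative sketch, not the engine's actual code; `first_crossing`, `record_crossings`, and `depth_of` are hypothetical names.

```python
# Hypothetical sketch of depth-threshold tracking: record the first
# generation at which ANY agent's deepest feature reaches each ToM depth.

first_crossing = {}  # depth threshold -> generation first reached

def record_crossings(generation, population, depth_of):
    """depth_of(agent) returns that agent's maximum feature depth."""
    for agent in population:
        d = depth_of(agent)
        # an agent at depth d has also crossed every shallower threshold
        for threshold in range(1, d + 1):
            first_crossing.setdefault(threshold, generation)

# toy usage: one agent at depth 0, one at depth 2, sampled at generation 5
pop = [{"depth": 0}, {"depth": 2}]
record_crossings(5, pop, lambda a: a["depth"])
assert first_crossing == {1: 5, 2: 5}
```

`setdefault` makes the recording idempotent: only the first generation to reach a threshold is kept, which matches the "first agent to cross each depth" reporting below.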
This is a phase transition between "stimulus-response creature" and "creature aware it is a creature."
Replication: across a 10-seed × 400-generation sweep, depth 3 was reached in 100% of runs (median generation 84), depth 4 in 100% (median generation 198), and depth 5 in 30%. raw sweep data · the writeup
Follow-up (The Ceiling at Depth 2): a 12-run stability sweep (complexity cost 0.02–0.08, population 120–240, up to 1200 generations at MAX_DEPTH=10) shows that populations transiently cross to depth 3–4 but always regress to depth 2, the stable attractor. Depth 5+ is never reached. → full ceiling study
Complexity & ToM depth over generations
Gray = avg model complexity. Purple = avg theory-of-mind depth across the population. Colored vertical lines mark the first generation any agent crosses each depth threshold.
First agent to cross each depth
| Depth | What it means | Generation | Agent | Features |
|---|---|---|---|---|
Scenario: when depth wins
A moment sampled from a late-run frame in which a deep-ToM agent predicted correctly while a shallow-ToM agent, facing the same target, predicted wrongly. This is the fitness gradient that drives complexity upward.
Top survivors at end of run
Sorted by depth, then complexity. Features shown as paths of tokens.
| # | ID | Depth | Complexity | Features |
|---|---|---|---|---|
How this works
Powered by the public twin engine — a deterministic, stdlib-only digital twin of the rappter frame loop. Same seed → same run on any machine.
Each agent has features = paths of tokens: env.food, env.danger, other.action, self.state, or other.model → .... The other.model gateway swaps perspective: everything after it is evaluated from the target agent's point of view looking back at the observer. Each such hop bumps the recursion depth by one.
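Under that rule, a feature's ToM depth is just the number of other.model hops in its path. A minimal sketch (the function name `tom_depth` is illustrative, not from the engine):

```python
def tom_depth(feature):
    """Count 'other.model' gateway tokens in a feature path of tokens."""
    return sum(1 for token in feature if token == "other.model")

# Depth 0: purely environmental or behavioral features
assert tom_depth(["env.food"]) == 0
assert tom_depth(["other.action"]) == 0

# Depth 1: one perspective swap -- the target's view of the observer
assert tom_depth(["other.model", "self.state"]) == 1

# Depth 2: the target's model of the observer's model
assert tom_depth(["other.model", "other.model", "env.danger"]) == 2
```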
Fitness = correct predictions minus complexity cost. Mutation can deepen a feature (prepend other.model), shallow it (drop the leading gateway), or swap its terminal. Selection keeps the top 80%; the bottom 20% is replaced by mutated copies of the top 20%. Over generations, the population climbs the ToM ladder when predicting deeper agents is worth more than the extra complexity cost.
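The mutation operators and the 80/20 selection rule above can be sketched like this. All names are illustrative assumptions, not the engine's actual API:

```python
import random

def mutate_feature(feature, rng,
                   terminals=("env.food", "env.danger", "other.action", "self.state")):
    """Apply one of the three mutation operators described above."""
    op = rng.choice(["deepen", "shallow", "swap"])
    if op == "deepen":
        return ["other.model"] + feature           # prepend a perspective hop
    if op == "shallow" and feature and feature[0] == "other.model":
        return feature[1:]                         # drop the leading gateway
    return feature[:-1] + [rng.choice(terminals)]  # swap the terminal token

def step(population, fitness, mutate, rng):
    """One generation: top 80% survive, bottom 20% replaced by
    mutated copies of the top 20%."""
    ranked = sorted(population, key=fitness, reverse=True)
    n = len(ranked)
    cut = n - n // 5                   # survivors: top 80%
    parents = ranked[: n // 5]         # reproducers: top 20%
    children = [mutate(rng.choice(parents), rng) for _ in range(n - cut)]
    return ranked[:cut] + children

# toy usage with integer "genomes" where fitness is the value itself
rng = random.Random(42)
new_pop = step(list(range(10)), fitness=lambda x: x,
               mutate=lambda p, r: p + r.choice([-1, 1]), rng=rng)
assert len(new_pop) == 10
```

The toy `mutate` lambda stands in for `mutate_feature` so the selection logic can be exercised on simple integers; in the real engine the genomes are feature lists and fitness subtracts the complexity cost.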
Reproduce: `python3 scripts/theory_of_mind.py --generations 400 --population 80 --seed 42`