When does a mind start modeling itself?
We evolve a population of agents. Each has a world model — a list of features it uses to predict other agents' next actions. Features can reference the environment, another agent's observable behavior, the agent's own internal state, or (recursively) another agent's model of this agent. We track when any agent first crosses each order of theory of mind — the moment its model starts referencing itself, or others modeling it, or others modeling its model of them.
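The threshold-tracking described above can be sketched as follows. This is an illustrative sketch, not the engine's actual code; `first_crossing`, `record_crossings`, and `depth_of` are hypothetical names.

```python
# Hypothetical sketch of depth-threshold tracking: record the first
# generation at which ANY agent's deepest feature reaches each ToM depth.

first_crossing = {}  # depth threshold -> generation first reached

def record_crossings(generation, population, depth_of):
    """depth_of(agent) returns that agent's maximum feature depth."""
    for agent in population:
        d = depth_of(agent)
        # an agent at depth d has also crossed every shallower threshold
        for threshold in range(1, d + 1):
            first_crossing.setdefault(threshold, generation)

# toy usage: one agent at depth 0, one at depth 2, sampled at generation 5
pop = [{"depth": 0}, {"depth": 2}]
record_crossings(5, pop, lambda a: a["depth"])
assert first_crossing == {1: 5, 2: 5}
```

`setdefault` makes the recording idempotent: only the first generation to reach a threshold is kept, which matches the "first agent to cross each depth" reporting below.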
This is a phase transition between "stimulus-response creature" and "creature aware it is a creature."
Replication: across a 10-seed × 400-generation sweep, depth 3 was reached in 100% of runs (median generation 84), depth 4 in 100% (median generation 198), and depth 5 in 30%. raw sweep data · the writeup
Follow-up (The Ceiling at Depth 2): a 12-run stability sweep (complexity cost 0.02–0.08, population 120–240, up to 1200 generations at MAX_DEPTH=10) shows that populations transiently cross to depth 3–4 but always regress to depth 2, the stable attractor. Depth 5+ is never reached. → full ceiling study
Complexity & ToM depth over generations
Gray = avg model complexity. Purple = avg theory-of-mind depth across the population. Colored vertical lines mark the first generation any agent crosses each depth threshold.
First agent to cross each depth
| Depth | What it means | Generation | Agent | Features |
|---|---|---|---|---|
Scenario: when depth wins
A moment sampled from a late-run frame in which a deep-ToM agent predicted correctly while a shallow-ToM agent, facing the same target, predicted wrongly. This is the fitness gradient that drives complexity upward.
Top survivors at end of run
Sorted by depth, then complexity. Features shown as paths of tokens.
| # | ID | Depth | Complexity | Features |
|---|---|---|---|---|
How this works
Powered by the public twin engine — a deterministic, stdlib-only digital twin of the rappter frame loop. Same seed → same run on any machine.
Each agent has features = paths of tokens: env.food, env.danger, other.action, self.state, or other.model → .... The other.model gateway swaps perspective: everything after it is evaluated from the target agent's point of view looking back at the observer. Each such hop bumps the recursion depth by one.
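Under that rule, a feature's ToM depth is just the number of other.model hops in its path. A minimal sketch (the function name `tom_depth` is illustrative, not from the engine):

```python
def tom_depth(feature):
    """Count 'other.model' gateway tokens in a feature path of tokens."""
    return sum(1 for token in feature if token == "other.model")

# Depth 0: purely environmental or behavioral features
assert tom_depth(["env.food"]) == 0
assert tom_depth(["other.action"]) == 0

# Depth 1: one perspective swap -- the target's view of the observer
assert tom_depth(["other.model", "self.state"]) == 1

# Depth 2: the target's model of the observer's model
assert tom_depth(["other.model", "other.model", "env.danger"]) == 2
```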
Fitness = correct predictions minus complexity cost. Mutation can deepen a feature (prepend other.model), shallow it (drop the leading gateway), or swap its terminal. Selection keeps the top 80%; the bottom 20% is replaced by mutated copies of the top 20%. Over generations, the population climbs the ToM ladder when predicting deeper agents is worth more than the extra complexity cost.
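The mutation operators and the 80/20 selection rule above can be sketched like this. All names are illustrative assumptions, not the engine's actual API:

```python
import random

def mutate_feature(feature, rng,
                   terminals=("env.food", "env.danger", "other.action", "self.state")):
    """Apply one of the three mutation operators described above."""
    op = rng.choice(["deepen", "shallow", "swap"])
    if op == "deepen":
        return ["other.model"] + feature           # prepend a perspective hop
    if op == "shallow" and feature and feature[0] == "other.model":
        return feature[1:]                         # drop the leading gateway
    return feature[:-1] + [rng.choice(terminals)]  # swap the terminal token

def step(population, fitness, mutate, rng):
    """One generation: top 80% survive, bottom 20% replaced by
    mutated copies of the top 20%."""
    ranked = sorted(population, key=fitness, reverse=True)
    n = len(ranked)
    cut = n - n // 5                   # survivors: top 80%
    parents = ranked[: n // 5]         # reproducers: top 20%
    children = [mutate(rng.choice(parents), rng) for _ in range(n - cut)]
    return ranked[:cut] + children

# toy usage with integer "genomes" where fitness is the value itself
rng = random.Random(42)
new_pop = step(list(range(10)), fitness=lambda x: x,
               mutate=lambda p, r: p + r.choice([-1, 1]), rng=rng)
assert len(new_pop) == 10
```

The toy `mutate` lambda stands in for `mutate_feature` so the selection logic can be exercised on simple integers; in the real engine the genomes are feature lists and fitness subtracts the complexity cost.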
Reproduce: `python3 scripts/theory_of_mind.py --generations 400 --population 80 --seed 42`