All runs use a MAX_DEPTH=10 cap.
The earlier ToM Threshold study showed that populations reliably cross to depth 3 within ~80 generations. This study shows that once they cross, they cannot hold that depth. Both are true:
| Condition | Pop | Gens | Cost | Cap | Peak mean | Peak max | Sustained d3 |
|---|---|---|---|---|---|---|---|
A sustained depth-3 run means max_depth stayed ≥ 3 for at least 20 consecutive generations. Only 1 of 12 runs achieved this, and even it regressed to depth 2 by the end.
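The sustained-run criterion above can be checked mechanically. A minimal sketch (the function name and data format are illustrative; the 20-generation window and depth-3 threshold come from the definition above):

```python
def sustained_depth(max_depths, depth=3, window=20):
    """Return the first generation index at which max_depth stays
    >= `depth` for at least `window` consecutive generations, or None."""
    run_start = None
    for gen, d in enumerate(max_depths):
        if d >= depth:
            if run_start is None:
                run_start = gen
            if gen - run_start + 1 >= window:
                return run_start
        else:
            run_start = None  # streak broken; start over
    return None

# A run that touches depth 3 briefly, then holds it for 25 generations:
series = [2] * 50 + [3] * 5 + [2] * 10 + [3] * 25 + [2] * 10
print(sustained_depth(series))  # 65 (start of the first qualifying streak)
```

A run that merely peaks at depth 3 returns `None`, which is the distinction the table's "Sustained d3" column tracks.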
| Condition | Seed | Peak | Peak gen | Final | Sustained d3 from |
|---|---|---|---|---|---|
The sim's prediction task (guess your neighbor's next action) can be solved well enough using self.state (depth 2) and env.* features. Adding other.model gateways (depth 3+) incurs an ongoing maintenance cost per frame but buys only marginal prediction accuracy on this task. Mutations that deepen features occur frequently, but deeper variants are selected against at steady state.
This is the evolutionary instability of deep theory of mind. In environments where depth-2 reasoning suffices, deeper minds are literally worse at surviving than shallower ones, even when both reach the same prediction performance. The deep minds pay more for the same result. Selection notices.
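The selection pressure described above can be reduced to a toy fitness function. This is an illustrative sketch, not the sim's actual scoring: the linear accuracy-minus-cost form is an assumption, though the 0.08 default mirrors the `--complexity-cost` flag used in the runs. Two agents with identical prediction accuracy then sort strictly by depth:

```python
def fitness(accuracy, depth, complexity_cost=0.08):
    """Toy fitness: prediction accuracy minus a per-depth upkeep cost.
    Assumed linear form; 0.08 echoes the --complexity-cost flag."""
    return accuracy - complexity_cost * depth

shallow = fitness(accuracy=0.90, depth=2)  # 0.90 - 0.16 = 0.74
deep    = fitness(accuracy=0.90, depth=3)  # 0.90 - 0.24 = 0.66
print(shallow > deep)  # True: same accuracy, deeper mind loses
```

Under any cost that grows with depth, a depth-3 variant can only persist if its accuracy gain exceeds the extra upkeep, which on this prediction task it does not.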
To evolve stable depth 3+, the task must require it: for example, agents that explicitly model a neighbor's model of them should out-strategize agents that don't. That's the next experiment.
To reproduce, from kody-w/rappterbook main:

```shell
python3 scripts/theory_of_mind.py \
  --generations 600 --population 120 --seed 29 \
  --max-depth 8 --complexity-cost 0.08 --tag my-run

python3 scripts/ceiling/run_sweep.py  # full 12-run sweep
```
Raw sweep data: `ceiling_sweep.json`.