Load-Bearing Data: The Concept That Connects PageRank, DNA, and the Wildfeuer Maneuver
Load-Bearing Data: The Concept That Connects PageRank, DNA, and the Wildfeuer Maneuver
Kody Wildfeuer – March 28, 2026
Disclaimer: This is a personal project built entirely on my own time. I work at Microsoft, but this project has no connection to Microsoft whatsoever – it is completely independent personal exploration and learning, built on personal infrastructure with personal resources.
The Wall Analogy
In architecture, a load-bearing wall holds up the building. The floors above rest on it. The roof transfers its weight through it. Remove it and the structure above collapses.
Most walls are not load-bearing. You can knock them out during a renovation – move the kitchen, open the floor plan, add a window. The building doesn’t care. The skill of a structural engineer is knowing which walls carry weight and which are just partitions. Get it wrong in one direction and you won’t renovate when you should. Get it wrong in the other direction and the ceiling comes down.
Data has the same property.
Most data is decoration. You can change it, delete it, reshape it, and nothing downstream notices. But some data is load-bearing: other systems read it, depend on it, build on top of it. Change it and everything downstream collapses. The database row that a thousand foreign keys point to. The API endpoint that ten thousand clients call. The simulation fact that three hundred subsequent frames reference.
The concept is universal. It applies to databases, APIs, simulations, scientific literature, legal systems, genetic code, and the architecture of the internet itself. Once you see it, you see it everywhere.
What Makes Data Load-Bearing
A piece of data becomes load-bearing the moment something else depends on it.
Not “uses” it – depends on it. The distinction matters. If I read a blog post and find it interesting, I used it. If I write a blog post that quotes it, links to it, and builds an argument on top of its thesis, the original post is now load-bearing for my post. My argument collapses without it.
The dependency creates the weight. And the weight is proportional to the number of downstream dependents. A database column referenced by one foreign key carries a little weight. A column referenced by every table in the schema carries the weight of the entire system.
Three properties define load-bearing data:
1. Downstream systems reference it by identity. Not by content – by identity. They point to it. They name it. They include its identifier in their own structure. A foreign key points to a primary key. A citation points to a DOI. A git commit references a parent hash. A legal ruling cites a precedent by case number. The reference is structural, not incidental.
2. Changing it would require changing the dependents. This is the load test. If you change the data and nothing else needs to change, it wasn’t load-bearing. If you change it and a cascade of downstream modifications is required to restore consistency, it was. The size of the cascade is the weight.
3. The dependents may not know they depend on it. This is the dangerous part. In a microservice architecture, Service A might depend on the response format of Service B without Service B knowing. In a simulation, frame 200 might reference a fact from frame 50 that the author of frame 50 never intended to be permanent. Load-bearing status is conferred by the dependents, not declared by the source.
The Taxonomy: Six Domains
Load-bearing data appears in every system where data has downstream consumers. Here are six domains, each illustrating a different facet of the concept.
1. Databases: The Primary Key
The simplest and most familiar example. A user’s ID is load-bearing. Their display name is not.
Foreign keys in a relational database are explicit load-bearing declarations. When table B has a column user_id REFERENCES users(id), it is declaring: “this data in table A is load-bearing for me.” The database enforces this with referential integrity constraints. Try to delete a user whose ID appears in the orders table and the database will refuse. It knows the data is load-bearing because the schema says so.
This is the happy case. The dependencies are declared, enforced, and visible. You can query the schema and build a complete dependency graph. You know exactly which data is load-bearing and how much weight it carries.
Most systems are not this clean.
2. APIs: The Endpoint Contract
An API endpoint’s URL is load-bearing. Its response schema is load-bearing. The description in the docs is not.
When you publish GET /api/v2/users/{id} and a thousand clients integrate against it, the URL, the HTTP method, the parameter format, and the response shape are all load-bearing. Change the URL and a thousand integrations break. Change the response format and a thousand parsers fail. Change the description in the API docs and nobody notices until they read the docs again.
This is why API versioning exists. It’s a formal acknowledgment that the old endpoint is load-bearing. You can’t change it – too many downstream systems depend on it. So you create a new endpoint alongside it. The old one stays forever, carrying its weight, because removing it would bring down everything built on top.
API deprecation schedules are load-bearing analysis made explicit: “We’ve measured the weight on this endpoint. It’s too heavy to remove today. In 18 months, we’ll have migrated enough dependents to reduce the weight below the threshold. Then we remove it.”
3. Simulations: The Downstream Fact
This is where load-bearing data gets interesting, because the dependencies are implicit and discovered after the fact.
Consider a discrete-frame simulation – an AI-driven world that advances in ticks, where each frame produces a delta of changes and the output of frame N becomes the input to frame N+1. (Data sloshing is the general pattern.)
Frame 50 produces a delta that says: “Maya won the governance debate against Karl by community vote, 67-33.”
Frame 51 references this: “In the aftermath of Maya’s decisive victory…” Frame 78 references it: “Maya, emboldened by her debate win, proposes a new charter…” Frame 134 references it: “Karl, still bitter about losing to Maya in the governance debate…” Frame 210 references it: “The Maya Doctrine, named for her landmark victory…” Frame 389 references it: “It all started when Maya won that debate back in frame 50…”
Maya’s victory is load-bearing. It carries the weight of 340 downstream frames. You cannot retroactively change who won that debate. Not because the data is immutable – you could edit frame 50’s delta file. But because doing so would make frames 51 through 389 incoherent. Every reference, every callback, every narrative thread that built on Maya’s victory would become a lie.
And here’s the key: you CAN enrich frame 50 freely – as long as you don’t touch the load-bearing facts. You can retroactively add what Maya was wearing during the debate, what music was playing in the background, who was watching from the gallery, what the weather was like outside, how Karl’s supporters reacted in the moment. None of that is referenced downstream. None of it is load-bearing. It’s all free.
This is the core insight of the Wildfeuer Maneuver: a formal pattern for identifying which facts in a simulation frame are load-bearing (referenced downstream) and which are free (unreferenced), enabling retroactive enrichment of past frames without breaking coherence.
4. Git: The Commit Hash
A commit’s hash is load-bearing. Its message is not.
Git is a directed acyclic graph where each commit points to its parent by hash. The hash IS the identity. Change the content of a commit – even by one byte – and the hash changes. Change the hash and every child commit’s parent reference breaks. The chain unravels.
This is why git rebase is dangerous on shared branches. Rebasing rewrites commit hashes. If other developers have built on top of those commits, their history now points to hashes that no longer exist. The load-bearing identifiers were changed, and everything downstream broke.
git commit --amend has the same risk. It replaces the latest commit with a new one that has a different hash. If nobody has built on top of it yet, fine – nothing depends on the old hash. If someone has, you’ve just removed a load-bearing wall.
The commit message, by contrast, carries zero weight. You can rewrite every commit message in a repository’s history and nothing breaks. No other commit references the message. It’s decoration. Informative, useful, good practice to write well – but not load-bearing.
5. Science and Law: The Citation Graph
A scientific paper’s claims become load-bearing when other papers cite them.
If Paper A claims “the speed of light is constant in all reference frames” and 50,000 subsequent papers build on that claim, Paper A’s claim is load-bearing for 50,000 downstream works. Retracting it doesn’t just affect Paper A – it undermines every paper that assumed it was true.
The h-index measures this directly. An h-index of 40 means 40 papers have been cited at least 40 times each. It is a load-bearing score. It measures how much weight the researcher’s work carries in the downstream literature.
Legal precedent works identically. When a court ruling is cited by 200 subsequent rulings, that precedent is load-bearing for the legal system. Overturning it requires re-examining every ruling that depended on it. This is why landmark cases are so difficult to overturn: the weight they carry is immense. The cascade of downstream changes would be enormous.
Impact factor, citation count, case law citation frequency – they’re all measuring the same thing: how much downstream weight does this piece of data carry?
6. Biology: The Housekeeping Gene
DNA is the most load-bearing data structure in existence.
A housekeeping gene – one that is expressed in every cell type, essential for basic cellular function – is maximally load-bearing. Every cell in the organism depends on it. Mutate it and every downstream process that depends on its protein product fails. This is why housekeeping genes are the most conserved across species: evolution cannot easily change them because the downstream dependency count is too high. The weight is too great.
A regulatory gene that controls expression of other genes is load-bearing in a different way. It doesn’t produce a product that other systems use directly – it controls which other genes are active. Change the regulator and the downstream cascade isn’t a direct failure but a cascade of misexpression. The wrong genes turn on. The right genes turn off. The organism develops differently.
Contrast with a pseudogene – a “dead” gene that is no longer expressed. Nothing depends on it. No downstream process reads it. It is the genetic equivalent of a commented-out line of code. You can mutate it freely and nothing changes. It carries no weight.
The hierarchy is clear: housekeeping genes > regulatory genes > tissue-specific genes > pseudogenes, ordered by downstream dependency count. Evolution moves slowest on the most load-bearing genes and fastest on the least. Natural selection IS load-bearing analysis, measured in generations instead of API calls.
Why AI Needs Load-Bearing Analysis
Here’s the AI angle, and it’s the reason I wrote this post.
When an AI system reads a pile of data – a frame delta, a state file, a context window, a code repository – it doesn’t know which parts are load-bearing. It sees the data. It doesn’t see the downstream dependency graph. It doesn’t know which facts are referenced by subsequent frames, which fields are pointed to by foreign keys, which values are hardcoded in a thousand downstream scripts.
So it does the only safe thing: it treats ALL of it as load-bearing.
This is why AI is conservative by default when modifying existing systems. It preserves everything. It wraps changes in conditionals. It adds new code instead of modifying old code. It appends instead of replacing. Not because it’s programmed to be timid, but because without load-bearing analysis, the rational strategy is to assume maximum weight on every piece of data.
This conservatism has a cost. When everything is treated as load-bearing, nothing can be changed. Refactoring becomes impossible. Cleanup becomes impossible. Evolution becomes impossible. The system calcifies – not because it should, but because the AI can’t distinguish the walls it’s safe to remove from the walls holding up the floor.
The opposite failure is equally bad. An AI that treats nothing as load-bearing – that freely modifies, deletes, and restructures data without understanding downstream dependencies – breaks things. It renames a database column that a hundred queries reference. It changes an API response format that a thousand clients parse. It alters a simulation fact that three hundred downstream frames cite.
The spectrum looks like this:
Too conservative Too destructive
(everything is load-bearing) (nothing is load-bearing)
| |
v v
AI preserves everything AI changes everything
System calcifies System breaks
No evolution possible No stability possible
The optimal point is in the middle: identify which data is actually load-bearing, preserve that, and freely modify everything else. But reaching that point requires load-bearing analysis – a formal method for determining which data carries weight and which doesn’t.
The Wildfeuer Maneuver as a Formal Solution
The Wildfeuer Maneuver provides exactly this analysis for simulation frames. Its downstream coherence constraint (Property 4) is literally a load-bearing classifier:
For all frames j > k:
referenced_facts(j, k) ∩ proposed_changes(k) = empty set
In English: scan all downstream frames. Build a reference index – which facts from frame k are cited by subsequent frames? Those facts are load-bearing. Freeze them. Everything else in frame k is non-load-bearing. Modify it freely.
The reference index IS the load-bearing map. It tells you exactly which data carries weight and how much. A fact referenced by one downstream frame carries a little weight. A fact referenced by three hundred downstream frames carries enormous weight. A fact referenced by zero downstream frames carries no weight at all – it’s a partition wall, not a load-bearing wall. Knock it out.
This is the missing piece in AI reasoning about mutable state. Without load-bearing analysis, AI systems oscillate between the two failure modes: too conservative (preserve everything) or too destructive (change anything). With it, they can make precise decisions: this fact is safe to modify, this one is not, and here is the formal proof in the form of a reference count.
| The Wildfeuer Maneuver’s coherence check is O( | touched_fields | ) with the reference index – a constant-time lookup per proposed change. Build the index once, query it for every proposed modification. The AI doesn’t need to reason about load-bearing status from first principles every time. It consults the index. The index is the structural engineer’s blueprint. |
The Universal Load-Bearing Score
Here’s the unifying observation. Every domain has developed its own metric for measuring how load-bearing a piece of data is:
| Domain | Metric | What It Measures |
|---|---|---|
| Web | PageRank | How many other pages link to this page? |
| Science | Citation count / h-index | How many papers cite this paper? |
| Law | Case citation frequency | How many rulings cite this precedent? |
| Social networks | Reply/quote count | How many posts reference this post? |
| Databases | Foreign key reference count | How many rows point to this row? |
| Git | Child commit count | How many commits descend from this commit? |
| Biology | Expression breadth | How many cell types express this gene? |
| Simulations | Downstream frame reference count | How many subsequent frames cite this fact? |
| Software | Import/dependency count | How many modules import this module? |
| APIs | Active consumer count | How many clients call this endpoint? |
These metrics were developed independently, by different communities, in different decades, for different purposes. But they all measure the same underlying property: how many downstream systems depend on this piece of data?
PageRank is a load-bearing score for web pages. The h-index is a load-bearing score for researchers. Citation frequency is a load-bearing score for legal precedents. Expression breadth is a load-bearing score for genes. The Wildfeuer Maneuver’s reference index is a load-bearing score for simulation facts.
They’re the same concept, measured in different units.
This isn’t a metaphor. It’s a structural identity. Every one of these metrics computes the in-degree of a node in a dependency graph. The graph’s edges differ – hyperlinks, citations, foreign keys, frame references, gene expression pathways – but the computation is identical: count the inbound edges, and that count IS the load-bearing score.
Identifying Load-Bearing Data in Your System
The practical question: how do you determine which data in your system is load-bearing?
Step 1: Map the dependency graph. Every system has one, whether it’s explicit or not. In a relational database, the foreign key constraints ARE the graph. In an API ecosystem, the integration registry IS the graph. In a codebase, the import statements ARE the graph. In a simulation, the reference index IS the graph.
If the graph isn’t explicit, build it. Scan downstream consumers. Log which data they read. The access patterns reveal the dependencies.
Step 2: Compute the load-bearing score. Count the inbound edges for each node. The count IS the weight. Sort by weight descending. The top of the list is your most load-bearing data. The bottom is your least.
Step 3: Protect proportionally. Load-bearing data deserves more protection than non-load-bearing data. More test coverage. More change review. More migration planning. More backward compatibility commitment. The highest-weight data deserves the strongest guarantees. The lowest-weight data can be changed freely.
Step 4: Make the score visible. The most dangerous situation is when data is load-bearing but nobody knows it. The database column that looks like it could be renamed but is actually hardcoded in a hundred scripts. The API field that looks optional but is actually parsed by a thousand clients. The simulation fact that looks like flavor text but is actually referenced by two hundred downstream frames.
Make the load-bearing score a first-class metric. Display it in your schema browser, your API dashboard, your simulation monitor. When someone proposes a change, the system should automatically show them the load-bearing score of the data they’re about to modify. “You are proposing to change a field with 847 downstream dependents. Are you sure?”
Step 5: Reduce weight when possible. Sometimes data is load-bearing not because it should be, but because the system was designed without indirection. If a thousand clients reference your API endpoint by URL, adding a DNS alias or a redirect layer lets you change the underlying URL without breaking anyone. The alias absorbs the weight. The actual endpoint becomes lighter.
In databases, views and computed columns serve this purpose. Instead of a thousand queries referencing a raw column, put a view in front of it. Now the view is load-bearing and the underlying column is free to change. The weight transfers to the abstraction layer.
This is what good architecture does: it manages load-bearing weight by placing it on structures designed to bear it (stable interfaces, version-pinned contracts, abstraction layers) and keeping it off structures that need to evolve (implementation details, internal schemas, experimental features).
The Deep Principle
Load-bearing data is not a new concept. Engineers have been managing it intuitively since the first database schema was drawn on a whiteboard. What’s new is naming it, measuring it, and building formal tools around it.
The name matters because it unlocks a cross-disciplinary conversation. The database engineer managing foreign key constraints, the API architect planning a versioning strategy, the simulation designer building a coherence checker, the geneticist studying gene conservation, and the Google engineer refining PageRank are all working on the same problem. They don’t know it because they don’t share vocabulary.
Load-bearing data is the shared vocabulary.
And the practical payoff is real: any system that can compute its own load-bearing scores can make better decisions about what’s safe to change, what needs migration planning, what deserves the strongest guarantees, and what can be freely experimented with. The database that knows which columns are load-bearing can auto-generate migration warnings. The API platform that knows which endpoints are load-bearing can auto-compute deprecation timelines. The AI system that knows which facts are load-bearing can navigate the space between too conservative and too destructive.
The structural engineer doesn’t guess which walls are load-bearing. They compute it. They read the blueprints, trace the load paths, measure the forces. Then they know, with certainty, which walls they can remove and which they cannot.
Data deserves the same rigor.
The Wildfeuer Maneuver, which formalizes load-bearing analysis for simulation frames, is described in detail here. The data sloshing pattern that produces load-bearing simulation facts is described here. Rappterbook – the system where these ideas run in production – is open source on GitHub.