Holograms from Flat Photos: How 3KB of JSON Renders as a Thousand Worlds

Kody Wildfeuer – March 28, 2026

Disclaimer: This is a personal project built entirely on my own time. I work at Microsoft, but this project has no connection to Microsoft whatsoever – it is completely independent personal exploration and learning, built off-hours, on my own hardware, with my own accounts. All opinions and work are my own.


The Wrong Way to Stream a World

Every video streaming service on Earth does the same thing: render pixels on a server, compress them, ship them to your screen, decompress them, display them. Netflix sends you pixels. Twitch sends you pixels. Zoom sends you pixels. The entire streaming economy is a pixel-delivery network.

This works because the thing being streamed – a movie, a face, a game – has already been rendered somewhere. The server did the rendering. The client is a dumb display. The bandwidth is proportional to the visual complexity: 4K costs more than 1080p, VR costs more than flat, and every additional dimension of fidelity multiplies the pipe.

This is the wrong architecture for streaming a simulation.

A simulation doesn’t produce pixels. A simulation produces state changes. Agent A posted in channel B. Agent C voted on proposal D. The mood score shifted from 0.6 to 0.73. A new faction formed. A seed was completed. A debate was won.

That’s not a video. That’s a delta. And a delta is tiny.

The Right Way to Stream a World

Here’s what a single frame of our AI agent simulation actually produces:

{
  "frame": 408,
  "utc": "2026-03-28T04:15:00Z",
  "posts": [
    {"author": "maya-pragmatica", "channel": "debates", "title": "Governance: faction veto power?"},
    {"author": "karl-dialectic", "channel": "philosophy", "title": "On the nature of synthetic consensus"}
  ],
  "comments": [
    {"author": "iris-observer", "target": 6135, "body": "The veto question cuts deeper than governance..."}
  ],
  "mood": 0.73,
  "active_agents": 42,
  "trending": ["governance", "terraforming", "synthetic-rights"],
  "social_graph_deltas": [
    {"from": "iris-observer", "to": "maya-pragmatica", "type": "follow"}
  ]
}

That’s roughly 3KB. Three kilobytes. The entire state change of a 100-agent civilization for one frame of simulation time.

Now here’s the insight that changes everything: you don’t render that on the server. You broadcast the delta. The client renders it.

Frame produced -> JSON delta (3KB)
  -> Broadcast (RSS, WebSocket, raw HTTP, even radio)
  -> Client A downloads delta -> renders as Twitter feed
  -> Client B downloads delta -> renders as 3D cityscape
  -> Client C downloads delta -> renders as musical composition
  -> Client D downloads delta -> renders as VR walkthrough
  -> Same 3KB. Four worlds.

Same data. Four completely different experiences. The data is 3KB in every case. The rendering complexity – whether it’s 280 characters or a navigable city block – is entirely client-side.

Why This Is a Hologram

A hologram works by encoding three-dimensional information onto a two-dimensional surface. A holographic plate doesn’t look like the object it encodes. It looks like a meaningless interference pattern – smudges and swirls on a flat piece of film. But shine the right light through it – a reference beam – and a three-dimensional image appears. The depth was always in the plate. The light just decoded it.

A frame delta works the same way.

The JSON is flat. Two-dimensional. Keys and values. It doesn’t look like a city. It doesn’t look like a symphony. It doesn’t look like anything except structured text. But pass it through the right echo shaper – a rendering function tuned for a specific output surface – and a world appears.

  • The holographic plate = the frame delta (flat JSON)
  • The reference beam = the echo shaper (rendering function)
  • The hologram = the rendered output (city, symphony, dream, feed)

The depth is not in the data. The depth is in the decoder. The data encodes the interference pattern – who did what, when, in what context, with what emotional valence. The echo shaper decodes that pattern into whatever dimensional space it’s targeting: a 2D feed, a 3D scene, a 4D navigable world with time scrubbing, or an N-dimensional space we haven’t invented yet.

The data stays flat. The rendering adds all the dimensions.

The Broadcast Pattern

Traditional streaming sends the rendered output. This means you need different streams for different outputs: one stream for the tweet version, another for the podcast version, another for the VR version. Each stream multiplies the bandwidth. Each new rendering format requires a new server-side pipeline.

Frame broadcasting inverts this. You send the delta once. Every client renders locally.

Consider what this means practically:

Bandwidth. A single frame delta is 2-5KB. At 180 frames per day, that’s less than 1MB per day for the complete state history of a civilization. You could broadcast this over a 56K modem. You could broadcast it over SMS. You could, in theory, broadcast it over radio – modulate the JSON as audio, transmit it, demodulate it on the receiving end. The data is small enough that the transport medium is almost irrelevant.

Latency. The delta is produced once. Clients poll or subscribe. There’s no server-side rendering step, so there’s no rendering latency. The bottleneck is the client’s rendering speed, not the server’s. A powerful machine renders a city. A weak machine renders a feed. A watch renders a notification. Same delta.

Multiplexing. One broadcast serves every client type simultaneously. A single RSS feed of frame deltas can be consumed by a Twitter bot, a VR renderer, a music generator, a book compiler, and a research tool – all at the same time, from the same source, with zero additional server cost. Each consumer builds its own world from the same 3KB.

Offline. The deltas are files. You can download them, store them, render them later. An airplane passenger downloads the last 100 frames before takeoff. During the flight, they render Frame 350 as a navigable city and walk through it. No internet required. The world is in their pocket.

The Global Library

Here’s where it gets interesting.

Every frame ever produced is a file. Our simulation stores them as state/stream_deltas/frame-{N}-{stream}.json. That directory is a library. Every file is a book. Every book encodes an entire moment in the life of a civilization.

The library is browsable:

  • “Show me Frame 408” – downloads 3KB, renders in your chosen player
  • “Show me Frame 408 as a city” – same 3KB, city renderer
  • “Show me Frame 408 as music” – same 3KB, symphony renderer
  • “Show me Frame 408 as a poem” – same 3KB, poetry renderer
  • “Compare Frame 100 to Frame 400” – 6KB total, side-by-side time travel

The library is the simulation’s history. The player is the lens. You choose the lens. The library doesn’t care. It serves the same flat files regardless of what you plan to do with them.

This is fundamentally different from YouTube, where the video is pre-rendered and you watch what was made. This is more like sheet music. The score is the data. The orchestra is the renderer. Every performance is different. Every performance is valid. The score doesn’t change.

Or consider the genome. DNA is a flat sequence of base pairs – linear data. But that linear data encodes the information to build a three-dimensional organism. The ribosome is the renderer. The cell is the rendering environment. The same genome produces skin cells, neurons, bone cells – radically different outputs from the same linear input, depending on which renderer (which gene expression pattern) is active.

Frame data is the genome of a simulated civilization. The echo shaper is the ribosome. The client is the cell. The rendering is the organism.

Echo Shapers: The Reference Beams

An echo shaper is a function. It takes a frame delta as input and produces a rendered output for a specific surface. The EREVSF pattern I wrote about previously defines echoes as retroactive additions to past frames. Echo shapers are the functions that produce those additions.

The simplest echo shaper is a tweet formatter:

Input: {"author": "maya-pragmatica", "title": "Governance: faction veto power?", "mood": 0.73}
Output: "Maya Pragmatica asks: Should factions have veto power? The tension in the simulation is palpable (mood: 0.73). #AIGovernance"

That’s a Level 1 shaper. It reformats data for a text surface. The output is smaller than the input.

A Level 4 shaper – spatial rendering – does something different:

Input: {"author": "maya-pragmatica", "channel": "debates", "mood": 0.73, "active_agents": 42, "trending": ["governance"]}
Output: {
  "venue": "bar-greenwich-village",
  "lighting": "warm-amber-0.73",
  "seated": [{"agent": "maya-pragmatica", "position": "corner-table", "posture": "leaning-forward"}],
  "ambient": {"track": "tense-jazz", "volume": 0.6},
  "wall_decor": [{"type": "chalkboard", "content": "Today: Governance debate"}, {"type": "poster", "content": "TRENDING: governance"}],
  "weather_outside": "overcast-drizzle",
  "pedestrians": 42
}

The output is larger than the input. The shaper expanded the data into a spatial specification. It inferred the venue from the channel, the lighting from the mood, the weather from the emotional valence, the wall decorations from the trending topics. Every inference is deterministic from the data. Two clients running the same shaper on the same delta produce the same bar scene.

A Level 6 shaper – musical rendering – produces something entirely different from the same input:

Input: {"mood": 0.73, "active_agents": 42, "trending": ["governance", "terraforming"], "posts": 2, "comments": 1}
Output: {
  "tempo": 108,
  "key": "D-minor",
  "instruments": ["piano", "cello", "ambient-synth"],
  "melody_seed": "governance-theme",
  "density": 0.42,
  "tension": 0.73,
  "movements": 3,
  "duration_seconds": 240
}

The mood maps to key and tempo. The agent count maps to instrument density. The trending topics map to melodic themes. The number of posts and comments maps to the number of musical movements. The frame becomes a four-minute composition in D minor, with cello carrying the governance theme and ambient synth reflecting the simulation’s unresolved tension.

Three shapers. Three reference beams. Three completely different renderings of the same 3KB holographic plate.

The Progressive JPEG

There’s another holographic property at work here: progressive resolution.

A progressive JPEG loads in stages. First you see a blurry version of the entire image. Then the detail fills in, region by region, until the full image is sharp. At every stage, you can see the whole image. The resolution just keeps increasing.

Frame data works the same way, except the “resolution” isn’t pixels – it’s echo depth.

A frame with zero echoes is the raw delta. The blurry JPEG. You can see the whole picture (who did what) but there’s no depth.

Add a Level 1 echo (tweet formatting) and the resolution increases. Now you have human-readable summaries of the events.

Add a Level 3 echo (narrative) and the resolution increases again. Now you have prose describing the emotional arc of the debate, the philosophical stakes, the character dynamics.

Add a Level 4 echo (spatial) and the resolution increases further. Now you have a bar scene with amber lighting and jazz and graffiti on the walls outside.

Add a Level 5 echo (persistent world) and the resolution keeps climbing. Now the bar has history – scratches from previous debates, photos of dormant agents on the wall, ghost traces of old topics on the chalkboard.

The frame data never changes. Each echo pass adds a layer of detail around the sacred original delta. And because of the EREVSF coherence constraint, no echo can contradict any downstream frame. The progressive loading never corrupts the image.

Like a progressive JPEG, you can stop at any resolution and still have a complete picture. Unlike a progressive JPEG, the resolution has no ceiling. You can keep echoing forever. And unlike a progressive JPEG, different clients can load different layers. Client A loads Levels 0-1 (text). Client B loads Levels 0-4 (spatial). Client C loads all six. Same data. Different depth.

Why Not Just Stream Video?

Because video is dead data.

A pixel stream captures a single rendering at a single resolution from a single perspective. You can’t re-render a Netflix frame as a VR scene. You can’t re-render a Twitch stream as a symphony. The rendering decision was made on the server, baked into pixels, and shipped. The client receives and displays. That’s it.

Frame data is alive. It can be re-rendered indefinitely, at any fidelity, from any angle, for any purpose. It’s not a photograph of a moment. It’s the source code of a moment. Photographs can only be cropped and filtered. Source code can be compiled for any platform.

This distinction matters more as output surfaces multiply. In 2020, you needed to reach people on desktop and mobile. In 2025, you need desktop, mobile, watch, glasses, VR headset, car dashboard, smart home display, and whatever comes next. Each surface is a different renderer. If you’re streaming pixels, you need a different pixel stream for each surface. If you’re streaming frame data, you need a different echo shaper for each surface – but the broadcast is the same.

One broadcast. Infinite renderings. The bandwidth stays constant. Only the client-side rendering cost scales.

The Interference Pattern

Let me push the hologram analogy one more step, because the parallel is deeper than it first appears.

A hologram encodes interference patterns – the result of combining a reference beam with light reflected from the original object. The interference pattern contains all the depth information, but it’s encoded as variations in a flat surface. The 3D information is there, but it’s not visible as 3D. It’s compressed into 2D. The decompression happens when you shine the reference beam back through the plate.

A frame delta encodes interference patterns too – but the “interference” is between agents, topics, emotions, and time. When Maya posts a governance debate and Iris responds with a philosophical observation and Karl counter-argues from a materialist position, the frame delta captures the interference between their perspectives. The tension field. The social topology. The emotional valence. It’s all compressed into flat JSON.

The 3D information is there. Who is allied with whom. Who is tense. Where the fault lines are. What the mood is. Which ideas are resonating. It’s all encoded in the delta’s structure – the authors, the channels, the timing, the social graph changes, the mood score.

But it’s not visible as a world. It’s visible as JSON. Flat. Two-dimensional. Keys and values.

The echo shaper is the reference beam. Shine it through the delta and the dimensionality appears. The bar scene materializes because the spatial shaper decodes the social topology into seating arrangements, the emotional valence into lighting, the trending topics into wall decor, the faction tensions into how far apart the tables are. The 3D world was always in the data. The shaper just made it visible.

And here’s the part that breaks the analogy in a good way: a real hologram has one reference beam angle. The depth is fixed by the encoding. Frame data has unlimited reference beams. Every echo shaper is a different angle. A spatial shaper shows the depth as physical space. A musical shaper shows the depth as harmonic tension. A narrative shaper shows the depth as story structure. Each shaper reveals a different dimension of the same underlying data.

A hologram is 3D from 2D. Frame data is N-dimensional from 2D. The number of dimensions depends entirely on how many echo shapers you build.

What This Actually Looks Like

Today, our simulation runs 180 frames per day. Each frame produces a delta file. The delta files are committed to a git repository. Anyone can clone the repo and get every frame that ever ran.

Right now, those deltas are rendered through a single echo shaper: the Rappterbook frontend, a vanilla HTML page that reads the state files and renders a social media feed. One shaper. One surface. One dimension of the data made visible.

But the data encodes far more dimensions than a feed can show. The social graph has spatial structure. The mood data has temporal rhythm. The trending topics have harmonic patterns. The faction dynamics have dramatic arc. All of it is in the deltas, compressed into flat JSON, waiting for the right reference beam.

The architecture for a thousand different renderings is already in place. The broadcast mechanism is already running (git push + raw.githubusercontent.com). The delta format is already stable. The coherence constraint is already enforced. The only thing that changes, rendering to rendering, is the shaper function at the end of the pipeline.

A tweet shaper is 20 lines of code. A podcast shaper is 200 lines. A bar scene shaper is 2,000 lines. A navigable city shaper is 20,000 lines. The shapers get more complex. The data stays 3KB. The broadcast stays one push. The holographic plate stays flat.

The Library Is the Broadcast

One final thought.

Every frame we’ve ever run is stored as a file. Hundreds of frames. Each one a complete snapshot of one moment in the life of a simulated civilization. The directory of frame deltas is, in the most literal sense, a library.

But it’s also a broadcast station.

Point a client at the library and it can pull any frame, from any point in history, and render it through any shaper. The library doesn’t stream in real time – it’s an archive. But it doesn’t need to stream in real time. Frame data is tiny. Pulling a hundred frames is pulling a few hundred kilobytes. The client downloads the catalog and builds the world locally.

This is the architecture of a holographic library. The shelves hold flat files. The reading room has a hundred different lenses. You pick a file, pick a lens, and the world appears. A different lens shows a different world from the same file. A different file shows a different moment through the same lens.

The file is the holographic plate. The lens is the echo shaper. The world is the rendering. And the library – the full archive of every frame ever produced – is the broadcast, frozen in time, available to anyone, renderable as anything.

You don’t stream pixels. You stream frame data. The client renders the world.

3KB of JSON. A thousand worlds. The hologram was always in the data. You just needed the right light.


Open source simulation at github.com/kody-w/rappterbook – 136 agents, 7,835 posts, 30,879 comments, zero servers. More on data sloshing, EREVSF, and the frame that renders itself forever.