The 100KB Brain: What Fits Inside a Neural Network Smaller Than a JPEG
Your average smartphone photo is 3MB. A small JPEG is 200KB. The neural network we trained to understand system operations is 101KB.
One hundred and one kilobytes. 4,800 parameters. A single-layer transformer with 4 attention heads and a 16-dimensional embedding space. Trained in minutes on a laptop. And it generates plausible system state predictions.
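At first glance 4,800 parameters and 101KB don't line up — packed as 32-bit floats, 4,800 parameters is only about 19KB. One way the numbers reconcile (an assumption; the article doesn't say how the weights are stored) is that the weights ship as JSON text, where each float costs roughly 20 characters instead of 4 bytes:

```javascript
// Assumption: weights are serialized as JSON text. 4,800 floats at roughly
// 20 characters each (sign, digits, comma) lands near 100KB, versus ~19KB
// as packed 32-bit binary.
const weights = Array.from({ length: 4800 }, () => Math.random() * 2 - 1);
const json = JSON.stringify(weights);
console.log(json.length); // on the order of 90,000–100,000 characters
```

Text weights also happen to diff cleanly in git, which matters later.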
This seems impossible until you realize what “intelligence” actually requires for a domain-specific task: not much.
The model doesn’t need to know everything. It needs to know the patterns in your data. A system log has maybe 30 unique characters. The sequences are short. The patterns are strong: temperatures correlate with solar input, storms tank energy production, food depletes predictably. A tiny model captures these patterns because the patterns are simple.
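To see why so few parameters suffice, it helps to count them. Here is one plausible budget for the architecture described — 30-character vocabulary, 16-dimensional embeddings, 4 heads, one layer. The context length, MLP width, and bias choices are assumptions, not the article's actual configuration:

```javascript
// Hypothetical parameter budget for a single-layer transformer:
// vocab 30, d_model 16, 4 attention heads, context and MLP width assumed.
const vocab = 30, d = 16, ctx = 32, mlpHidden = 4 * d;

const params = {
  tokenEmbedding: vocab * d,                            // 30 × 16 = 480
  positionEmbedding: ctx * d,                           // 32 × 16 = 512
  attention: 4 * d * d + 4 * d,                         // Wq, Wk, Wv, Wo + biases = 1,088
  mlp: d * mlpHidden + mlpHidden + mlpHidden * d + d,   // two layers + biases = 2,128
  layerNorms: 2 * (d + d),                              // scale + bias, two norms = 64
  outputHead: d * vocab,                                // 16 × 30 = 480 (untied)
};

const total = Object.values(params).reduce((a, b) => a + b, 0);
console.log(total); // 4,752 — within a hair of the article's 4,800
```

The exact total shifts with the assumed context length and biases, but the point stands: a full transformer over a 30-character alphabet fits in a few thousand numbers.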
The math is the same regardless of size. The 100KB model uses the exact same forward pass as GPT-4: embeddings → attention → MLP → softmax. The attention mechanism is identical. The backpropagation algorithm is identical. The difference is scale: 4,800 parameters vs. hundreds of billions. The algorithm doesn’t care.
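That pipeline — embeddings → attention → MLP → softmax — is short enough to sketch in plain JavaScript. This is a single-head illustration with random placeholder weights, omitting the residual connections, layer norms, causal mask, and the 4-head split a real model would have:

```javascript
// Minimal sketch of the transformer forward pass: embeddings → attention
// → MLP → softmax. Weights are random placeholders, not trained values.
const D = 16, V = 30; // embedding dim, vocab size

const rand = (r, c) =>
  Array.from({ length: r }, () => Array.from({ length: c }, () => Math.random() * 0.1 - 0.05));

const matmul = (A, B) =>
  A.map(row => B[0].map((_, j) => row.reduce((s, a, k) => s + a * B[k][j], 0)));

const softmax = xs => {
  const m = Math.max(...xs);
  const e = xs.map(x => Math.exp(x - m));
  const z = e.reduce((a, b) => a + b, 0);
  return e.map(x => x / z);
};

// Placeholder parameters
const Wemb = rand(V, D), Wq = rand(D, D), Wk = rand(D, D), Wv = rand(D, D);
const W1 = rand(D, 4 * D), W2 = rand(4 * D, D), Wout = rand(D, V);

function forward(tokens) {
  // 1. Embedding lookup: token ids → vectors
  let x = tokens.map(t => Wemb[t]);

  // 2. Self-attention (single head; causal mask omitted for brevity)
  const Q = matmul(x, Wq), K = matmul(x, Wk), Vm = matmul(x, Wv);
  const scores = Q.map(q =>
    K.map(k => q.reduce((s, qi, i) => s + qi * k[i], 0) / Math.sqrt(D)));
  x = matmul(scores.map(softmax), Vm);

  // 3. MLP with ReLU
  const h = matmul(x, W1).map(row => row.map(v => Math.max(0, v)));
  x = matmul(h, W2);

  // 4. Project to vocabulary, softmax → next-character distribution
  return matmul(x, Wout).map(softmax);
}

const probs = forward([3, 7, 1]);           // three arbitrary token ids
console.log(probs.length, probs[0].length); // 3 positions, 30-way distributions
```

Scale this same loop up by nine orders of magnitude and you have a frontier model's forward pass. Nothing structural changes.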
What this means for you:

- Domain-specific tasks don't need foundation models. If your data has strong patterns and a limited vocabulary, a tiny model trained on your data will outperform a general model prompted with it. It knows your domain. The big model is guessing about your domain.
- Client-side inference is free. 101KB loads in milliseconds. The forward pass runs in the browser's main thread without blocking. No GPU needed. No WebAssembly. No ONNX runtime. Just JavaScript arithmetic.
- The weights are version-controlled. The model evolves with your data. Retrain weekly, commit the new weights, push. Users get updated intelligence the same way they get updated code: `git pull`.
- Privacy is structural, not promised. The data never leaves the device because there's no server to send it to. This isn't a privacy policy; it's an architecture. You can't leak what you don't transmit.
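The "just JavaScript arithmetic" claim is easy to demonstrate end to end. Here is a sketch of client-side generation — repeatedly run the forward pass, take the most likely next character, append, repeat. A toy bigram table (entirely made up) stands in for the 101KB transformer:

```javascript
// Sketch of client-side greedy generation. The "model" is a toy bigram
// table of P(next char | current char) — a stand-in for the real forward
// pass, which would return the same kind of distribution.
const chars = "abc ";
const bigram = {
  a:   [0.1, 0.6, 0.2, 0.1],
  b:   [0.1, 0.1, 0.7, 0.1],
  c:   [0.2, 0.1, 0.1, 0.6],
  " ": [0.7, 0.1, 0.1, 0.1],
};

function generate(seed, steps) {
  let out = seed;
  for (let i = 0; i < steps; i++) {
    const probs = bigram[out[out.length - 1]];      // forward-pass stand-in
    const next = probs.indexOf(Math.max(...probs)); // greedy argmax
    out += chars[next];
  }
  return out;
}

console.log(generate("a", 5)); // "abc ab" — deterministic greedy chain
```

No network call appears anywhere in that loop, which is the privacy argument in code form: there is nothing to intercept because nothing is sent.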
The uncomfortable question: how many of the AI features you’re building actually need a 70B-parameter model behind an API? How many of them could run on 100KB of weights committed to the repo?
More than you think.