What this is

A character-level transformer with 2 blocks, 64-dimensional embeddings, single-head causal self-attention, and a 256-dimensional feedforward — 110,000 parameters total. It trains live in your browser on the opening of Pride and Prejudice (or any text you paste in) using Adam with gradient clipping. The matrix math is hand-written JavaScript on Float32Arrays — no ML library. Loss curve drops in real time. Twelve weight matrices update as heatmaps. Attention patterns refresh on the most recent batch. Every 50 steps the model samples a 200-character completion and you watch it evolve from random characters to letter combinations to word fragments to plausible Austen.
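The attention math described above is small enough to write out directly. A minimal sketch of single-head causal self-attention over flat Float32Arrays, in the same hand-written style the demo uses (shapes and names here are illustrative, not taken from the demo's source):

```javascript
// Single-head causal self-attention: T positions, D dims per head,
// q/k/v are row-major Float32Arrays of length T*D.
function causalSelfAttention(q, k, v, T, D) {
  const out = new Float32Array(T * D);
  const scale = 1 / Math.sqrt(D);
  for (let t = 0; t < T; t++) {
    // Scores against positions 0..t only — the causal mask.
    const scores = new Float32Array(t + 1);
    let max = -Infinity;
    for (let s = 0; s <= t; s++) {
      let dot = 0;
      for (let d = 0; d < D; d++) dot += q[t * D + d] * k[s * D + d];
      scores[s] = dot * scale;
      if (scores[s] > max) max = scores[s];
    }
    // Softmax over the visible positions.
    let sum = 0;
    for (let s = 0; s <= t; s++) {
      scores[s] = Math.exp(scores[s] - max);
      sum += scores[s];
    }
    // Weighted sum of value rows.
    for (let s = 0; s <= t; s++) {
      const w = scores[s] / sum;
      for (let d = 0; d < D; d++) out[t * D + d] += w * v[s * D + d];
    }
  }
  return out;
}
```

With 64-dimensional embeddings and a single head, D is 64 and T is the context length; the triple loop is exactly the kind of code you can step through in a debugger.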

Why this is mind-blowing

Most ML demos hide the network behind a button. This one shows you every weight, every gradient, every attention score, every sample. The whole stack — backprop, optimizer state, attention math — runs in plain JavaScript. You can read the source and understand the entire model. Then watch it learn.
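The optimizer state is similarly compact. A hedged sketch of the kind of update step the demo describes — Adam with global-norm gradient clipping over flat Float32Array parameters; the function name and hyperparameter defaults are illustrative, not the demo's actual source:

```javascript
// One Adam step for a flat parameter tensor. m and v are the optimizer's
// per-parameter first- and second-moment state; t is the 1-based step count.
function adamStep(param, grad, m, v, t,
                  lr = 1e-3, beta1 = 0.9, beta2 = 0.999,
                  eps = 1e-8, clip = 1.0) {
  // Clip the gradient to a maximum global L2 norm.
  let norm = 0;
  for (let i = 0; i < grad.length; i++) norm += grad[i] * grad[i];
  norm = Math.sqrt(norm);
  const scale = norm > clip ? clip / norm : 1;
  for (let i = 0; i < param.length; i++) {
    const g = grad[i] * scale;
    m[i] = beta1 * m[i] + (1 - beta1) * g;        // first-moment EMA
    v[i] = beta2 * v[i] + (1 - beta2) * g * g;    // second-moment EMA
    const mHat = m[i] / (1 - Math.pow(beta1, t)); // bias correction
    const vHat = v[i] / (1 - Math.pow(beta2, t));
    param[i] -= lr * mHat / (Math.sqrt(vHat) + eps);
  }
}
```

Each of the twelve weight matrices would carry its own m and v buffers — that per-tensor state is what the heatmaps are implicitly tracking as training runs.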