Ready.
Your messages never leave this device. The model runs entirely in WebGPU. The first load downloads the model weights once and caches them; after that, the app works fully offline.
This app runs a real LLM directly inside your browser using WebGPU. Your browser doesn't appear to expose a WebGPU adapter, so on-device inference isn't available right now.
In Chrome, you can try enabling chrome://flags/#enable-unsafe-webgpu. You can still try the UI: Demo Mode runs a tiny rule-based echo bot, so the chat is fully navigable without GPU support.
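The adapter check described above can be sketched with the standard WebGPU API. This is a minimal sketch, not the app's actual detection code: `navigator.gpu` is absent in browsers without WebGPU support, and `requestAdapter()` resolves to null when no suitable GPU is available.

```javascript
// Minimal WebGPU availability check (a sketch; the app's real logic may differ).
async function webgpuAvailable() {
  // navigator.gpu is undefined in browsers (and runtimes) without WebGPU.
  if (typeof navigator === 'undefined' || !navigator.gpu) return false;
  // requestAdapter() resolves to null when no suitable GPU adapter exists.
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}

webgpuAvailable().then((ok) => {
  console.log(ok ? 'WebGPU available' : 'WebGPU unavailable');
});
```

When this check fails, the app can fall back to Demo Mode instead of attempting inference.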