Ready.
Your messages never leave this device. The model runs entirely in WebGPU. The first load downloads the model weights once and caches them; after that, the app works fully offline.
This app runs a real LLM directly inside your browser using WebGPU. Your browser doesn't appear to expose a WebGPU adapter, so on-device inference isn't available right now.
In Chrome, you can try enabling chrome://flags/#enable-unsafe-webgpu. You can still try the UI: Demo Mode runs a tiny rule-based echo bot, so the chat is fully navigable without GPU support.
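The adapter check described above can be sketched with the standard WebGPU API. This is a minimal sketch, not the app's actual detection code: `navigator.gpu` is absent in browsers without WebGPU support, and `requestAdapter()` resolves to null when no suitable GPU is available.

```javascript
// Minimal WebGPU availability check (a sketch; the app's real logic may differ).
async function webgpuAvailable() {
  // navigator.gpu is undefined in browsers (and runtimes) without WebGPU.
  if (typeof navigator === 'undefined' || !navigator.gpu) return false;
  // requestAdapter() resolves to null when no suitable GPU adapter exists.
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}

webgpuAvailable().then((ok) => {
  console.log(ok ? 'WebGPU available' : 'WebGPU unavailable');
});
```

When this check fails, the app can fall back to Demo Mode instead of attempting inference.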