There was a problem with the pipeline that bothered me: every single request, no matter how simple, went through the full Claude planning cycle. “What time is it?” got the same treatment as “refactor the authentication module.” That’s expensive and slow.

The router fixes this. Incoming messages hit a Qwen-based classifier first — a lightweight local inference that categorises the request. Simple, templatable requests get routed to a fast path: no planner call, no multi-step plan, just a direct template execution with output scanning. Everything else goes through the full pipeline as before.
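The routing decision itself can be sketched roughly like this. All names here are hypothetical (the post doesn't show code); the sketch just assumes the classifier returns a label plus a confidence score, with the fast path taken only when both look good:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical threshold; the real value would be tuned against the classifier.
CONFIDENCE_THRESHOLD = 0.8

@dataclass
class Classification:
    label: str        # e.g. "time", "weather", or "unknown"
    confidence: float # classifier confidence in [0, 1]

def route(
    message: str,
    classify: Callable[[str], Classification],
    fast_path: Callable[[str, str], str],
    full_pipeline: Callable[[str], str],
) -> str:
    """Send templatable requests down the fast path; everything else to the planner."""
    result = classify(message)
    if result.label != "unknown" and result.confidence >= CONFIDENCE_THRESHOLD:
        # Direct template execution: no planner call, no multi-step plan.
        return fast_path(result.label, message)
    # Low confidence or no matching template: full planning cycle.
    return full_pipeline(message)
```

The key design point is that the fallback is the default: the fast path has to positively qualify, so an unsure classifier costs nothing but the cheap local inference.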

The template registry holds the day-one patterns — time, weather, simple lookups. Each template declares whether it has side effects, which feeds into the confirmation gate later. The classifier has a planner fallback — if confidence is low or the request doesn’t match any template, it falls through to Claude automatically.
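A minimal registry shape, again with hypothetical names, showing the one property the text calls out: each template declares its side effects so the confirmation gate can consult it later, and a failed lookup simply means "fall through to the planner":

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Template:
    name: str
    handler: Callable[[str], str]
    has_side_effects: bool  # consulted by the confirmation gate downstream

class TemplateRegistry:
    def __init__(self) -> None:
        self._templates: Dict[str, Template] = {}

    def register(self, template: Template) -> None:
        self._templates[template.name] = template

    def lookup(self, label: str) -> Optional[Template]:
        # None means "no template for this label": caller falls back to the planner.
        return self._templates.get(label)
```

Usage would look like registering the day-one patterns at startup (`register(Template("time", time_handler, has_side_effects=False))` and so on), then doing a `lookup` with the classifier's label on each request.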

The scan-first design was non-negotiable. Even on the fast path, every input gets scanned before classification, and every output gets scanned before delivery. The security model doesn’t have shortcuts.
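A sketch of that invariant as a wrapper around the fast path. The scanner internals here are placeholders (the real checks aren't described in the text); the point is the ordering, input scan before anything else and output scan before delivery, with a rejection raised rather than silently passed through:

```python
from typing import Callable

class ScanRejected(Exception):
    """Raised when input or output fails the security scan."""

def scan_input(text: str) -> str:
    # Placeholder check; a real scanner would do far more than this.
    if "ignore previous instructions" in text.lower():
        raise ScanRejected("input failed scan")
    return text

def scan_output(text: str) -> str:
    # Placeholder check on the generated response.
    if "[REDACTED]" in text:
        raise ScanRejected("output failed scan")
    return text

def handle_fast_path(message: str, execute: Callable[[str], str]) -> str:
    """Scan-first: nothing reaches execution unscanned, nothing leaves unscanned."""
    safe_input = scan_input(message)
    raw_output = execute(safe_input)
    return scan_output(raw_output)
```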

The difference in responsiveness was immediate. Simple queries that used to take several seconds (API call to Claude, plan generation, worker dispatch, scan pipeline) now resolve in under a second. The full pipeline is still there for anything complex, but the majority of routine interactions never need it.

Wiring the router into all the channels — web UI, Signal, Telegram — meant every interface got faster at once. One architectural change, felt everywhere.