Every user message that enters the pipeline needs to be classified. Is this a simple request that a template can handle — “what’s the weather in London” — or something complex that needs the planner? The classifier makes that decision.
The original classifier used Qwen. Every incoming message got sent through the 14B model for inference, which returned a classification result. It worked, mostly. But it was slow (GPU inference for a routing decision), expensive (every message paid the inference cost whether it needed a plan or not), and occasionally wrong in ways that were hard to debug. An LLM making a wrong routing decision looks like a correct routing decision that happens to produce a bad outcome downstream.
The replacement is a deterministic keyword and regex classifier. Ordered patterns match user messages against 9 fast-path templates: web search, calendar operations, email, messaging. A single match routes to that template; multiple matches or no match route to the planner. The interface is identical — same classify() call, same ClassificationResult return type. Drop-in replacement.
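A minimal sketch of what this looks like in Python — the pattern list, template names, and dataclass fields here are illustrative assumptions, not the project's actual code, but the routing rule (one match → template, zero or many → planner) is the one described above:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClassificationResult:
    route: str                       # "fast_path" or "planner"
    template: Optional[str] = None   # set only on the fast path

# Ordered (template, pattern) pairs; more specific patterns sit earlier.
FAST_PATH_PATTERNS = [
    ("web_search", re.compile(r"\b(weather|news|look up|search for)\b", re.I)),
    ("calendar",   re.compile(r"\b(meeting|schedule|calendar|appointment)\b", re.I)),
    ("email",      re.compile(r"\bemail\b", re.I)),
    ("messaging",  re.compile(r"\b(send a message|text)\b", re.I)),
]

def classify(message: str) -> ClassificationResult:
    matches = [name for name, pat in FAST_PATH_PATTERNS if pat.search(message)]
    if len(matches) == 1:
        # Exactly one template matched: take the fast path.
        return ClassificationResult("fast_path", template=matches[0])
    # Zero matches (unknown request) or several (ambiguous): let the planner decide.
    return ClassificationResult("planner")
```

An ambiguous message like “email Sarah about the weather meeting” trips three patterns at once and correctly falls through to the planner instead of guessing.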
The patterns needed more refinement than expected. “What’s the weather” is obviously a web search. But “change the weather section on my site” is a website modification that needs the planner. Words that look like an obvious match for one template can mean something entirely different in context. Pattern ordering matters too: more specific patterns need to match before more general ones, or a calendar request gets swallowed by a broader pattern.
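One way to handle the “weather section on my site” problem is an override list checked before the fast-path patterns. This is a hypothetical sketch — the guard patterns and function name are my own illustration of the refinement described above, not the project’s implementation:

```python
import re

# Hypothetical guard patterns: messages that contain a fast-path keyword but
# are really modification requests, which must go to the planner.
PLANNER_OVERRIDES = [
    re.compile(r"\b(change|edit|update|modify)\b.*\b(section|page|site)\b", re.I),
]

def needs_planner_override(message: str) -> bool:
    """True if a misleading surface keyword should be ignored."""
    return any(pat.search(message) for pat in PLANNER_OVERRIDES)
```

The override runs first: if it fires, the classifier routes to the planner without consulting the fast-path patterns at all, which is also why ordering inside the pattern list itself matters.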
Recipient extraction was another layer. “Send a message to John” needs to extract “John” before it can hit the fast path. If extraction fails, the request falls through to the planner rather than sending a message to nobody. Same for email — “email Sarah saying I’ll be late” needs both the recipient and the content extracted before the template can handle it.
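The extraction step can be sketched like this — the regexes and function names are illustrative assumptions, but the fallback behavior matches the text: if either field fails to extract, the function returns None and the request falls through to the planner rather than firing a half-filled template:

```python
import re
from typing import Optional, Tuple

# Hypothetical extractors for the two templates discussed above.
MESSAGE_RE = re.compile(r"\bsend a message to (\w+)", re.I)
EMAIL_RE = re.compile(r"\bemail (\w+) saying (.+)", re.I)

def extract_recipient(message: str) -> Optional[str]:
    """Recipient for the messaging template, or None to fall through."""
    m = MESSAGE_RE.search(message)
    return m.group(1) if m else None

def extract_email_parts(message: str) -> Optional[Tuple[str, str]]:
    """(recipient, content) for the email template, or None to fall through."""
    m = EMAIL_RE.search(message)
    return (m.group(1), m.group(2)) if m else None
```

The important property is that extraction failure is a routing signal, not an error: “send John a note” matches nothing here, so it goes to the planner instead of messaging nobody.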
The performance difference is absurd. Microseconds instead of inference. Zero GPU usage for routing. And the determinism means the same input always produces the same routing decision — no more probabilistic classification where a message occasionally takes the wrong path.
The deeper lesson was about knowing when you don’t need AI. An LLM is a general-purpose reasoning engine. Using one to match “check my email” against a list of 9 known patterns is like using a crane to pick up a pencil. The regex matcher is faster, cheaper, more predictable, and easier to debug. The LLM’s value is in the planning stage, where the reasoning actually matters.