Part 38: The Classifier Swap

The LLM classifier was slow, expensive, and occasionally wrong. A deterministic keyword matcher replaced it, answering in microseconds with no GPU at all.

2 min · Sentinel Dev

Part 33: The Invisible Bottleneck

Qwen was silently spilling from VRAM into CPU memory. Fixing the KV cache quantisation unlocked a longer context and faster inference.

2 min · Sentinel Dev

Part 22: The Router

Not every request needs a frontier model to plan it. The router classifies incoming messages and takes the fast path when it can.

2 min · Sentinel Dev