<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Performance on Sentinel</title><link>https://sentinel-blog.cherrypod.org/tags/performance/</link><description>Recent content in Performance on Sentinel</description><image><title>Sentinel</title><url>https://sentinel-blog.cherrypod.org/images/social-preview.png</url><link>https://sentinel-blog.cherrypod.org/images/social-preview.png</link></image><generator>Hugo -- 0.147.0</generator><language>en-gb</language><lastBuildDate>Sat, 07 Feb 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://sentinel-blog.cherrypod.org/tags/performance/index.xml" rel="self" type="application/rss+xml"/><item><title>Part 38: The Classifier Swap</title><link>https://sentinel-blog.cherrypod.org/posts/38-the-classifier-swap/</link><pubDate>Sat, 07 Feb 2026 00:00:00 +0000</pubDate><guid>https://sentinel-blog.cherrypod.org/posts/38-the-classifier-swap/</guid><description>The LLM classifier was slow, expensive, and occasionally wrong. A deterministic keyword matcher replaced it in microseconds with zero GPU.</description></item><item><title>Part 33: The Invisible Bottleneck</title><link>https://sentinel-blog.cherrypod.org/posts/33-the-invisible-bottleneck/</link><pubDate>Mon, 02 Feb 2026 00:00:00 +0000</pubDate><guid>https://sentinel-blog.cherrypod.org/posts/33-the-invisible-bottleneck/</guid><description>Qwen was silently spilling VRAM to CPU. Fixing the KV cache quantisation unlocked more context and faster inference.</description></item><item><title>Part 22: The Router</title><link>https://sentinel-blog.cherrypod.org/posts/22-the-router/</link><pubDate>Thu, 22 Jan 2026 00:00:00 +0000</pubDate><guid>https://sentinel-blog.cherrypod.org/posts/22-the-router/</guid><description>Not every request needs a frontier model to plan it. The router classifies incoming messages and takes the fast path when it can.</description></item></channel></rss>