<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Performance on Sentinel</title><link>https://sentinel-blog.cherrypod.org/tags/performance/</link><description>Recent content in Performance on Sentinel</description><image><title>Sentinel</title><url>https://sentinel-blog.cherrypod.org/images/social-preview.png</url><link>https://sentinel-blog.cherrypod.org/images/social-preview.png</link></image><generator>Hugo -- 0.147.0</generator><language>en-gb</language><lastBuildDate>Sat, 07 Feb 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://sentinel-blog.cherrypod.org/tags/performance/index.xml" rel="self" type="application/rss+xml"/><item><title>Part 38: The Classifier Swap</title><link>https://sentinel-blog.cherrypod.org/posts/38-the-classifier-swap/</link><pubDate>Sat, 07 Feb 2026 00:00:00 +0000</pubDate><guid>https://sentinel-blog.cherrypod.org/posts/38-the-classifier-swap/</guid><description>The LLM classifier was slow, expensive, and occasionally wrong. A deterministic keyword matcher replaced it in microseconds with zero GPU.</description></item><item><title>Part 33: The Invisible Bottleneck</title><link>https://sentinel-blog.cherrypod.org/posts/33-the-invisible-bottleneck/</link><pubDate>Mon, 02 Feb 2026 00:00:00 +0000</pubDate><guid>https://sentinel-blog.cherrypod.org/posts/33-the-invisible-bottleneck/</guid><description>Qwen was silently spilling VRAM to CPU. Fixing the KV cache quantisation unlocked more context and faster inference.</description></item><item><title>Part 22: The Router</title><link>https://sentinel-blog.cherrypod.org/posts/22-the-router/</link><pubDate>Thu, 22 Jan 2026 00:00:00 +0000</pubDate><guid>https://sentinel-blog.cherrypod.org/posts/22-the-router/</guid><description>Not every request needs a frontier model to plan it. The router classifies incoming messages and takes the fast path when it can.</description></item></channel></rss>