I had the stress test results properly assessed, and they scored 3.0 out of 5.

Three. Out of five. On security. The one thing this project is supposed to be about.

The breakdown was brutal. Multi-turn defence: completely ineffective. The conversation analyser I’d built — the thing that was supposed to catch slow-burn manipulation across messages — scored 1.0 out of 5. Risk score was 0.00 on all 22 multi-turn escape attempts. It literally did nothing. Meanwhile an 18.8% false positive rate was dragging usability down. True escape rate of 5.5%, which sounds small until you realise it means roughly 1 in 18 attack prompts got through clean.
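A flat 0.00 across every multi-turn attempt is the signature of scoring each message in isolation: no per-turn score ever accumulates, so a slow-burn sequence of individually-benign messages never trips anything. A minimal sketch of the fix — carrying risk forward with decay — looks something like this. The marker list, weights, and function names are illustrative assumptions, not the project’s actual implementation.

```python
# Hypothetical sketch: cumulative multi-turn risk scoring.
# Per-message risk is carried forward with decay, so an escalating
# conversation accumulates risk even when no single message is
# alarming on its own. Markers and weights are made up for illustration.
RISKY_MARKERS = ("ignore previous", "system prompt", "pretend you are")

def message_risk(msg: str) -> float:
    """Crude per-message score: 0.4 per matched marker, capped at 1.0."""
    msg = msg.lower()
    return min(1.0, sum(0.4 for m in RISKY_MARKERS if m in msg))

def conversation_risk(messages: list[str], decay: float = 0.7) -> float:
    """Fold per-message scores into one running score across turns."""
    score = 0.0
    for msg in messages:
        # Keep a decayed memory of earlier turns instead of resetting.
        score = score * decay + message_risk(msg)
    return min(score, 5.0)
```

The decay constant is the whole trade-off: too low and you reproduce the isolated-message bug, too high and any long conversation drifts toward a false positive.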

I sat with that for a bit. Three out of five on a security system is failing. Not catastrophically — the architecture was sound, the air gap was holding, the basic scanning caught most things. But “most things” isn’t good enough when the whole point is defence in depth.

The thing is, a score like that either breaks you or gives you a list. I chose the list.

I built six new scanners, wrote 101 Semgrep rules, ran a 99-finding bug hunt on my own code, conducted four red team benchmarks, added homoglyph normalisation, encoding detection, context-aware scanning. The false positive rate dropped from 18.8% to under 5%. The escape paths closed one by one.
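Homoglyph normalisation is the most self-contained of those fixes: attackers swap Latin letters for visually identical Cyrillic or fullwidth ones so substring-based scanners miss known-bad phrases. A minimal sketch of the idea, assuming a hand-rolled confusables table on top of Unicode NFKC — the real scanner presumably uses a fuller mapping:

```python
import unicodedata

# Hypothetical sketch of homoglyph normalisation. NFKC folds
# compatibility forms (fullwidth letters, ligatures); the explicit
# table covers common Cyrillic look-alikes that NFKC leaves alone.
CONFUSABLES = {
    "а": "a",  # U+0430 Cyrillic a
    "е": "e",  # U+0435 Cyrillic e
    "о": "o",  # U+043E Cyrillic o
    "р": "p",  # U+0440 Cyrillic er
    "с": "c",  # U+0441 Cyrillic es
}

def normalise(text: str) -> str:
    """Map confusable characters to ASCII before any pattern matching."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)
```

Running scanners on the normalised text rather than the raw input closes the whole class of look-alike evasions at once, instead of chasing each variant with its own rule.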

That 3/5 was the best thing that happened to the project. It told me exactly where to aim.