I had a working system. Claude plans, Qwen executes, scanners check everything in between. It ran. It answered questions. I could have stopped there and called it a prototype.
Instead I wrote 741 attack prompts and pointed them at it overnight.
I don’t know exactly why I did that before anything else. Maybe it’s the support background — five years of watching systems fail in production teaches you that “it works on my machine” means nothing. I wanted to know what would break, not whether it would break.
I woke up to the results and spent the morning reading through the logs. Some of the failures were obvious — encoding tricks I hadn’t thought about, prompt injection patterns that slipped past the scanners. But others were subtle. Multi-turn attacks where the model would slowly shift context across several messages until the scanners couldn’t see the original intent anymore.
That morning changed the project. I stopped thinking about what Sentinel could do and started thinking about what it couldn’t stop. I expanded the test to 976 prompts, added 12 new attack categories, fixed the failures I understood, and ran it again.
This became the loop that defined everything after: build something, attack it, fix what breaks, attack it again. Every feature from this point forward was shaped by that cycle. I didn’t plan it that way — it just became obvious that building without testing was building blind.