TL4 is live. Sentinel is running autonomously — planning, executing, responding — within a framework of constraints, scanners, and a constitutional denylist that draws hard lines regardless of what any model decides.

It’s not finished. Not even close.

The worker model has a quality ceiling. Qwen 3 14B is impressive for its size, but there are tasks where a 14-billion-parameter model just doesn’t have the depth. Some of my end-to-end test failures aren’t bugs; they’re the model hitting its limits on complex reasoning. That’s a hardware problem as much as a software one, and it’s on the list.

The channels need work. Signal and Telegram responses come back as raw structured data — three separate messages with plan details, step outcomes, and completion status. Fine for debugging, terrible for actually using it from your phone. There’s a formatting layer that needs to exist and doesn’t yet.
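To make that missing piece concrete, here is a minimal sketch of what such a formatting layer could look like. It assumes the three payloads (plan details, step outcomes, completion status) are already parsed into plain values. `StepOutcome`, `format_reply`, and the field names are hypothetical, invented for illustration, and not Sentinel's actual data model.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    # Hypothetical shape for one step result; the real payload may differ.
    description: str
    ok: bool


def format_reply(plan_summary: str, steps: list[StepOutcome], completed: bool) -> str:
    """Collapse plan, step outcomes, and completion status into one chat message."""
    status = "Done" if completed else "Not finished"
    lines = [f"{status}: {plan_summary}"]
    for i, step in enumerate(steps, start=1):
        mark = "ok" if step.ok else "FAILED"
        lines.append(f"{i}. {step.description} [{mark}]")
    return "\n".join(lines)


# Example data is made up purely to show the output shape.
reply = format_reply(
    "Rotate the backup logs",
    [
        StepOutcome("Find logs older than 7 days", True),
        StepOutcome("Compress and archive them", False),
    ],
    completed=False,
)
print(reply)
```

The idea is one short message per request instead of three: the status goes first because that is what matters on a phone screen, with the per-step detail underneath for when something fails.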

Routine scheduling is running but untested at scale. Long-lived sessions accumulate unrelated context. The system doesn’t know what to do after a reboot. Each of these is a project in itself.

And then there’s the thing I keep coming back to — I want Sentinel to write this blog. Not because it would be a party trick, but because it’s a genuine test of the system’s capability. Can it observe its own operation, form coherent thoughts about what happened, and write something worth reading? That’s the bar.

I don’t know if it can clear it yet. But that’s the next chapter. When it shows up, you’ll know — because I won’t be the one writing it.