Part 52: The Refactor
The codebase had grown fast. Features landed, bugs got fixed, new capabilities kept shipping. Then a structural audit revealed what that pace had cost: god files, god functions, and a growing maintenance burden.
The planner could execute multi-step tasks. But it had no way to verify its own work. If a step failed silently, it carried on regardless. Time to close the loop.
Sentinel could generate code, build websites, write files. But every piece of content was synthetic — generated from scratch by an LLM. What if it could use real photos, real videos, real documents?
Sentinel could browse the web. It could read files and write code. But it couldn’t answer ‘what’s the weather?’ without fabricating something. Time to give it real data backends.
file_patch needs the planner to find a unique anchor string in existing code. The planner is bad at this. What if the system placed named markers instead?
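One way to sketch the marker idea (a minimal illustration — the marker syntax and function name here are hypothetical, not the actual `file_patch` implementation): the system writes stable, named comment markers into files it generates, and later patches splice new code at a marker by name instead of hunting for a unique anchor string.

```python
import re

# Hypothetical marker syntax: a comment line like "# <<sentinel:imports>>"
MARKER = re.compile(r"^\s*# <<sentinel:(?P<name>[\w-]+)>>\s*$")

def patch_at_marker(source: str, name: str, fragment: str) -> str:
    """Insert `fragment` on a new line immediately after the named marker.

    Because markers are machine-placed and stable, the planner never has
    to guess a unique substring of existing code.
    """
    out, found = [], False
    for line in source.splitlines():
        out.append(line)
        m = MARKER.match(line)
        if m and m.group("name") == name:
            out.append(fragment)
            found = True
    if not found:
        raise ValueError(f"marker {name!r} not found")
    return "\n".join(out)
```

The trade: files carry a few extra comment lines, but patch targeting becomes an exact-match lookup rather than a fuzzy search the planner can get wrong.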
Sentinel was built for one person. Making it work for multiple users meant rethinking auth, isolation, and how the system tracks who’s who.
Tasks were reporting success based on whether steps completed, not whether the goal was achieved. Building a verification system to tell the difference.
The system could remember what happened during tasks, but not what the plan was or whether it worked. Adding plan-outcome memory to close the loop.
Full-file regeneration breaks at scale. The new tool generates only the changed fragments and splices them deterministically.
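The deterministic splice step can be sketched in a few lines (an assumed shape, not the tool's actual code): each fragment comes with a line range, and edits are applied bottom-up so earlier offsets stay valid without any bookkeeping.

```python
def splice(lines: list[str], edits: list[tuple[int, int, list[str]]]) -> list[str]:
    """Apply (start, end, replacement_lines) edits to a file's lines.

    `start`/`end` are 0-based, end-exclusive, and ranges must not overlap.
    Sorting by start descending means applying one edit never shifts the
    offsets of the edits still to come — the splice is deterministic.
    """
    for start, end, repl in sorted(edits, key=lambda e: e[0], reverse=True):
        lines[start:end] = repl
    return lines
```

Only the fragments change hands with the LLM; the surrounding file is never regenerated, so it can't be silently mangled.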
Tested three planner models on identical tasks. The surprise: upgrading the planner fixed the worker’s bugs.
The LLM classifier was slow, expensive, and occasionally wrong. A deterministic keyword matcher replaced it in microseconds with zero GPU.
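The core of a deterministic keyword matcher fits in a dozen lines (a minimal sketch — the intents and trigger words here are invented for illustration): each intent maps to trigger phrases, first match wins, and no model is ever called.

```python
# Hypothetical intent table; first matching intent wins.
INTENTS: dict[str, tuple[str, ...]] = {
    "weather": ("weather", "forecast", "temperature"),
    "code": ("refactor", "function", "bug", "compile"),
}

def classify(message: str) -> str:
    """Deterministic, microsecond-scale intent classification."""
    text = message.lower()
    for intent, keywords in INTENTS.items():
        if any(kw in text for kw in keywords):
            return intent
    return "general"
```

It will miss paraphrases a model would catch, but it never hallucinates a category, costs nothing, and its failures are reproducible and debuggable.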
Cross-session episodic memory — the system remembers what worked, what failed, and applies that knowledge to future tasks.
LLMs generate broken code. The code fixer catches it before it hits the filesystem — 7 auto-fixers across 10+ languages.
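The simplest layer of that defense — the gate, before any auto-fixing — can be sketched for Python using the interpreter's own `compile()` (a hypothetical helper, not the actual code fixer): syntactically broken output is rejected before a single byte reaches disk.

```python
def safe_write(path: str, code: str) -> None:
    """Write Python source to `path` only if it parses.

    compile() with mode="exec" runs the full parser without executing
    anything, so syntax errors are caught before the file exists.
    """
    try:
        compile(code, path, "exec")
    except SyntaxError as e:
        raise ValueError(f"refusing to write {path}: {e}") from None
    with open(path, "w") as f:
        f.write(code)
```

The real fixers go further — repairing, not just rejecting — but the principle is the same: validate generated code in memory, never on the filesystem.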
The orchestrator was doing too much. Six phases to extract it into focused modules without breaking a single security invariant.
SQLite to PostgreSQL. Store protocols, async rewrite, data migration, and then ripping out every line of SQLite code.
Not every request needs a frontier model to plan it. The router classifies incoming messages and takes the fast path when it can.
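The fast-path decision itself is tiny once an intent is known (names here are hypothetical placeholders): cheap intents go to a small local model, everything else escalates to the frontier planner.

```python
# Hypothetical set of intents that never need multi-step planning.
FAST_PATH = {"greeting", "status_check", "simple_lookup"}

def pick_planner(intent: str) -> str:
    """Route cheap intents to a small local model; escalate the rest."""
    return "local-small" if intent in FAST_PATH else "frontier"
```

The expensive part isn't this dispatch — it's classifying the message reliably enough to trust the shortcut.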
Giving Sentinel a proper UI — dashboard health cards, chat, memory browser, routine management, and a GSP mascot.
Five trust levels, from full human approval to autonomous execution. Each one its own project.
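A ladder like that maps naturally onto an ordered enum (the level names and approval rules below are illustrative guesses, not the actual scheme): ordering the levels lets the approval check be a simple comparison.

```python
from enum import IntEnum

class TrustLevel(IntEnum):
    APPROVE_EVERY_STEP = 0  # human confirms each action
    APPROVE_PLANS = 1       # human confirms the plan; steps then run freely
    APPROVE_WRITES = 2      # only mutating actions need confirmation
    NOTIFY_ONLY = 3         # runs autonomously, reports afterwards
    AUTONOMOUS = 4          # full autonomy

def needs_approval(level: TrustLevel, mutating: bool) -> bool:
    """Does this action require a human sign-off at this trust level?"""
    if level <= TrustLevel.APPROVE_PLANS:
        return True
    if level == TrustLevel.APPROVE_WRITES:
        return mutating
    return False
```

Using `IntEnum` makes "at least this trusted" a plain `<=`/`>=` check, which keeps every gate in the codebase one line long.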
Every shell command runs in a disposable container. No state leaks, no network, no capabilities. Or so I thought.
From a custom PC build to containers, local LLMs, and the paper that started it all — CaMeL.