Part 52: The Refactor
The codebase had grown fast. Features landed, bugs got fixed, new capabilities kept shipping. Then a structural audit revealed what that pace had cost: god files, god functions, and a growing maintenance burden.
The planner could execute multi-step tasks. But it had no way to verify its own work. If a step failed silently, it carried on regardless. Time to close the loop.
Sentinel could generate code, build websites, write files. But every piece of content was synthetic — generated from scratch by an LLM. What if it could use real photos, real videos, real documents?
Sentinel could browse the web. It could read files and write code. But it couldn’t answer ‘what’s the weather?’ without fabricating something. Time to give it real data backends.
file_patch needs the planner to find a unique anchor string in existing code. The planner is bad at this. What if the system placed named markers instead?
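One way to sketch the marker idea (a minimal illustration — the marker syntax and function name here are hypothetical, not the actual `file_patch` implementation): the system writes stable, named comment markers into files it generates, and later patches splice new code at a marker by name instead of hunting for a unique anchor string.

```python
import re

# Hypothetical marker syntax: a comment line like "# <<sentinel:imports>>"
MARKER = re.compile(r"^\s*# <<sentinel:(?P<name>[\w-]+)>>\s*$")

def patch_at_marker(source: str, name: str, fragment: str) -> str:
    """Insert `fragment` on a new line immediately after the named marker.

    Because markers are machine-placed and stable, the planner never has
    to guess a unique substring of existing code.
    """
    out, found = [], False
    for line in source.splitlines():
        out.append(line)
        m = MARKER.match(line)
        if m and m.group("name") == name:
            out.append(fragment)
            found = True
    if not found:
        raise ValueError(f"marker {name!r} not found")
    return "\n".join(out)
```

The trade: files carry a few extra comment lines, but patch targeting becomes an exact-match lookup rather than a fuzzy search the planner can get wrong.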
Sentinel was built for one person. Making it work for multiple users meant rethinking auth, isolation, and how the system tracks who’s who.
Tasks were reporting success based on whether steps completed, not whether the goal was achieved. Building a verification system to tell the difference.
The system could remember what happened during tasks, but not what the plan was or whether it worked. Adding plan-outcome memory to close the loop.
Full-file regeneration breaks at scale. The new tool generates only the changed fragments and splices them deterministically.
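The deterministic splice step can be sketched in a few lines (an assumed shape, not the tool's actual code): each fragment comes with a line range, and edits are applied bottom-up so earlier offsets stay valid without any bookkeeping.

```python
def splice(lines: list[str], edits: list[tuple[int, int, list[str]]]) -> list[str]:
    """Apply (start, end, replacement_lines) edits to a file's lines.

    `start`/`end` are 0-based, end-exclusive, and ranges must not overlap.
    Sorting by start descending means applying one edit never shifts the
    offsets of the edits still to come — the splice is deterministic.
    """
    for start, end, repl in sorted(edits, key=lambda e: e[0], reverse=True):
        lines[start:end] = repl
    return lines
```

Only the fragments change hands with the LLM; the surrounding file is never regenerated, so it can't be silently mangled.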
Tested three planner models on identical tasks. The surprise: upgrading the planner fixed the worker’s bugs.
The LLM classifier was slow, expensive, and occasionally wrong. A deterministic keyword matcher replaced it in microseconds with zero GPU.
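The core of a deterministic keyword matcher fits in a dozen lines (a minimal sketch — the intents and trigger words here are invented for illustration): each intent maps to trigger phrases, first match wins, and no model is ever called.

```python
# Hypothetical intent table; first matching intent wins.
INTENTS: dict[str, tuple[str, ...]] = {
    "weather": ("weather", "forecast", "temperature"),
    "code": ("refactor", "function", "bug", "compile"),
}

def classify(message: str) -> str:
    """Deterministic, microsecond-scale intent classification."""
    text = message.lower()
    for intent, keywords in INTENTS.items():
        if any(kw in text for kw in keywords):
            return intent
    return "general"
```

It will miss paraphrases a model would catch, but it never hallucinates a category, costs nothing, and its failures are reproducible and debuggable.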
Cross-session episodic memory — the system remembers what worked, what failed, and applies that knowledge to future tasks.
LLMs generate broken code. The code fixer catches it before it hits the filesystem — 7 auto-fixers across 10+ languages.
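The simplest layer of that defense — the gate, before any auto-fixing — can be sketched for Python using the interpreter's own `compile()` (a hypothetical helper, not the actual code fixer): syntactically broken output is rejected before a single byte reaches disk.

```python
def safe_write(path: str, code: str) -> None:
    """Write Python source to `path` only if it parses.

    compile() with mode="exec" runs the full parser without executing
    anything, so syntax errors are caught before the file exists.
    """
    try:
        compile(code, path, "exec")
    except SyntaxError as e:
        raise ValueError(f"refusing to write {path}: {e}") from None
    with open(path, "w") as f:
        f.write(code)
```

The real fixers go further — repairing, not just rejecting — but the principle is the same: validate generated code in memory, never on the filesystem.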
The orchestrator was doing too much. Six phases to extract it into focused modules without breaking a single security invariant.
SQLite to PostgreSQL. Store protocols, async rewrite, data migration, and then ripping out every line of SQLite code.
Not every request needs a frontier model to plan it. The router classifies incoming messages and takes the fast path when it can.
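The fast-path decision itself is tiny once an intent is known (names here are hypothetical placeholders): cheap intents go to a small local model, everything else escalates to the frontier planner.

```python
# Hypothetical set of intents that never need multi-step planning.
FAST_PATH = {"greeting", "status_check", "simple_lookup"}

def pick_planner(intent: str) -> str:
    """Route cheap intents to a small local model; escalate the rest."""
    return "local-small" if intent in FAST_PATH else "frontier"
```

The expensive part isn't this dispatch — it's classifying the message reliably enough to trust the shortcut.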
Giving Sentinel a proper UI — dashboard health cards, chat, memory browser, routine management, and a GSP mascot.
Five trust levels, from full human approval to autonomous execution. Each one its own project.
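A ladder like that maps naturally onto an ordered enum (the level names and approval rules below are illustrative guesses, not the actual scheme): ordering the levels lets the approval check be a simple comparison.

```python
from enum import IntEnum

class TrustLevel(IntEnum):
    APPROVE_EVERY_STEP = 0  # human confirms each action
    APPROVE_PLANS = 1       # human confirms the plan; steps then run freely
    APPROVE_WRITES = 2      # only mutating actions need confirmation
    NOTIFY_ONLY = 3         # runs autonomously, reports afterwards
    AUTONOMOUS = 4          # full autonomy

def needs_approval(level: TrustLevel, mutating: bool) -> bool:
    """Does this action require a human sign-off at this trust level?"""
    if level <= TrustLevel.APPROVE_PLANS:
        return True
    if level == TrustLevel.APPROVE_WRITES:
        return mutating
    return False
```

Using `IntEnum` makes "at least this trusted" a plain `<=`/`>=` check, which keeps every gate in the codebase one line long.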
Every shell command runs in a disposable container. No state leaks, no network, no capabilities. Or so I thought.
From a custom PC build to containers, local LLMs, and the paper that started it all — CaMeL.