The orchestrator had grown into a monolith. Input scanning, session binding, conversation analysis, constraint validation, provenance gating, tool dispatch — all in one file. It worked, but every change risked touching something unrelated.
I identified five security invariants that had to hold through the entire refactor: S1 (every input gets scanned), S2 (sessions are bound before processing), S3 (provenance is checked before tool execution), S4 (constraints are validated at runtime), S5 (outputs are scanned before delivery). I wrote canary tests for each one before touching a single line of production code.
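One way to make invariants like these testable is to have each security-relevant step record itself in a trace, then assert presence and ordering on that trace. The sketch below is hypothetical: `Trace`, `toy_orchestrator`, the step names, and `check_invariants` are illustrative stand-ins rather than the project's actual code. It also reads "processing" in S2 as conversation analysis, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    """Records the order in which security-relevant steps run (hypothetical)."""
    events: list[str] = field(default_factory=list)

    def hit(self, step: str) -> None:
        self.events.append(step)


def toy_orchestrator(trace: Trace, text: str) -> str:
    """Hypothetical stand-in for the real request pipeline."""
    trace.hit("scan_input")            # S1: input scanned
    trace.hit("bind_session")          # S2: session bound
    trace.hit("analyze_conversation")  # assumed to count as "processing"
    trace.hit("validate_constraints")  # S4: runtime constraint check
    trace.hit("check_provenance")      # S3: provenance gate
    trace.hit("dispatch_tool")
    result = text.upper()              # placeholder for real tool output
    trace.hit("scan_output")           # S5: output scanned before delivery
    return result


def check_invariants(events: list[str]) -> dict[str, bool]:
    """Evaluate S1-S5 against one recorded trace."""

    def before(first: str, second: str) -> bool:
        return (first in events and second in events
                and events.index(first) < events.index(second))

    return {
        "S1": bool(events) and events[0] == "scan_input",
        "S2": before("bind_session", "analyze_conversation"),
        "S3": before("check_provenance", "dispatch_tool"),
        "S4": before("validate_constraints", "dispatch_tool"),
        "S5": bool(events) and events[-1] == "scan_output",
    }


def run_canaries(pipeline: Callable[[Trace, str], str]) -> dict[str, bool]:
    """Drive one request through a pipeline and report each invariant."""
    trace = Trace()
    pipeline(trace, "hello")
    return check_invariants(trace.events)
```

Because the checks only inspect the recorded order, the same canaries can run unchanged against the monolith and against every intermediate extraction. A pipeline that silently drops the provenance step would fail S3 while still passing the other four.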
The extraction followed a deliberate sequence. Phase 1: pull out the safe tool handlers. Phase 2: extract constants and pure plan functions into builders. Phase 3: design the context interface so extracted modules could access what they needed without reaching back into the orchestrator. Phase 4: extract input scanning and conversation analysis into an intake module. Phase 5: extract constraint validation, provenance gating, and tool dispatch. Phase 6: clean up backward-compatibility wrappers.
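A minimal sketch of what a Phase 3 context interface might look like, assuming Python's `typing.Protocol` for structural typing. `IntakeContext`, `run_intake`, `IntakeResult`, `SimpleContext`, and the blocked-phrase check are all hypothetical; a real scanner and conversation analysis would be far richer than a substring match and a turn count.

```python
from dataclasses import dataclass, field
from typing import Protocol


class IntakeContext(Protocol):
    """Hypothetical: only what the intake module is allowed to see."""
    session_id: str

    def scan(self, text: str) -> bool: ...
    def history(self) -> list[str]: ...


@dataclass
class IntakeResult:
    session_id: str
    allowed: bool
    turn_count: int


def run_intake(ctx: IntakeContext, text: str) -> IntakeResult:
    """Extracted intake step: scan the input, then a toy conversation analysis.

    It depends only on the narrow protocol, never on the orchestrator class.
    """
    prior_turns = len(ctx.history())
    if not ctx.scan(text):
        # Rejected input does not count as a new turn.
        return IntakeResult(ctx.session_id, allowed=False, turn_count=prior_turns)
    return IntakeResult(ctx.session_id, allowed=True, turn_count=prior_turns + 1)


@dataclass
class SimpleContext:
    """Hypothetical concrete context the orchestrator could build per request."""
    session_id: str
    blocked_terms: tuple[str, ...] = ("ignore previous instructions",)
    turns: list[str] = field(default_factory=list)

    def scan(self, text: str) -> bool:
        # Toy scanner: reject input containing any blocked phrase.
        lowered = text.lower()
        return not any(term in lowered for term in self.blocked_terms)

    def history(self) -> list[str]:
        # Return a copy so the intake module cannot mutate orchestrator state.
        return list(self.turns)
```

The design point is the direction of the dependency: the extracted module declares what it needs, the orchestrator supplies an object that satisfies it, and tests can pass a tiny fake context without constructing the whole orchestrator.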
Each phase landed as its own merge. Each merge ran the full test suite plus the invariant canaries. No shortcuts.
Then I did it again with app.py, the FastAPI application file that had grown to 2,564 lines. Same approach: extract routes, models, rate limiting, and lifecycle management into focused modules. By the end, app.py was down to 184 lines: a thin shell that wired everything together and nothing more.
Two refactors, zero regressions, and a codebase that’s actually navigable.