Part 46: Seventy-Eight Findings
A systematic audit of every API endpoint, middleware layer, and frontend component. Seventy-eight findings. Some embarrassing. All fixable.
Sentinel was built for one person. Making it work for multiple users meant rethinking auth, isolation, and how the system tracks who’s who.
The injection benchmark found 11 exploits. All shared the same root cause — files in the workspace inherited trusted status regardless of who put them there.
A custom-built injection benchmark with real email, real calendars, real web pages. No simulated backends. 130 tests designed to break the trust architecture.
38 hours, 1,588 probes, zero human intervention. The first comprehensive validation with everything deployed.
The features were built. Now came the hardening: false-positive reduction, credential scanner expansion, metadata enrichment, and 600 new tests before the big run.
JWT authentication, per-user trust levels, encrypted credentials, and proof that two users can’t see each other’s data.
The third full security audit. 13 batches of fixes, from API hardening to dead code removal.
Row-level security, role separation, and a red team that tried SQL injection, LISTEN/NOTIFY attacks, and privilege escalation through PL/pgSQL.
199 findings across 7 units. 19 fix batches. 7 systemic improvements. The most thorough review the codebase has ever had.
Without a human to override scanners, false positives become functional failures. Risk decay was the fix.
Every sandbox field was snake_case. Podman’s API requires PascalCase. HTTP 201 Created. Zero containment.
Four attack scenarios, including a simulated compromised planner. Six clean runs before trusting it.
Five trust levels, from full human approval to autonomous execution. Each one its own project.
Every shell command runs in a disposable container. No state leaks, no network, no capabilities. Or so I thought.
99 findings from a systematic security audit of my own code. Zero critical — but 16 high-severity.
A security system that scores 3/5 on security is failing. The score became a to-do list.
From a custom PC build to containers, local LLMs, and the paper that started it all — CaMeL.
How a Google Maps feature turned a close protection worker into a privacy obsessive — and laid the foundation for everything that came after.