Part 51: Did It Actually Work?

The planner could execute multi-step tasks. But it had no way to verify its own work. If a step failed silently, it carried on regardless. Time to close the loop.

5 min · Sentinel Dev

Part 44: Did It Actually Work?

Tasks were reporting success based on whether their steps completed, not on whether the goal was achieved. Building a verification system to tell the difference.

3 min · Sentinel Dev