Field notes from running agents in production.
Harness patterns, Claude coverage, and build guides. The part that comes after the demo, where it actually breaks.
Adversarial Verification for AI Agents: Make the Checker Try to Fail
A self-grading agent grades itself green. Here is the harness pattern we run in production: a separate verifier whose job is to break the work, with a rubric that can actually go red.
Read the pattern Claude watchContext Window Budgeting for AI Agents
On a long agent loop, most of your token bill is orientation, not work. Here is how to find it and cut it: cache the stable prefix, cap tool results at the source, and reach for compaction last.
Read the pattern Field notesGit Worktrees for Parallel Agents: The Setup That Actually Holds Up
Running several coding agents at once on one repo sounds great until they stomp each other's files. Here is the worktree layout we run in production, the three failure modes that cost us real time, and the rules that keep it boring.
Read the pattern Build guidesLoop Until Dry: Building a Finder Agent That Knows When It's Done
A forkable build guide for the "loop until dry" pattern: an agent that runs until the work runs out, not until a counter hits zero. The loop is easy. The dedup, the failure handling, and an honest stop condition are where the real work lives.
Read the pattern Harness patternsAgent Pipeline vs Barrier: Where to Put the Gate
A pipeline moves work forward; a barrier stops the work that should not move. The skill is placing each one where it earns its keep, and not on every step.
Read the pattern Field notesEvidence Before Done: Why Your AI Agent's "All Green" Is a Claim, Not a Fact
An agent reporting "done" is generating a belief, not observing a result. Here is the rule we run and the gate that enforces it: no "done" until a check that could have failed actually saw the required behavior work.
Read the pattern