Loop Until Dry: Building a Finder Agent That Knows When It's Done

"Loop until dry" is the pattern you reach for when an agent has a job that is done when there is nothing left to find, not after some fixed number of steps. A crawler that keeps pulling links until the frontier is empty. A migration agent that keeps fixing files until no file fails the check. A research finder that keeps querying sources until a pass turns up zero new hits. You do not know the step count up front, so you stop on a condition, not a counter.

This guide builds one: a "finder" agent that runs in a loop and stops when the work runs dry. It is short, it is forkable, and the parts that bite are the parts most write-ups skip. We run this shape in production at Omni, so the gotchas here are the ones that actually cost us time, not the ones that sound good in a diagram.

The shape

The core is a queue and a loop. The agent pulls an item, works it, and pushes any new items it discovers back onto the queue. The loop ends when a full pass finds nothing new. That last clause is the whole trick: "nothing new," not "the queue is empty right now," because an agent that keeps rediscovering the same items will spin forever.

seen = set()
queue = initial_items()        # seed the frontier
budget = StepBudget(max_steps=200, max_seconds=900)

while queue and not budget.exhausted():
    item = queue.pop()
    if item.key in seen:
        continue
    seen.add(item.key)

    result = agent_step(item)   # the LLM call + tool use
    record(result)

    for found in result.discovered:
        if found.key not in seen:
            queue.push(found)

# dry: the queue drained without the budget firing first

Two exit conditions, and you want both. The queue draining is the clean exit: the work is genuinely done. The budget firing is the safety exit: something went wrong and you would rather stop than burn tokens in a circle. Treat them differently when you report the outcome, because "finished" and "gave up at the limit" are not the same result and a caller needs to know which one happened.

Idempotency is the load-bearing part

The seen set is doing more work than it looks like. Without it, two things break. First, the loop never goes dry, because the agent re-finds items it already worked and re-queues them. Second, you pay for the same step twice, which on a long run is most of your bill.

The mistake is keying seen on the wrong field. If your items are URLs, normalize before you hash: strip the fragment, sort query params, lowercase the host. If they are files, key on the resolved absolute path, not the string the agent handed you, or a symlink and its target will both look new. We lost a chunk of a run once to exactly this: the agent reported the same file under two paths and the loop happily worked it twice before we noticed the duplicate records.

def item_key(item):
    if item.kind == "url":
        u = urlparse(item.value)
        return (u.scheme, u.netloc.lower(), u.path,
                tuple(sorted(parse_qsl(u.query))))
    if item.kind == "file":
        return os.path.realpath(item.value)
    return item.value

Letting the agent decide when it is done

The version above stops when the queue drains. But sometimes "dry" is a judgment, not a count: a research finder is done when another pass would not turn up anything worth recording, and only the agent can really tell. You can give the model that call, with a guardrail.

Have the step return a structured signal alongside its work: whether it found anything new, and a short reason. Then the loop stops after N consecutive passes that all report "nothing new." One empty pass can lie (a timeout, a rate-limited source); a few in a row is a real signal.

dry_streak = 0
DRY_THRESHOLD = 2

while not budget.exhausted():
    result = agent_step(next_batch())
    record(result)
    if result.found_new:
        dry_streak = 0
        enqueue(result.discovered)
    else:
        dry_streak += 1
        if dry_streak >= DRY_THRESHOLD:
            break   # dry by judgment, confirmed across passes

One caution worth stating plainly: a model asked "did you find anything new?" will sometimes say no because it is tired of the task, not because the space is exhausted. That is why the streak counter and the hard budget both stay in. Do not let the model's self-report be the only thing standing between you and an infinite loop, and do not let it be the only thing standing between you and stopping early. The deterministic guards are there precisely because the model's judgment is useful but not reliable enough to be load-bearing on its own.

The failure modes that actually show up

The loop that never dries. Almost always a key collision: two representations of the same item hash differently, so the seen check misses and the item re-enters. Log every re-queue with its key during development. If you see the same logical item with two keys, your normalization is wrong.

The loop that dries too early. A transient failure (a 429, a tool timeout) gets read as "nothing found." Separate "found nothing" from "the step failed." A failed step should retry or re-queue, never count toward the dry streak. This one is quiet: the run looks clean and complete, and you only notice the gap when something downstream is missing.

Runaway cost. The budget is not optional. Even a correct loop can fan out wider than you expect when one item discovers fifty more. Cap total steps and wall-clock time, and log the running count so a misbehaving run shows up in the numbers before it shows up on the invoice.

State lost on a crash. A long finder run that dies at step 180 of 200 and restarts from zero is its own kind of expensive. Persist seen and the queue between steps (a file or a small table is enough) so a restart resumes instead of repeating. Make the step itself safe to re-run, since a crash mid-step means the next start will run it again.

Wiring it to a real agent runtime

The pseudocode above is the control loop. In practice agent_step is a model call that uses tools and returns both its work and the new items it found. The loop here is your outer loop: it owns the queue, the dedup, the budget, and the stop condition. Keep that logic in plain code you control rather than asking the model to manage its own termination across turns, because the model is good at the per-item work and unreliable at remembering "have I already done this one forty steps ago."

If your runtime already gives you a multi-step tool-use loop, you are deciding where the dry check lives. The cleanest split we have found: let the runtime handle a single item end to end (the model reasons, calls tools, returns a result), and keep the queue, the seen set, and the stop condition in your own outer loop around it. That keeps the deterministic parts deterministic and the judgment parts isolated to one place you can inspect.

A checklist before you ship it

Dedup key normalizes every representation of the same item to one value.
Two exit conditions: queue dry (clean) and budget exhausted (safety), reported distinctly.
Failed steps retry or re-queue and never count as "dry."
seen and queue persist across steps so a crash resumes, not restarts.
Step-count and token logging is on, so cost is visible mid-run.
If the model judges "done," a consecutive-dry-pass threshold backs it, and a hard budget backs that.

That is the whole pattern. The loop is easy; the dedup, the failure handling, and the honest stop condition are where the real work is. Build those three right and a "loop until dry" finder runs unattended without lying to you about when it finished.