
Multi-phase agent builds: how to make long AI coding tasks survive

8 min read · By the EvolIDE team

[Figure: agent run timeline with Plan, Scaffold, Build, and Verify phases; a resume-from-checkpoint card, highlighted in cyan, shows patch and token stats.]

Long AI coding runs fail because a single rate-limit can kill the whole task. Split the work into resumable phases with incremental saves, and a stall costs minutes, not afternoons. This is the design that makes overnight agent runs actually finish.

The story is familiar. You give an AI agent a multi-file task — “add Stripe billing to this Next.js app, with a tenant table, server actions, a portal page, and admin views.” The agent spends fifteen minutes thinking. It generates a magnificent plan. It starts to apply edits. Then, somewhere around file seven, it hits a rate-limit, the response stream stalls, or the context window overflows. You have nothing usable. The afternoon is gone.

This is not a model problem. It is an orchestration problem. Single-shot agents fail because they treat the whole task as one unbreakable unit. The fix is structural: never let one moment of bad luck cost the whole run.

Why long agent runs fail

Failure modes for a single-shot run cluster into four buckets:

  • Context overflow — the plan plus the diffs plus the reasoning blows past the model window, so the last steps see a truncated view of the work-in-progress.
  • Rate-limit or timeout — a single 429 or socket drop, eight files deep, and the half-applied state is lost.
  • Drift — the model gets confused about which files it has already touched, edits something twice, or contradicts an earlier decision.
  • Cost cliff — the run is technically succeeding, but the meter is climbing past what the task is worth.

Slicing the task into phases contains all four, because any one failure can cost at most a single phase of work.

The fix: phase-aware orchestration

A phase is a self-contained slice of work — small enough to fit in a single model context, large enough to deliver a coherent change. A typical Stripe-portal task might look like this:

  1. Phase 1 · Scaffold — create new routes, types, and middleware shells.
  2. Phase 2 · Gateway wiring — wire the billing service and portal page.
  3. Phase 3 · Polish & smoke — auth gating, error states, and a smoke run.

Each phase has a tight prompt, a clear acceptance criterion, and writes its patches before the next phase begins. If phase 2 fails, phase 1’s changes are already committed and the agent resumes from a known-good checkpoint.
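
The checkpoint-then-continue behaviour described above can be sketched in a few lines. This is a minimal illustration, not EvolIDE's implementation: the `Phase` class, the `run_phases` function, and the phase names are hypothetical, and each phase's model call is stood in for by a plain function that returns whether its acceptance check passed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    name: str
    run: Callable[[], bool]  # hypothetical: returns True when the acceptance check passes


def run_phases(phases: List[Phase], completed: List[str]) -> List[str]:
    """Run phases in order, skipping finished ones and stopping at the first failure."""
    done = list(completed)
    for phase in phases:
        if phase.name in done:
            continue  # already checkpointed by an earlier run
        if not phase.run():
            break  # later phases wait; earlier work is kept
        done.append(phase.name)  # checkpoint before the next phase begins
    return done


# Simulated plan in which the second phase fails on its first attempt only.
attempts = {"count": 0}


def flaky_gateway() -> bool:
    attempts["count"] += 1
    return attempts["count"] > 1


plan = [
    Phase("scaffold", lambda: True),
    Phase("gateway-wiring", flaky_gateway),
    Phase("polish-and-smoke", lambda: True),
]

first = run_phases(plan, completed=[])       # stops after "scaffold"
second = run_phases(plan, completed=first)   # resumes at "gateway-wiring"
```

The second call never re-runs the scaffold phase; it picks up at the phase that failed.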

What a “phase” actually is

Under the hood, a phase is four things bound together:

  • A scoped prompt derived from the original task and the phase’s charter.
  • An allow-list of files the phase is permitted to touch, so it cannot accidentally edit phase 3’s territory.
  • An incremental save policy — patches are applied and persisted on disk before the phase exits.
  • A success signal — either an explicit smoke check or the model’s own verification step.
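
As a rough sketch of how those parts might be bound together, the record below carries the scoped prompt, the allow-list, and the success signal, and rejects patches that stray outside the allow-list. The names `PhaseSpec`, `permits`, and `check_patches` are assumptions made for illustration, the glob-style patterns are one possible way to express an allow-list, and the save policy is left out to keep the example short.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable, Dict, List


@dataclass
class PhaseSpec:
    prompt: str                        # scoped prompt for this phase only
    allowed_files: List[str]           # glob patterns the phase may touch
    success_check: Callable[[], bool]  # smoke check or verification step

    def permits(self, path: str) -> bool:
        """True if the path matches at least one allow-list pattern."""
        return any(fnmatch(path, pattern) for pattern in self.allowed_files)


def check_patches(spec: PhaseSpec, patches: Dict[str, str]) -> Dict[str, str]:
    """Reject the whole batch if any patch targets a file outside the allow-list."""
    blocked = [path for path in patches if not spec.permits(path)]
    if blocked:
        raise PermissionError(f"outside phase allow-list: {blocked}")
    return patches


# Hypothetical phase that may only touch billing files.
gateway = PhaseSpec(
    prompt="Wire the billing service and portal page.",
    allowed_files=["app/billing/*", "lib/billing.ts"],
    success_check=lambda: True,
)
```

Rejecting the whole batch, instead of silently dropping the offending patch, keeps a phase from exiting in a half-applied state.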

The agent splits a task into phases automatically based on size, file fan-out, and policy. Small tasks stay as one phase. Large tasks get three or four. Very large tasks get a plan-then-confirm gate before phasing.
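
A splitting policy of this kind could be sketched as a simple threshold check. The thresholds, the `choose_plan` function, and the `Plan` names below are invented for illustration; the article does not state the real cut-offs or the exact inputs.

```python
from enum import Enum


class Plan(Enum):
    SINGLE_PHASE = "single-phase"
    MULTI_PHASE = "multi-phase"
    CONFIRM_THEN_PHASE = "plan-then-confirm"


def choose_plan(files_touched: int, estimated_tokens: int,
                context_budget: int = 100_000) -> Plan:
    """Pick a phasing strategy from file fan-out and estimated size.

    All thresholds are hypothetical placeholders, not product values.
    """
    if files_touched > 25 or estimated_tokens > 4 * context_budget:
        return Plan.CONFIRM_THEN_PHASE  # very large: ask before phasing
    if files_touched > 3 or estimated_tokens > context_budget // 2:
        return Plan.MULTI_PHASE         # large: split into several phases
    return Plan.SINGLE_PHASE            # small: one phase is enough
```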

Resume from the last good save

When something does go wrong — and on the long tail of tasks, something always does — the run state sits on disk as a compact JSON record. Open EvolIDE the next morning and the agent says:

Found unfinished session “Stripe portal” from yesterday. Phase 1 & 2 complete. Resume from Phase 3?

Click resume and only the failed phase replays. The phases that already shipped patches stay shipped. The model context starts fresh for the new phase, so drift from the failed run does not contaminate the retry. The wallet only debits for the phase that actually re-ran.
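
To make the resume step concrete, here is a minimal sketch of a run-state record and the lookup that finds where to resume. The field names (`session`, `phases`, `completed`) and the helper functions are assumptions, since the article does not document the actual JSON schema. The write goes through a temporary file and an atomic rename so that a crash mid-save cannot corrupt the last good record.

```python
import json
import os
import tempfile
from typing import List, Optional


def save_state(path: str, session: str, phases: List[str],
               completed: List[str]) -> None:
    """Write the run state atomically: temp file first, then rename into place."""
    record = {"session": session, "phases": phases, "completed": completed}
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(record, handle)
    os.replace(tmp_path, path)  # atomic when both paths share a filesystem


def next_phase(path: str) -> Optional[str]:
    """Return the first phase not yet completed, or None if the run finished."""
    with open(path, encoding="utf-8") as handle:
        record = json.load(handle)
    for name in record["phases"]:
        if name not in record["completed"]:
            return name
    return None


# Hypothetical session that stalled after its second phase.
state_file = os.path.join(tempfile.mkdtemp(), "stripe-portal.json")
phases = ["scaffold", "gateway-wiring", "polish-and-smoke"]
save_state(state_file, "Stripe portal", phases,
           completed=["scaffold", "gateway-wiring"])
resume_at = next_phase(state_file)
```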

Routing a retry to a different model

Resume is also the natural moment to change models. If phase 3 hit context overflow on a small model, the advisor suggests stepping up to a larger one. If phase 2 stalled on a frontier model that is rate-limited, you can route the retry to a cheaper alternative. The phase boundary is the only safe place to make that swap mid-task.
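
The routing rule in that paragraph can be written as a small lookup. The tier names, the failure labels, and the `suggest_retry_model` function are hypothetical; a real advisor would presumably also weigh price and availability.

```python
# Hypothetical model tiers, ordered from cheapest to most capable.
MODEL_TIERS = ["small", "medium", "large"]


def suggest_retry_model(current: str, failure: str) -> str:
    """Suggest a model for the retry based on why the phase failed."""
    index = MODEL_TIERS.index(current)
    if failure == "context_overflow":
        # Step up one tier, staying at the top tier if already there.
        return MODEL_TIERS[min(index + 1, len(MODEL_TIERS) - 1)]
    if failure == "rate_limited":
        # Step down to a cheaper tier that may have spare capacity.
        return MODEL_TIERS[max(index - 1, 0)]
    return current  # other failures: retry on the same model
```

Because the suggestion is made once per failed phase, the swap never happens in the middle of a model context.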

Key takeaways

  • Single-shot agent runs fail on long tasks because every stall costs the whole run.
  • Phases are scoped prompts with allow-listed files, incremental saves, and explicit success signals.
  • Failed phases replay from the last good save; finished phases keep their patches.
  • Phase boundaries are the only safe place to retry on a different model.
  • This is the design that makes overnight or background runs actually finish.


Frequently asked

What is a phase, exactly?

A phase is a self-contained slice of a larger task — small enough to fit in a single model context, large enough to deliver a coherent change. Each phase commits its patches incrementally before the next begins.

How does resume work after a failure?

The agent persists the run state to disk after every phase. On reopening, EvolIDE detects the unfinished session, replays only the failed phase, and continues from there.

Can a phase be retried with a different model?

Yes — the advisor can re-route a failed phase to a cheaper or stronger model without restarting the whole task.
