
Multi-phase agent builds: how to make long AI coding tasks survive

June 4, 2026·8 min read·By the EvolIDE team


Long AI coding runs fail because a single rate limit can kill the whole task. Split the work into resumable phases with incremental saves, and a stall costs minutes, not afternoons. This is the design that makes overnight agent runs actually finish.

The story is familiar. You give an AI agent a multi-file task: “add Stripe billing to this Next.js app, with a tenant table, server actions, a portal page, and admin views.” The agent spends fifteen minutes thinking. It generates a magnificent plan. It starts to apply edits. Then, somewhere around file seven, it hits a rate limit, the response stream stalls, or the context window overflows. You have nothing usable. The afternoon is gone.

This is not a model problem. It is an orchestration problem. Single-shot agents fail because they treat the whole task as one unbreakable unit. The fix is structural: never let one moment of bad luck cost the whole run.

Why long agent runs fail

Failure modes for a single-shot run cluster into four buckets:

Context overflow — the plan plus the diffs plus the reasoning blows past the model window, so the last steps see a truncated view of the work-in-progress.
Rate-limit or timeout — a single 429 or socket drop, eight files deep, and the half-applied state is lost.
Drift — the model gets confused about which files it has already touched, edits something twice, or contradicts an earlier decision.
Cost cliff — the run is technically succeeding, but the meter is climbing past what the task is worth.

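Slicing the task into phases addresses all four: each phase fits in a single model context, a stall loses at most one phase, every phase starts with a fresh context and a fixed set of files, and each phase boundary is a chance to stop or switch to a cheaper model.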

The fix: phase-aware orchestration

A phase is a self-contained slice of work — small enough to fit in a single model context, large enough to deliver a coherent change. A typical Stripe-portal task might look like this:

Phase 1 · Scaffold — create new routes, types, and middleware shells.
Phase 2 · Gateway wiring — wire the billing service and portal page.
Phase 3 · Polish & smoke — auth gating, error states, and a smoke run.

Each phase has a tight prompt, a clear acceptance criterion, and writes its patches before the next phase begins. If phase 2 fails, phase 1’s changes are already committed and the agent resumes from a known-good checkpoint.
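A minimal sketch of that commit-then-continue loop, in Python. The `run_phases` function and the phase names are hypothetical, and each phase is reduced to a callable that returns True when its acceptance criterion passes. It illustrates the control flow only, not EvolIDE's actual implementation.

```python
from typing import Callable, List, Optional, Tuple

# A phase here is just (name, work). The callable stands in for
# "prompt the model, apply patches, check the acceptance criterion".
Phase = Tuple[str, Callable[[], bool]]


def run_phases(phases: List[Phase]) -> Tuple[List[str], Optional[str]]:
    """Run phases in order and stop at the first failure.

    Returns (names of completed phases, name of the failed phase or None).
    """
    completed: List[str] = []
    for name, work in phases:
        try:
            ok = work()
        except Exception:
            ok = False  # a 429, a timeout, or a crash counts as a failed phase
        if not ok:
            return completed, name  # earlier phases stay committed
        completed.append(name)
    return completed, None


# Simulate a stall in the second phase.
done, failed = run_phases([
    ("scaffold", lambda: True),
    ("gateway-wiring", lambda: False),
    ("polish-and-smoke", lambda: True),
])
```

Here `done` holds only the scaffold phase and `failed` names the phase to retry, so the third phase never runs against a half-finished second one.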

What a “phase” actually is

Under the hood, a phase is four things bound together:

A scoped prompt derived from the original task and the phase’s charter.
An allow-list of files the phase is permitted to touch, so it cannot accidentally edit phase 3’s territory.
An incremental save policy — patches are applied and persisted on disk before the phase exits.
A success signal — either an explicit smoke check or the model’s own verification step.
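One way to picture those four parts is as a small record with an allow-list check. This is a sketch under assumptions: the `Phase` class, its field names, and the glob-style patterns are invented for illustration and are not EvolIDE's schema.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, List


@dataclass
class Phase:
    name: str
    prompt: str                          # scoped prompt for this phase only
    allowed_files: List[str]             # glob patterns the phase may edit
    success_check: Callable[[], bool]    # smoke check or verification step
    save_after_each_patch: bool = True   # incremental save policy

    def may_touch(self, path: str) -> bool:
        """True if the path matches any allow-list pattern.

        Note: fnmatch-style '*' also matches '/' characters.
        """
        return any(fnmatchcase(path, pat) for pat in self.allowed_files)


def out_of_scope(phase: Phase, paths: List[str]) -> List[str]:
    """Return the proposed edits that fall outside the phase's allow-list."""
    return [p for p in paths if not phase.may_touch(p)]


scaffold = Phase(
    name="scaffold",
    prompt="Create new routes, types, and middleware shells.",
    allowed_files=["app/billing/*", "lib/types/*.ts", "middleware.ts"],
    success_check=lambda: True,
)
rejected = out_of_scope(scaffold, ["middleware.ts", "app/admin/page.tsx"])
```

Rejecting out-of-scope paths before any patch is applied is what keeps an early phase from editing a later phase's files.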

The agent splits a task into phases automatically based on size, file fan-out, and policy. Small tasks stay as one phase. Large tasks get three or four. Very large tasks get a plan-then-confirm gate before phasing.
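The splitting rule could look roughly like the sketch below. Every threshold here (`small_files`, `large_files`, `context_budget`) is an assumed placeholder, as is the `plan_phases` name; the real cut-offs depend on the model and on policy.

```python
import math
from dataclasses import dataclass


@dataclass
class PhasePlan:
    phase_count: int
    needs_confirmation: bool  # plan-then-confirm gate for very large tasks


def plan_phases(files_touched: int, estimated_tokens: int,
                small_files: int = 3, large_files: int = 12,
                context_budget: int = 50_000) -> PhasePlan:
    """Pick a phase count from task size and file fan-out (assumed thresholds)."""
    if files_touched <= small_files and estimated_tokens <= context_budget:
        return PhasePlan(1, False)  # small task: one phase

    # Enough phases that each slice fits the context budget, at least three.
    by_tokens = math.ceil(estimated_tokens / context_budget)
    count = max(3, by_tokens)

    # Very large tasks are gated behind a human confirmation of the plan.
    very_large = files_touched > large_files or by_tokens > 4
    return PhasePlan(count, very_large)
```

With these defaults, a two-file tweak stays as one phase, an eight-file task estimated at 120,000 tokens gets three, and a thirty-file task is held at the confirmation gate.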

Resume from the last good save

When something does go wrong — and on the long tail of tasks, something always does — the run state sits on disk as a compact JSON record. Open EvolIDE the next morning and the agent says:

Found unfinished session “Stripe portal” from yesterday. Phase 1 & 2 complete. Resume from Phase 3?

Click resume and only the failed phase replays. The phases that already shipped patches stay shipped. The model context starts fresh for the new phase, so drift from the failed run does not contaminate the retry. The wallet only debits for the phase that actually re-ran.
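As a rough illustration of the on-disk record and the resume decision, here is a sketch that assumes a JSON file mapping phase names to a status string. The file layout, the status values, and both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


def save_state(path: Path, session: str, statuses: Dict[str, str]) -> None:
    """Persist run state; write to a temp file first, then rename over the
    target so a crash mid-write cannot leave a truncated record."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"session": session, "phases": statuses}, indent=2))
    tmp.replace(path)


def next_phase_to_run(path: Path, order: List[str]) -> Optional[str]:
    """Return the first phase that is not marked complete, or None if done."""
    state = json.loads(path.read_text())
    for name in order:
        if state["phases"].get(name) != "complete":
            return name
    return None
```

After a failed run, calling `next_phase_to_run` with the original phase order returns the failed phase, and the completed phases before it are skipped rather than replayed.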

Routing a retry to a different model

Resume is also the natural moment to change models. If phase 3 hit context overflow on a small model, the advisor suggests stepping up to a larger one. If phase 2 stalled on a frontier model that is rate-limited, you can route the retry to a cheaper alternative. The phase boundary is the only safe place to make that swap mid-task.
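The two routing rules described above can be sketched as a lookup on a model ladder. The ladder entries, the failure labels, and `route_retry` itself are placeholders for illustration, not real model names or the advisor's actual logic.

```python
from typing import List

# Hypothetical ladder, ordered from smallest and cheapest to largest.
MODEL_LADDER: List[str] = ["small", "medium", "frontier"]


def route_retry(current: str, failure: str,
                ladder: List[str] = MODEL_LADDER) -> str:
    """Choose the model for a phase retry based on why the phase failed."""
    i = ladder.index(current)
    if failure == "context_overflow":
        # Step up one rung, staying on the top rung if already there.
        return ladder[min(i + 1, len(ladder) - 1)]
    if failure == "rate_limited":
        # Step down to a cheaper alternative, staying on the bottom rung.
        return ladder[max(i - 1, 0)]
    return current  # other failures: retry on the same model
```

Because the retry starts a fresh context at the phase boundary, the new model never has to pick up another model's half-finished reasoning.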

Key takeaways

Single-shot agent runs fail on long tasks because every stall costs the whole run.
Phases are scoped prompts with allow-listed files, incremental saves, and explicit success signals.
Failed phases replay from the last good save; finished phases keep their patches.
Phase boundaries are the only safe place to retry on a different model.
This is the design that makes overnight or background runs actually finish.

Frequently asked

What is a phase, exactly?

A phase is a self-contained slice of a larger task — small enough to fit in a single model context, large enough to deliver a coherent change. Each phase commits its patches incrementally before the next begins.

How does resume work after a failure?

The agent persists run state to disk after every phase. On re-open, EvolIDE detects the unfinished session, replays only the failed phase, and continues from there.

Can a phase be retried with a different model?

Yes — the advisor can re-route a failed phase to a cheaper or stronger model without restarting the whole task.
