The agent doesn't claim 'done' until 11 verifications say so.

Every EvolIDE run produces a Proof Pack v2 with a mergeability score. Sub-agents are orchestrated, repairs are bounded, and failure memory blocks retries of strategies that have already failed.

Verification-first launch candidate · internal/private beta. EvolIDE is not publicly production-ready yet. Public production launch is pending final live acceptance tests with real managed-cloud keys and operator sign-off.

The 16-step run sequence

The Conductor Agent orchestrates every run from natural-language prompt to merge-ready proof. Each row is a checkable step in the verification-first flow.

# | Step | Owner | Detail
01 | Contract | Conductor | Parses goal · constraints · risk · success criteria (G1).
02 | Risk classify | Conductor | Low / Medium / High / Critical; drives mode + required checks (G1).
03 | Mode pick | ModeRouter | Quick / Smart / Premium Verified / Enterprise Delivery (G9).
04 | Plan | Planner | Tool allowlist · scope globs · token budget (G7 Brain).
05 | Memory lookup | AttemptMem | FailureMemory pre-empts retrying broken strategies (G4).
06 | Sub-agents | Conductor | Spawns Implementer · Verifier · Proof when mode = premium.
07 | Tool loop | Implementer | 7 guardrails on every call: allowlist, scope, blocklist, dedupe, rate-limit, budget, failure_memory (G13).
08 | Hypothesise | Implementer | Required for medium/high risk before any change (G5).
09 | Apply patch | Implementer | WorkspaceEdit applier: atomic, journal-backed.
10 | Verify | Verifier | 11-layer stack (see below).
11 | Repair? | RepairLoop | Strategy switcher escalates by cost tier (G7).
12 | Proof | Proof | Proof Pack v2: mergeability score + breakdown (G8).
13 | Finish gate | Conductor | finish-requires-verification: blocks claim if score < 0.6.
14 | Quality | Dashboard | AgentQualityMetric (G14) records accepted patch rate.
15 | Resume? | ResumeCursor | If interrupted, ResumeCursor banner picks up next session (G3).
16 | Ledger | RunLedger | Every action persists to AgentRunLedger (G2).
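
Step 07 names seven guardrails that run on every tool call. The sketch below shows one way such a chain could be evaluated in the listed order, returning the first guardrail that rejects the call. It is an illustration under assumptions, not EvolIDE's implementation: `ToolCall`, `GuardrailState`, `check_call`, the glob-based scope matching, and the one-minute rate window are all hypothetical.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class ToolCall:
    tool: str        # e.g. "edit_file" (hypothetical tool name)
    path: str        # file the call touches
    est_tokens: int  # estimated token cost of the call


@dataclass
class GuardrailState:
    allowlist: set[str]           # tools the plan permits
    scope_globs: list[str]        # paths the user approved
    blocklist_globs: list[str]    # paths that are never touched
    token_budget: int
    max_calls_per_minute: int
    failed_signatures: set[str] = field(default_factory=set)  # assumed failure-memory input
    seen: set[str] = field(default_factory=set)
    call_times: list[float] = field(default_factory=list)
    tokens_used: int = 0


def check_call(call: ToolCall, state: GuardrailState, now: Optional[float] = None) -> Optional[str]:
    """Return the name of the first guardrail that rejects the call, or None if all pass."""
    now = time.monotonic() if now is None else now
    sig = f"{call.tool}:{call.path}"
    recent = [t for t in state.call_times if now - t < 60.0]
    checks = [
        ("allowlist", call.tool in state.allowlist),
        ("scope", any(fnmatchcase(call.path, g) for g in state.scope_globs)),
        ("blocklist", not any(fnmatchcase(call.path, g) for g in state.blocklist_globs)),
        ("dedupe", sig not in state.seen),
        ("rate-limit", len(recent) < state.max_calls_per_minute),
        ("budget", state.tokens_used + call.est_tokens <= state.token_budget),
        ("failure_memory", sig not in state.failed_signatures),
    ]
    for name, ok in checks:
        if not ok:
            return name
    # All guardrails passed: record the call so later checks can see it.
    state.seen.add(sig)
    state.call_times = recent + [now]
    state.tokens_used += call.est_tokens
    return None
```

Returning the guardrail name rather than a bare boolean keeps a rejected call explainable, which fits a flow where every action is expected to land in a ledger.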

The 11-layer verification stack

Every required layer (per task type and risk level) must report PASS before the Conductor allows the run to finish. Skipped layers fail closed.

1. Scope check: Diff lives only inside the user-approved scope.
2. File diff: Diff parses cleanly and applies to a clean tree.
3. Unit tests: Test command exits 0 (npm test, pytest, etc.).
4. Build: Build/compile/typecheck command exits 0.
5. Lint / format: Lint or formatter exits clean.
6. Security scan: No new high/critical findings (Semgrep / Snyk).
7. UI smoke: Optional Playwright/Storybook for UI tasks.
8. Acceptance criteria: Every contract criterion mapped to an asserted check.
9. AI critic: Second model reviews; no critical issues.
10. Human review: Required for enterprise_delivery; surfaced via PR.
11. Observability: Logs / metrics / traces emitted as expected.
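
The fail-closed rule can be made concrete with a small sketch: a required layer that is missing or skipped is treated exactly like a failure, while a layer that is not required for the task does not block. The layer keys, the `Status` enum, and `can_finish` are hypothetical names chosen for illustration; how EvolIDE selects the required set per task type and risk level is not shown here.

```python
from enum import Enum


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    SKIPPED = "skipped"


def can_finish(required: list[str], results: dict[str, Status]) -> tuple[bool, list[str]]:
    """Return (ok, blocking_layers) for a run.

    A required layer with no recorded result is treated as SKIPPED,
    and anything other than PASS blocks the run (fail closed).
    """
    blocking = [
        layer for layer in required
        if results.get(layer, Status.SKIPPED) is not Status.PASS
    ]
    return (not blocking, blocking)


# Hypothetical required set for a medium-risk bug fix.
required_layers = ["scope", "unit_tests", "build"]

ok, blocking = can_finish(
    required_layers,
    {"scope": Status.PASS, "unit_tests": Status.PASS},  # "build" never reported
)
print(ok, blocking)
```

Defaulting a missing result to `SKIPPED` means a verifier that crashes or is never scheduled cannot let a run through by silence.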

Proof Pack v2 — what merges, what doesn't, and why

{
  "version": 2,
  "run_id": "run_42",
  "mergeability_score": 0.87,
  "mergeability_breakdown": {
    "tests":     { "passed": true,  "weight": 0.40 },
    "lint":      { "passed": true,  "weight": 0.20 },
    "build":     { "passed": true,  "weight": 0.20 },
    "ai_critic": { "passed": false, "weight": 0.20 }
  },
  "cost_breakdown": {
    "total_usd": 0.123,
    "total_tokens": 45000,
    "by_model": {
      "openai/gpt-5":     { "tokens": 30000, "usd": 0.090 },
      "anthropic/claude": { "tokens": 15000, "usd": 0.033 }
    }
  },
  "contract": {
    "goal": "fix sum bug",
    "task_type": "bug_fix",
    "risk_level": "medium",
    "success_criteria": [
      "all tests pass",
      "no unrelated files modified",
      "no <secret> leak in patch"
    ]
  },
  "attempts": [
    { "attempt_index": 1, "strategy": "baseline",     "result": "fail", "hypothesis": "wrong operator" },
    { "attempt_index": 2, "strategy": "inspect_test", "result": "pass" }
  ],
  "verification_summary": { "verdict": "pass", "layers": { "tests": "pass", "lint": "pass" } }
}
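
A reviewer or CI job consuming a Proof Pack might want a quick summary before reading the full JSON. The sketch below reads the fields shown in the sample above, compares the score with the 0.6 threshold quoted for the finish gate, lists breakdown entries that did not pass, and checks that the per-model costs add up to the totals. `review_proof_pack` is a hypothetical helper and the embedded sample is trimmed; the sketch does not attempt to recompute `mergeability_score`, since the scoring formula is not given here.

```python
import json
import math

FINISH_THRESHOLD = 0.6  # threshold quoted for the finish gate (step 13)


def review_proof_pack(pack: dict) -> dict:
    """Summarise a Proof Pack v2 for a reviewer (hypothetical helper)."""
    cost = pack["cost_breakdown"]
    per_model = list(cost["by_model"].values())
    # Per-model tokens and dollars should add up to the reported totals.
    costs_consistent = (
        sum(m["tokens"] for m in per_model) == cost["total_tokens"]
        and math.isclose(sum(m["usd"] for m in per_model), cost["total_usd"], abs_tol=1e-9)
    )
    failed_checks = sorted(
        name for name, entry in pack["mergeability_breakdown"].items()
        if not entry["passed"]
    )
    return {
        "clears_finish_gate": pack["mergeability_score"] >= FINISH_THRESHOLD,
        "failed_checks": failed_checks,
        "costs_consistent": costs_consistent,
        "attempts_used": len(pack["attempts"]),
    }


# Trimmed copy of the sample above, keeping only the fields the helper reads.
SAMPLE = json.loads("""
{
  "mergeability_score": 0.87,
  "mergeability_breakdown": {
    "tests":     { "passed": true,  "weight": 0.40 },
    "lint":      { "passed": true,  "weight": 0.20 },
    "build":     { "passed": true,  "weight": 0.20 },
    "ai_critic": { "passed": false, "weight": 0.20 }
  },
  "cost_breakdown": {
    "total_usd": 0.123,
    "total_tokens": 45000,
    "by_model": {
      "openai/gpt-5":     { "tokens": 30000, "usd": 0.090 },
      "anthropic/claude": { "tokens": 15000, "usd": 0.033 }
    }
  },
  "attempts": [
    { "attempt_index": 1, "strategy": "baseline",     "result": "fail" },
    { "attempt_index": 2, "strategy": "inspect_test", "result": "pass" }
  ]
}
""")

summary = review_proof_pack(SAMPLE)
print(summary)
```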

EvolIDE vs typical AI coding tools

Capability | EvolIDE | Typical
Verification-first | 11 layers run before 'done' | Single test command, often skipped.
Proof Pack | v2 with mergeability score breakdown (G8) | Free-form chat transcript or PR comment.
Failure memory | FailureMemory (G4) blocks retry of broken strategies | Loops on the same failed change.
Multi-agent | Conductor / Implementer / Verifier / Proof (G12) | Single agent or rigid hard-coded fan-out.
Cost router | ModelCostRouter (G10) + BudgetGovernor (G11) | Manual model picker per request.
Resume | ResumeCursor banner (G3) survives crash/restart | Re-prompt and lose the run.
Local-first | Ollama / LM Studio / vLLM / llama.cpp first-class | Cloud-only or token-gated local.
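
The failure-memory row contrasts with tools that loop on the same failed change. A minimal sketch of that behaviour, assuming strategies are tried in order of increasing cost and that a failure is remembered per task: the loop skips any strategy already recorded as failed and stops after a fixed number of attempts. `FailureMemory` here is a toy in-memory stand-in, and `repair_loop`, `task_key`, and `max_attempts` are hypothetical names; only the strategy labels `baseline` and `inspect_test` come from the sample Proof Pack.

```python
from typing import Callable


class FailureMemory:
    """Toy in-memory record of (task, strategy) pairs that already failed."""

    def __init__(self) -> None:
        self._failed: set[tuple[str, str]] = set()

    def record(self, task_key: str, strategy: str) -> None:
        self._failed.add((task_key, strategy))

    def is_blocked(self, task_key: str, strategy: str) -> bool:
        return (task_key, strategy) in self._failed


def repair_loop(
    task_key: str,
    strategies: list[str],                 # assumed ordered cheapest first
    try_strategy: Callable[[str], bool],   # returns True when verification passes
    memory: FailureMemory,
    max_attempts: int = 3,
) -> list[dict]:
    """Try strategies in order, skipping known failures, bounded by max_attempts."""
    attempts: list[dict] = []
    for strategy in strategies:
        if len(attempts) >= max_attempts:
            break  # bounded repair: stop escalating
        if memory.is_blocked(task_key, strategy):
            continue  # failure memory: do not retry a broken strategy
        passed = try_strategy(strategy)
        attempts.append({
            "attempt_index": len(attempts) + 1,
            "strategy": strategy,
            "result": "pass" if passed else "fail",
        })
        if passed:
            break
        memory.record(task_key, strategy)
    return attempts


memory = FailureMemory()
first_run = repair_loop("fix-sum", ["baseline", "inspect_test"], lambda s: s == "inspect_test", memory)
second_run = repair_loop("fix-sum", ["baseline", "inspect_test"], lambda s: s == "inspect_test", memory)
print(first_run)
print(second_run)
```

On the second run the `baseline` strategy is skipped outright because the first run recorded it as failed, so no attempt is spent rediscovering the same failure.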

Ship faster with EvolIDE

We’re onboarding a small set of pilot teams before public launch. Tell us about your stack and we’ll get back to you.

Be first to try EvolIDE — AI Delivery IDE for serious software teams.

No credit card. Audit-ready governance from day one.