Agent — Verification-first AI coding — EvolIDE

The Conductor Agent orchestrates every run from natural-language prompt to merge-ready proof. Each row is a checkable step in the verification-first flow.

#	Step	Owner	Detail
01	Contract	Conductor	Parses goal · constraints · risk · success criteria (G1).
02	Risk classify	Conductor	Low / Medium / High / Critical — drives mode + required checks (G1).
03	Mode pick	ModeRouter	Quick / Smart / Premium Verified / Enterprise Delivery (G9).
04	Plan	Planner	Tool allowlist · scope globs · token budget (G7 Brain).
05	Memory lookup	AttemptMem	FailureMemory pre-empts retrying broken strategies (G4).
06	Sub-agents	Conductor	Spawns Implementer · Verifier · Proof when mode = premium.
07	Tool loop	Implementer	7 guardrails on every call — allowlist, scope, blocklist, dedupe, rate-limit, budget, failure_memory (G13).
08	Hypothesise	Implementer	Required for medium/high risk before any change (G5).
09	Apply patch	Implementer	WorkspaceEdit applier — atomic, journal-backed.
10	Verify	Verifier	11-layer stack — see below.
11	Repair?	RepairLoop	Strategy switcher escalates by cost tier (G7).
12	Proof	Proof	Proof Pack v2 — mergeability score + breakdown (G8).
13	Finish gate	Conductor	finish-requires-verification — blocks claim if score < 0.6.
14	Quality	Dashboard	AgentQualityMetric (G14) records accepted patch rate.
15	Resume?	ResumeCursor	If interrupted, ResumeCursor banner picks up next session (G3).
16	Ledger	RunLedger	Every action persists to AgentRunLedger (G2).
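As a rough illustration of step 07, the seven guardrails can be pictured as an ordered chain in which the first failing check rejects the tool call. This is a minimal sketch, not EvolIDE's implementation: the `ToolCall` and `Guardrails` names, their fields, the check order, and the simple call-count stand-in for rate limiting are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class ToolCall:
    tool: str        # hypothetical tool name, e.g. "edit_file"
    path: str        # file the call would touch
    est_tokens: int  # estimated token cost of the call


@dataclass
class Guardrails:
    allowlist: set      # tools the plan permits
    scope_globs: list   # path patterns the plan permits
    blocklist: set      # paths that are never touched
    token_budget: int   # total tokens available to the run
    max_calls: int      # assumed stand-in for a rate limit
    failed_strategies: set = field(default_factory=set)
    seen: set = field(default_factory=set)
    calls_made: int = 0
    tokens_used: int = 0

    def check(self, call, strategy):
        """Return the name of the first guardrail that rejects the call, or None."""
        key = (call.tool, call.path)
        checks = [
            ("allowlist", call.tool in self.allowlist),
            ("scope", any(fnmatch(call.path, g) for g in self.scope_globs)),
            ("blocklist", call.path not in self.blocklist),
            ("dedupe", key not in self.seen),
            ("rate-limit", self.calls_made < self.max_calls),
            ("budget", self.tokens_used + call.est_tokens <= self.token_budget),
            ("failure_memory", strategy not in self.failed_strategies),
        ]
        for name, ok in checks:
            if not ok:
                return name
        # The call is accepted: record it so dedupe, rate-limit and budget see it.
        self.seen.add(key)
        self.calls_made += 1
        self.tokens_used += call.est_tokens
        return None


rails = Guardrails(
    allowlist={"edit_file"},
    scope_globs=["src/*"],
    blocklist={"src/secrets.env"},
    token_budget=1000,
    max_calls=5,
    failed_strategies={"baseline"},
)
first = rails.check(ToolCall("edit_file", "src/sum.py", 100), "inspect_test")
repeat = rails.check(ToolCall("edit_file", "src/sum.py", 100), "inspect_test")
```

Returning the name of the rejecting guardrail, rather than a bare boolean, would let a ledger entry record why a call was blocked.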

{
  "version": 2,
  "run_id": "run_42",
  "mergeability_score": 0.87,
  "mergeability_breakdown": {
    "tests": { "passed": true, "weight": 0.40 },
    "lint": { "passed": true, "weight": 0.20 },
    "build": { "passed": true, "weight": 0.20 },
    "ai_critic": { "passed": false, "weight": 0.20 }
  },
  "cost_breakdown": {
    "total_usd": 0.123,
    "total_tokens": 45000,
    "by_model": {
      "openai/gpt-5": { "tokens": 30000, "usd": 0.090 },
      "anthropic/claude": { "tokens": 15000, "usd": 0.033 }
    }
  },
  "contract": {
    "goal": "fix sum bug",
    "task_type": "bug_fix",
    "risk_level": "medium",
    "success_criteria": [
      "all tests pass",
      "no unrelated files modified",
      "no \u003csecret\u003e leak in patch"
    ]
  },
  "attempts": [
    { "attempt_index": 1, "strategy": "baseline", "result": "fail", "hypothesis": "wrong operator" },
    { "attempt_index": 2, "strategy": "inspect_test", "result": "pass" }
  ],
  "verification_summary": {
    "verdict": "pass",
    "layers": { "tests": "pass", "lint": "pass" }
  }
}
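To show how a consumer might act on a Proof Pack like the one above, here is a small sketch of the finish gate from step 13, which blocks the claim when the score is below 0.6. The `finish_gate` function and `FINISH_THRESHOLD` constant are hypothetical names, and the sketch reads the reported `mergeability_score` as given. It does not recompute the score, since the scoring formula is not stated here.

```python
import json

FINISH_THRESHOLD = 0.6  # threshold taken from step 13 of the run sequence


def finish_gate(proof_pack):
    """Return (allowed, failed_checks) for a Proof Pack v2 dictionary."""
    score = proof_pack["mergeability_score"]
    breakdown = proof_pack.get("mergeability_breakdown", {})
    # Collect the breakdown entries that did not pass, for reporting.
    failed = sorted(name for name, c in breakdown.items() if not c["passed"])
    return score >= FINISH_THRESHOLD, failed


# A trimmed copy of the sample Proof Pack, keeping only the fields used here.
sample = json.loads("""
{
  "version": 2,
  "run_id": "run_42",
  "mergeability_score": 0.87,
  "mergeability_breakdown": {
    "tests": { "passed": true, "weight": 0.40 },
    "lint": { "passed": true, "weight": 0.20 },
    "build": { "passed": true, "weight": 0.20 },
    "ai_critic": { "passed": false, "weight": 0.20 }
  }
}
""")

allowed, failed = finish_gate(sample)
print(allowed, failed)  # True ['ai_critic']
```

In this sample the gate opens even though `ai_critic` failed, because the reported score of 0.87 is above the threshold; listing the failed checks keeps that visible to a reviewer.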

Capability	EvolIDE	Typical
Verification-first	11 layers run before 'done'	Single test command, often skipped.
Proof Pack	v2 with mergeability score breakdown (G8)	Free-form chat transcript or PR comment.
Failure memory	FailureMemory (G4) blocks retry of broken strategies	Loops on the same failed change.
Multi-agent	Conductor / Implementer / Verifier / Proof (G12)	Single agent or rigid hard-coded fan-out.
Cost router	ModelCostRouter (G10) + BudgetGovernor (G11)	Manual model picker per request.
Resume	ResumeCursor banner (G3) survives crash/restart	Re-prompt and lose the run.
Local-first	Ollama / LM Studio / vLLM / llama.cpp first-class	Cloud-only or token-gated local.
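The failure-memory row can be made concrete with a short sketch: remember which strategies have failed for a task, then pick the cheapest strategy that is not blocked. Everything here is an assumption for illustration, including the `FailureMemory` interface, the `next_strategy` helper, the numeric cost tiers, and the `full_rewrite` strategy name. Only `baseline` and `inspect_test` appear in the sample Proof Pack.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMemory:
    # Set of (task, strategy) pairs that have already failed.
    failed: set = field(default_factory=set)

    def record_failure(self, task, strategy):
        self.failed.add((task, strategy))

    def is_blocked(self, task, strategy):
        return (task, strategy) in self.failed


def next_strategy(task, strategies, memory):
    """Pick the lowest-cost strategy that has not failed for this task.

    strategies is a list of (name, cost_tier) pairs; returns None when
    every strategy is blocked.
    """
    for name, _tier in sorted(strategies, key=lambda s: s[1]):
        if not memory.is_blocked(task, name):
            return name
    return None


memory = FailureMemory()
strategies = [("inspect_test", 2), ("baseline", 1), ("full_rewrite", 3)]

first_pick = next_strategy("fix sum bug", strategies, memory)
memory.record_failure("fix sum bug", first_pick)
second_pick = next_strategy("fix sum bug", strategies, memory)
print(first_pick, second_pick)  # baseline inspect_test
```

Keying the memory on the task as well as the strategy means a strategy that failed for one task stays available for others.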


The agent doesn't claim 'done' until 11 verifications say so.

The 16-step run sequence

The 11-layer verification stack

1. Scope check

2. File diff

3. Unit tests

4. Build

5. Lint / format

6. Security scan

7. UI smoke

8. Acceptance criteria

9. AI critic

10. Human review

11. Observability
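The stack above can be sketched as an ordered list of named checks whose overall verdict is "pass" only when every layer passes. This is a minimal sketch under stated assumptions: the layer identifiers are derived from the list, the `run_stack` function and its callable-per-layer interface are hypothetical, and treating an unconfigured layer as a failure is a design choice of the example, not documented behaviour.

```python
LAYERS = [
    "scope_check", "file_diff", "unit_tests", "build", "lint_format",
    "security_scan", "ui_smoke", "acceptance_criteria", "ai_critic",
    "human_review", "observability",
]


def run_stack(checks):
    """Run every layer in order and summarise the results.

    checks maps a layer name to a zero-argument callable returning bool.
    A layer with no registered check is reported as "missing".
    """
    layers = {}
    for name in LAYERS:
        check = checks.get(name)
        if check is None:
            layers[name] = "missing"
        else:
            layers[name] = "pass" if check() else "fail"
    # Only an all-pass result yields a passing verdict.
    verdict = "pass" if all(v == "pass" for v in layers.values()) else "fail"
    return {"verdict": verdict, "layers": layers}


all_green = {name: (lambda: True) for name in LAYERS}
summary = run_stack(all_green)

without_build = dict(all_green)
del without_build["build"]
incomplete = run_stack(without_build)
```

Running all layers instead of stopping at the first failure gives a complete per-layer report, at the cost of extra work when an early layer has already failed.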

Proof Pack v2 — what merges, what doesn't, and why

EvolIDE vs typical AI coding tools

Ship faster with EvolIDE
