Agent
Every EvolIDE run produces a Proof Pack v2 with a mergeability score. Sub-agents are orchestrated, repairs are bounded, and failure memory blocks repeating broken strategies.
The Conductor Agent orchestrates every run from natural-language prompt to merge-ready proof. Each row is a checkable step in the verification-first flow.
| # | Step | Owner | Detail |
|---|---|---|---|
| 01 | Contract | Conductor | Parses goal · constraints · risk · success criteria (G1). |
| 02 | Risk classify | Conductor | Low / Medium / High / Critical — drives mode + required checks (G1). |
| 03 | Mode pick | ModeRouter | Quick / Smart / Premium Verified / Enterprise Delivery (G9). |
| 04 | Plan | Planner | Tool allowlist · scope globs · token budget (G7 Brain). |
| 05 | Memory lookup | AttemptMem | FailureMemory pre-empts retrying broken strategies (G4). |
| 06 | Sub-agents | Conductor | Spawns Implementer · Verifier · Proof when mode = premium. |
| 07 | Tool loop | Implementer | 7 guardrails on every call — allowlist, scope, blocklist, dedupe, rate-limit, budget, failure_memory (G13). |
| 08 | Hypothesise | Implementer | Required for medium/high risk before any change (G5). |
| 09 | Apply patch | Implementer | WorkspaceEdit applier — atomic, journal-backed. |
| 10 | Verify | Verifier | 11-layer stack — see below. |
| 11 | Repair? | RepairLoop | Strategy switcher escalates by cost tier (G7). |
| 12 | Proof | Proof | Proof Pack v2 — mergeability score + breakdown (G8). |
| 13 | Finish gate | Conductor | finish-requires-verification — blocks claim if score < 0.6. |
| 14 | Quality | Dashboard | AgentQualityMetric (G14) records accepted patch rate. |
| 15 | Resume? | ResumeCursor | If interrupted, ResumeCursor banner picks up next session (G3). |
| 16 | Ledger | RunLedger | Every action persists to AgentRunLedger (G2). |
Every required layer (per task type and risk level) must report PASS before the Conductor allows the run to finish. Skipped layers fail closed.
Diff lives only inside the user-approved scope.
Diff parses cleanly and applies to a clean tree.
Test command exits 0 (npm test, pytest, etc.).
Build/compile/typecheck command exits 0.
Lint or formatter exits clean.
No new high/critical findings (Semgrep / Snyk).
Optional Playwright/Storybook for UI tasks.
Every contract criterion mapped to an asserted check.
Second model reviews — no critical issues.
Required for enterprise_delivery; surfaced via PR.
Logs / metrics / traces emitted as expected.
{
"version": 2,
"run_id": "run_42",
"mergeability_score": 0.87,
"mergeability_breakdown": {
"tests": { "passed": true, "weight": 0.40 },
"lint": { "passed": true, "weight": 0.20 },
"build": { "passed": true, "weight": 0.20 },
"ai_critic": { "passed": false, "weight": 0.20 }
},
"cost_breakdown": {
"total_usd": 0.123,
"total_tokens": 45000,
"by_model": {
"openai/gpt-5": { "tokens": 30000, "usd": 0.090 },
"anthropic/claude": { "tokens": 15000, "usd": 0.033 }
}
},
"contract": {
"goal": "fix sum bug",
"task_type": "bug_fix",
"risk_level": "medium",
"success_criteria": [
"all tests pass",
"no unrelated files modified",
"no \u003csecret\u003e leak in patch"
]
},
"attempts": [
{ "attempt_index": 1, "strategy": "baseline", "result": "fail", "hypothesis": "wrong operator" },
{ "attempt_index": 2, "strategy": "inspect_test", "result": "pass" }
],
"verification_summary": { "verdict": "pass", "layers": { "tests": "pass", "lint": "pass" } }
}| Capability | EvolIDE | Typical |
|---|---|---|
| Verification-first | 11 layers run before 'done' | Single test command, often skipped. |
| Proof Pack | v2 with mergeability score breakdown (G8) | Free-form chat transcript or PR comment. |
| Failure memory | FailureMemory (G4) blocks retry of broken strategies | Loops on the same failed change. |
| Multi-agent | Conductor / Implementer / Verifier / Proof (G12) | Single agent or rigid hard-coded fan-out. |
| Cost router | ModelCostRouter (G10) + BudgetGovernor (G11) | Manual model picker per request. |
| Resume | ResumeCursor banner (G3) survives crash/restart | Re-prompt and lose the run. |
| Local-first | Ollama / LM Studio / vLLM / llama.cpp first-class | Cloud-only or token-gated local. |
Private beta · invite only
We’re onboarding a small set of pilot teams before public launch. Tell us about your stack and we’ll get back to you.
No credit card. Audit-ready governance from day one.