CCX, a runtime for autonomous engineering instances

Constraint

A frontier model is stateless across sessions, cannot independently verify its output against an agreed target, and does not coordinate with sibling processes operating on shared state. These are not training limits; they follow from the scope of what a model is. Production engineering work needs a layer that closes these gaps.

CCX is that layer. It sits between the model and the work an instance is asked to do.

Architecture

CCX separates two roles a frontier model cannot occupy at once. A long-lived Manager holds the human-facing surface. It hears the operator, dispatches work, monitors progress, and reports back. It never writes code. A short-lived Worker spins up per assignment, runs the work through specialised phases, and never talks to humans. The two processes share state through a durable mission log that survives crashes and context compactions.

This separation is not a stylistic choice. The Manager and Worker roles ask different things of a model. Managing a conversation rewards continuity and high recall of the operator's tone and intent. Executing a unit of work rewards depth, isolation, and the willingness to reject its own output. A single context that tries to do both collapses one into the other. Splitting them keeps both honest.
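To make the shared mission log concrete, here is a minimal sketch, assuming an append-only JSON-lines file on local disk. The class and method names (`MissionLog`, `append`, `replay`) are hypothetical and are not CCX's actual interface. The point is only that state lives on disk, outside either process's context, so a restarted Worker or a compacted Manager can rebuild it by replaying the log.

```python
import json
import os
from pathlib import Path


class MissionLog:
    """Append-only JSON-lines log shared by Manager and Worker (hypothetical sketch).

    Each entry is one JSON object per line, flushed and fsynced so it
    survives a crash of either process.
    """

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.path.touch(exist_ok=True)

    def append(self, role: str, kind: str, payload: dict) -> None:
        entry = {"role": role, "kind": kind, "payload": payload}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the entry to disk before returning

    def replay(self) -> list[dict]:
        """Rebuild state from disk, e.g. after a restart or a context compaction."""
        entries = []
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    entries.append(json.loads(line))
                except json.JSONDecodeError:
                    break  # a torn final write from a crash; ignore the tail
        return entries
```

Because entries are only ever appended and each write is fsynced, a crash can at worst leave one partial line at the end, which `replay` discards.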

Recovery is built into the same shape. If the Worker stops responding, the Manager notices and restarts the unit from the last verified checkpoint. The operator does not have to be the watchdog.
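The Manager's watchdog role can be sketched as a heartbeat timeout. This is a minimal illustration under stated assumptions: the Worker sends periodic heartbeats, only checkpoints that have passed verification are recorded, and `restart` is a hypothetical callback that relaunches a Worker from a given checkpoint. The names `Watchdog`, `heartbeat`, `checkpoint` and `check` are illustrative, not CCX's API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class UnitState:
    last_heartbeat: float
    last_verified_checkpoint: Optional[str] = None


@dataclass
class Watchdog:
    """Manager-side liveness check for Worker units (hypothetical sketch)."""

    timeout_s: float
    restart: Callable[[str, Optional[str]], None]  # (unit_id, checkpoint) -> None
    clock: Callable[[], float] = time.monotonic
    units: dict[str, UnitState] = field(default_factory=dict)

    def heartbeat(self, unit_id: str) -> None:
        state = self.units.setdefault(unit_id, UnitState(self.clock()))
        state.last_heartbeat = self.clock()

    def checkpoint(self, unit_id: str, checkpoint_id: str) -> None:
        # Assumption: only checkpoints that passed verification are recorded here.
        self.heartbeat(unit_id)
        self.units[unit_id].last_verified_checkpoint = checkpoint_id

    def check(self) -> list[str]:
        """Restart every unit whose Worker has gone silent; return their ids."""
        now = self.clock()
        stalled = [u for u, s in self.units.items() if now - s.last_heartbeat > self.timeout_s]
        for unit_id in stalled:
            state = self.units[unit_id]
            self.restart(unit_id, state.last_verified_checkpoint)
            state.last_heartbeat = now  # give the restarted Worker a fresh window
        return stalled
```

The clock is injectable so the timeout logic can be tested without sleeping; in a running Manager, `check` would be called on a timer.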

RPTIV

Every mission passes through five phases, each owned by a specialised sub-agent. Between phases, the Worker decides whether to proceed, redo, or escalate.

Research (R)

Directed retrieval against the project's own context and live external sources, scoped to the mission.

Plan (P)

Decomposition into verifiable units, each carrying explicit dependencies and its acceptance criteria.

Test (T)

Acceptance criteria are written as tests before any implementation begins. No path through the loop skips this phase.

Implement (I)

Code is written against the plan and against the tests the Test phase produced.

Validate (V)

Output is checked against the agreed target. Failures route back to planning.
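The control flow between phases can be sketched as a small loop. In this minimal sketch, `run_phase` is a hypothetical stand-in for a phase sub-agent together with the Worker's verdict, and `max_redos` is an assumed retry budget (the text does not say how many redos are allowed before escalation). A redo on Validate returns to Plan, as described above; a redo on any other phase repeats that phase.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    PROCEED = "proceed"
    REDO = "redo"
    ESCALATE = "escalate"


PHASES = ["research", "plan", "test", "implement", "validate"]


def run_mission(run_phase: Callable[[str], Decision], max_redos: int = 2) -> tuple[str, list[str]]:
    """Drive one mission through RPTIV (hypothetical sketch).

    Returns the final status ("complete" or "escalated") and a trace of
    "phase:decision" entries.
    """
    trace: list[str] = []
    redos = 0
    i = 0
    while i < len(PHASES):
        phase = PHASES[i]
        decision = run_phase(phase)
        trace.append(f"{phase}:{decision.value}")
        if decision is Decision.ESCALATE:
            return "escalated", trace
        if decision is Decision.PROCEED:
            i += 1
            continue
        # REDO: count it, and escalate once the assumed retry budget is spent.
        redos += 1
        if redos > max_redos:
            return "escalated", trace
        if phase == "validate":
            i = PHASES.index("plan")  # validation failures route back to planning
        # any other phase: i is unchanged, so the same phase runs again
    return "complete", trace
```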

After RPTIV completes, an 8-point acceptance gate runs before the Worker hands the unit back: git clean, build passes, tests pass, sensible commits, scope match, intent match, visual verification, and taste questions flagged. A unit that fails any check is not marked complete, and the Manager reports the failure in the same way it would report a success.
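A sketch of the gate, assuming each of the eight checks is supplied as a callable that returns pass or fail. How each check is actually computed (for example, whether "intent match" is judged by a model) is not described here, so the checks are left abstract; `GateReport` and `run_gate` are hypothetical names. Every check runs even after a failure, so the report lists all failures, and success and failure produce the same summary shape.

```python
from dataclasses import dataclass
from typing import Callable

# The eight checks named in the acceptance gate, in order.
GATE = (
    "git clean",
    "build passes",
    "tests pass",
    "sensible commits",
    "scope match",
    "intent match",
    "visual verification",
    "taste questions flagged",
)


@dataclass(frozen=True)
class GateReport:
    unit_id: str
    failed: tuple[str, ...]

    @property
    def complete(self) -> bool:
        return not self.failed

    def summary(self) -> str:
        # Same shape for success and failure, so the Manager reports both alike.
        status = "complete" if self.complete else "not complete"
        detail = ", ".join(self.failed) or f"all {len(GATE)} checks passed"
        return f"{self.unit_id}: {status} ({detail})"


def run_gate(unit_id: str, checks: dict[str, Callable[[], bool]]) -> GateReport:
    """Run every gate check (no short-circuit) and collect the failures."""
    missing = [name for name in GATE if name not in checks]
    if missing:
        raise ValueError(f"gate is missing checks: {missing}")
    failed = tuple(name for name in GATE if not checks[name]())
    return GateReport(unit_id, failed)
```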

Safety

Two enclosures run alongside every agent. The first inspects every outgoing message before it reaches the operator, screening for content the agent should not surface. The second meters the agent's draw on upstream APIs so a single long-running unit cannot exhaust a day's budget in an afternoon.

Both are cheap by design. The screening pass uses a small model, runs in milliseconds, and is treated as a layer of the operator-facing surface rather than a wrapper around the work. The budget meter sits at the gateway and surfaces the cap to the operator before it is hit. Neither is an external dependency the agent can route around.
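The budget meter can be sketched as a per-day counter that every upstream call must pass through. This is a minimal illustration, assuming a single daily cap in arbitrary cost units and a warning at an assumed 80% threshold; `BudgetMeter`, `charge`, `notify` and `BudgetExceeded` are hypothetical names, and the screening pass is not modelled.

```python
from datetime import date
from typing import Callable, Optional


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the daily cap."""


class BudgetMeter:
    """Gateway-side meter for upstream API spend (hypothetical sketch).

    Every upstream call passes through `charge`, so the agent has no path
    around it. The operator is notified once per day when spend crosses
    `warn_fraction` of the cap, before the cap itself is reached.
    """

    def __init__(self, daily_cap: float, notify: Callable[[str], None],
                 warn_fraction: float = 0.8,
                 today: Callable[[], date] = date.today) -> None:
        self.daily_cap = daily_cap
        self.notify = notify
        self.warn_fraction = warn_fraction
        self.today = today
        self.day: Optional[date] = None
        self.spent = 0.0
        self.warned = False

    def charge(self, cost: float) -> None:
        if self.today() != self.day:  # new day: reset the meter
            self.day, self.spent, self.warned = self.today(), 0.0, False
        if self.spent + cost > self.daily_cap:
            raise BudgetExceeded(f"call would exceed daily cap of {self.daily_cap}")
        self.spent += cost
        if not self.warned and self.spent >= self.warn_fraction * self.daily_cap:
            self.warned = True
            self.notify(f"{self.spent:.2f} of {self.daily_cap:.2f} daily budget used")
```

Refusing the call that would cross the cap, rather than the one after it, keeps spend strictly within the limit.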

Operator surface

An agent built on CCX does not feel different to the operator on the first interaction. It feels different on the tenth. The agent retains what it was told yesterday. It refuses to mark a unit of work complete if verification has not passed, and reports that in the same channel where it was given the work. The role of the operator shifts from reviewer of artefacts to director of intent.

What runs on it

Every agentic system the lab has deployed runs on CCX. The runtime is what makes a single model behave like a system that can be trusted to ship without supervision. The agents differ in domain (legal, recruitment, video, member operations); the runtime is shared.

Open problems

Verification too narrow. A check can pass for the immediate unit while missing a parent constraint two steps up the plan. The fix is structural: parent constraints are carried down into leaf checks rather than reconstructed there.
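One way to carry parent constraints into leaf checks is to give every plan node a reference to its parent, so a leaf's verification walks up the tree and evaluates each ancestor's constraint directly instead of re-deriving it. The sketch below assumes constraints can be expressed as predicates over a unit's output dictionary; `PlanNode` and `verify_leaf` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Check = Callable[[dict], bool]


@dataclass
class PlanNode:
    name: str
    constraint: Check
    parent: Optional["PlanNode"] = None
    children: list["PlanNode"] = field(default_factory=list)

    def add(self, name: str, constraint: Check) -> "PlanNode":
        child = PlanNode(name, constraint, parent=self)
        self.children.append(child)
        return child

    def inherited_checks(self) -> list[tuple[str, Check]]:
        """This node's check plus every ancestor's, carried down explicitly."""
        node, checks = self, []
        while node is not None:
            checks.append((node.name, node.constraint))
            node = node.parent
        return checks


def verify_leaf(leaf: PlanNode, output: dict) -> list[str]:
    """Return the names of every constraint (leaf or ancestor) the output violates."""
    return [name for name, check in leaf.inherited_checks() if not check(output)]
```

For a leaf two levels below a "no new dependencies" root, an output that passes the leaf's own test but adds a dependency is still reported as a violation.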

Cross-agent drift. Two agents on the same project arrive at incompatible plans because each works from a stale snapshot of project state and their views diverge from there. This is a coordination problem, not a runtime one, and it is what our other release, Grove, addresses.

Recognition

CCX was selected from over 100,000 applications to the Anthropic Built with Opus hackathon, shortlisted to 500 entries, and ranked 4th overall. The work was featured at the Claude Code 1st Birthday event in San Francisco.

Contact

If something on this page is relevant to work you are running, write to us using the form on the landing page. We reply within two working days.
