X-ARC

Building effective AI agents: max prompt, min code.

How we build effective AI agents around a model you cannot fully trust, and the science behind each choice. A vendor-neutral method, distilled from running agents in production.

The core bet

The one idea underneath all of it

Maximize the prompt, minimize the code. Put behavior in language and let the model decide; reserve code for the small set of guarantees the model must not be trusted with.

Maximizing the prompt is an engineering conclusion, not a slogan: it follows from three measurable properties of capable models. First, general methods that scale with computation beat handcrafted ones by a large margin, which is the practitioner's reading of Sutton's Bitter Lesson. A prompt is a general control surface that improves on its own as the model improves, while hardcoded logic is handcrafted knowledge that the next model outgrows. Second, in a non-deterministic system code is the high-variance control surface: minor changes to code cascade into large behavioral changes, whereas prose is the comparatively low-variance, inspectable surface, where a behavior change is a paragraph you can read and diff. Third, attention is a finite budget, so writing a prompt means allocating the smallest set of high-signal tokens that produces the outcome you want, not simply writing instructions.

Maximizing the prompt is not granting free will

The common misreading is that maximizing the prompt means letting the model do whatever it wants. The second half of the bet, minimizing the code, is exactly what prevents that. You partition every decision the system makes by a single question: how much variance can the outcome tolerate? Outcomes that can tolerate none (those that are irreversible, financial, cross-tenant, or able to fabricate trust) are guaranteed in code and fail closed; everything recoverable is delegated to the model. This is bounded autonomy, not free will.
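The partition can be sketched as a single predicate. This is a minimal illustration, assuming each decision is tagged up front with the four zero-tolerance properties named above; the `Decision` type, its field names, and `owner_of` are hypothetical, not part of any framework.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    MODEL = "model"  # recoverable: delegate to the prompt
    CODE = "code"    # zero variance tolerated: guarantee in code, fail closed


@dataclass(frozen=True)
class Decision:
    # Hypothetical descriptor of one decision the agent system makes.
    name: str
    irreversible: bool = False
    financial: bool = False
    cross_tenant: bool = False
    can_fabricate_trust: bool = False


def owner_of(decision: Decision) -> Owner:
    """Partition by one question: can the outcome tolerate any variance?"""
    zero_tolerance = (
        decision.irreversible
        or decision.financial
        or decision.cross_tenant
        or decision.can_fabricate_trust
    )
    return Owner.CODE if zero_tolerance else Owner.MODEL


if __name__ == "__main__":
    for d in [
        Decision("choose which tool to call next"),
        Decision("issue a refund", irreversible=True, financial=True),
        Decision("render a citation", can_fabricate_trust=True),
    ]:
        print(f"{d.name}: {owner_of(d).value}")
```

The point of keeping the rule this small is that it stays auditable: anything that trips one of the four properties leaves the prompt's jurisdiction entirely.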

The model decides (prose-steered, trusted) | Code guarantees (fail-closed, model not trusted)
Which tools to call, and in what order | A scoped identifier is real-shaped before it touches data
How to interpret data and what to recommend | A citation renders only if it maps to a real retrieved result
What to assume when the user is silent | Identity is bound to the authenticated session, denied on failure
How to recover from an error, how to phrase the answer | Every state-changing action passes a human gate and a schema check
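The code-guarantee column can be sketched as a handful of small guards that either return a safe value or raise, so a failure denies the action instead of passing it through. This is a minimal sketch under stated assumptions: the identifier shape (`acct_` plus 12 hex characters), the `Session` type, and every function name here are hypothetical illustrations, not the guide's implementation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifier shape: "acct_" followed by 12 lowercase hex chars.
SCOPED_ID = re.compile(r"acct_[0-9a-f]{12}")


class GuardError(Exception):
    """Raised when a code-owned guarantee cannot be met; the action is denied."""


def check_scoped_id(raw: str) -> str:
    # fullmatch, not match: a valid prefix followed by junk must still fail.
    if not SCOPED_ID.fullmatch(raw):
        raise GuardError(f"identifier {raw!r} is not real-shaped")
    return raw


def renderable_citations(cited: list[str], retrieved: set[str]) -> list[str]:
    # A citation the model produced renders only if it maps to a retrieved result.
    return [c for c in cited if c in retrieved]


@dataclass(frozen=True)
class Session:
    user_id: Optional[str]
    authenticated: bool


def bound_identity(session: Session, claimed_user_id: Optional[str] = None) -> str:
    # Identity comes from the authenticated session, never from model output.
    if not session.authenticated or session.user_id is None:
        raise GuardError("no authenticated session")
    if claimed_user_id is not None and claimed_user_id != session.user_id:
        raise GuardError("model-supplied identity does not match the session")
    return session.user_id


if __name__ == "__main__":
    print(renderable_citations(["doc-1", "doc-9"], {"doc-1", "doc-2"}))  # ['doc-1']
    print(bound_identity(Session(user_id="u-42", authenticated=True)))  # u-42
```

None of these guards asks the model anything; the model can still propose an identifier or a citation, but whether it takes effect is decided by code.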

This boundary is the whole framework: every pattern in the full guide (the harness, the tools, retrieval, memory, approvals, evaluation, and release) is one more application of this single partition.

Inside the guide

The complete field guide works through thirteen patterns, each with its mental model, the concrete production pattern, and the tradeoff. The short version:

01 Max prompt, min code. Behavior in prose; code only for the irreversible, financial, cross-tenant, or trust-fabricating.
02 Start simple. Workflow before agent; single call before workflow. Add autonomy only where the problem is genuinely open-ended.
03 The harness owns the turn; the model owns the loop. One interception seam between the model and the world.
04 Tools are contracts. Descriptions and schemas are prompt surfaces. Return errors the model can recover from. Standardize on MCP.
05 Retrieval and memory are tools, not prefixes. Inject only the small and always-relevant; fetch everything large or situational.
06 Human-in-the-loop pauses the tool call, not the agent. Enforce approval at the wire; auto-deny on timeout.
07 Stream everything and make it legible. Typed parts, generative UI, visible tool calls, a cacheable prefix plus a volatile tail.
08 Trace every turn, and distrust the cost dashboard. Tokens and behavior are the truth, while dollars are an estimate to reconcile against the real bill.
09 Eval by layers. Deterministic contracts, an informational cross-family judge, live upstream probes. Grade outcomes, not paths.
10 Ship dark. Default-OFF flags read at call time; rollback is an env flip.
11 Fan out deliberately. Multi-agent for breadth-first, read-heavy work only; keep writes and final decisions single-threaded.
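Two of the patterns above are concrete enough to sketch together: 06, where the approval gate pauses only the pending tool call and auto-denies on timeout, and 10, where a default-OFF flag is read at call time so rollback is an environment change. This is a minimal sketch assuming an asyncio-based harness in which a human decision arrives as a future; the tool `issue_refund`, the flag name `AGENT_REFUND_TOOL`, the timeout, and the result shape are all hypothetical.

```python
import asyncio
import os

APPROVAL_TIMEOUT_S = 0.5  # hypothetical; short so the demo finishes quickly


def flag_enabled(name: str) -> bool:
    # Default OFF and read at call time (not cached at import), so rollback is an env flip.
    return os.environ.get(name, "").strip().lower() in {"1", "true", "on"}


def issue_refund(amount: float) -> str:
    return f"refunded {amount:.2f}"  # hypothetical state-changing tool


async def gated_call(tool, args: dict, approval: "asyncio.Future[bool]",
                     timeout_s: float = APPROVAL_TIMEOUT_S) -> dict:
    """Pause this one tool call until a human decides; the agent loop keeps its state."""
    try:
        approved = await asyncio.wait_for(approval, timeout=timeout_s)
    except asyncio.TimeoutError:
        approved = False  # auto-deny on timeout: fail closed
    if not approved:
        # A structured result the model can read and recover from.
        return {"ok": False, "error": "denied", "hint": "ask the user how to proceed"}
    return {"ok": True, "result": tool(**args)}


async def call_refund(amount: float, approval: "asyncio.Future[bool]") -> dict:
    if not flag_enabled("AGENT_REFUND_TOOL"):  # shipped dark until the flag is set
        return {"ok": False, "error": "refund tool is not enabled"}
    return await gated_call(issue_refund, {"amount": amount}, approval)


async def demo() -> None:
    loop = asyncio.get_running_loop()
    os.environ["AGENT_REFUND_TOOL"] = "on"

    approved = loop.create_future()
    loop.call_later(0.1, approved.set_result, True)  # a human approves in time
    print(await call_refund(10, approved))

    silent = loop.create_future()  # nobody answers before the timeout
    print(await call_refund(10, silent))


if __name__ == "__main__":
    asyncio.run(demo())
```

Returning a denial as data rather than raising keeps the turn alive: the model sees the outcome and can ask the user, which is what "pause the tool call, not the agent" asks for.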

The whitepaper carries the full treatment of each, the flow diagrams, a complete reference architecture, and formal citations.

Get the guide

Whitepaper · PDF

Building Effective AI Agents, the complete field guide

The full method, every pattern and its tradeoff, the flow diagrams, the reference architecture, and the references, in X-Arc's research-paper format.

21 pages · vendor-neutral · every pattern run in production

Contact

This guide is published by X-Arc, an applied AI research lab. Every pattern has been run in production, and none of the specifics are proprietary. If you are building an agent product and something here is relevant to work you are running, write to us: the form is on the landing page, and we reply within two working days.
