One trade-off in a model, three choices in a system.
The curve
Anyone deciding how to use AI meets the same trade-off, and it is usually treated as a law. The more capable the model, the more it costs to run and the less control you keep over the data you send it. Capability sits at one end, low cost and privacy at the other, and the decision is framed as a choice of where on that line to sit and what to give up to sit there.
The trade-off is real, but it is a property of a single model rather than of the system built around it. When one model is chosen to do everything, capability, cost, and exposure move together, because one component is carrying all three at once. Spread the work across a system and they come apart, because nothing requires a single model to make every call.
What follows is why that trade-off is real, why a better and cheaper model does not release you from it, and why it stops describing your options the moment you stop choosing a single model and start building a system.
Why the curve is real
The curve is not a pricing artifact or a temporary state of the market. It comes from two independent facts that happen to point the same way.
The first is that capability tracks scale. A more capable model is, in the general case, a larger one that spends more computation on each answer, sometimes across several coordinated passes. That computation is not free, and it is paid on every request. Higher capability therefore costs more to serve, not by accident of how it is priced, but because more work is being done.
The second is that the data has to be where the weights are. Frontier weights run on infrastructure you do not own, so reaching them means sending your input across a boundary into someone else's system. The capability lives there, so the data goes there. Locality and capability pull in opposite directions for a physical reason, not a contractual one.
These are two different mechanisms. One is about how much computation an answer takes. The other is about where the computation happens. They are unrelated, except that both get worse as you move toward the frontier, and that coincidence is what makes them read as a single trade-off. Held at the granularity of one model doing everything, cost and exposure look welded to capability and to each other. They are not. They only travel together because you are looking at one decision instead of the thousands a system actually makes.
Why a better model does not move you off it
Every release is met with the expectation that the trade-off is about to dissolve: the new model is cheaper for what it does and better at what it does, so the old compromise is over. The first half is true. Capability per unit of cost rises with almost every release, and the whole curve lifts with it. The clearest case is the price of intelligence itself, which has fallen steeply as open-weight models have closed the distance to the frontier, with no sign yet of leveling off.
What does not change is that it remains a curve. After the lift, the most capable option available is still the most expensive and the most exposed one, and the cheapest, most private option is still the least capable. A better model is a better menu, not the end of the menu. You move along the line faster than before, and you are still on the line. The exposure axis barely moves at all, because a more capable model does not change the fact that its weights run somewhere you do not control.
So the trade-off is structural to one decision: choosing a single model and applying it uniformly. Everything that makes it feel inescapable is downstream of that one assumption. The way out is not a better point on the curve. It is to stop making the system out of a single point.
Intelligence is one decision among many
A production system is not a model. It is a few thousand model calls, and the difficulty of those calls is not evenly distributed. A small number genuinely require frontier reasoning: the ambiguous judgment, the plan with many interacting constraints, the step where being wrong is expensive. The large majority are routine: classifying, extracting, reformatting, checking a result against a rule, deciding which of three branches to take. Applying a frontier model to all of them pays the top of the curve for work that sits at the bottom of it.
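The arithmetic behind that claim can be sketched in a few lines. Every number below is an illustrative assumption, not a measurement: the call count, the per-call prices (in arbitrary integer units), and the count of hard calls are all hypothetical.

```python
def total_cost(total_calls: int, hard_calls: int,
               frontier_price: int, cheap_price: int) -> int:
    """Cost when only `hard_calls` reach the frontier model and the rest go to a cheap one."""
    routine_calls = total_calls - hard_calls
    return hard_calls * frontier_price + routine_calls * cheap_price


# Hypothetical figures, chosen only to make the shape of the saving visible.
TOTAL_CALLS = 4000      # "a few thousand model calls"
HARD_CALLS = 200        # assumed: 5% of calls genuinely need frontier reasoning
FRONTIER_PRICE = 100    # assumed price per frontier call, arbitrary units
CHEAP_PRICE = 5         # assumed price per cheap call, same units

# Frontier model applied uniformly: every call is treated as a hard call.
uniform = total_cost(TOTAL_CALLS, TOTAL_CALLS, FRONTIER_PRICE, CHEAP_PRICE)

# Frontier model spent only on the hard calls.
routed = total_cost(TOTAL_CALLS, HARD_CALLS, FRONTIER_PRICE, CHEAP_PRICE)

print(uniform, routed)
```

Under these assumed numbers the routed system costs under a tenth of the uniform one. The ratio depends entirely on how skewed the difficulty distribution is and on the price gap between the two models, so it should be recomputed from real traffic.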
Seen this way, intelligence is not a property to buy once and apply everywhere. It is one decision among many, and most of the others do not need much of it. The discipline that follows is a cascade. A capable model orchestrates: it decides what needs to happen and delegates. Cheaper models do the bulk of the execution. A cheap model guards the output, catching the failures the cheap workers are prone to before they leave the system. The strong model is spent only where its strength changes the outcome.
The gate that governs this is the ratio of quality to cost, not cost on its own. Cheapest-everywhere is a trap, because the savings are real until a tail case arrives, and the tail is exactly where a cheap model fails and where the cost of failing is highest. So a quality floor is held under the workers, and the strong model stays on call for the cases that clear a difficulty threshold. The result is a system whose average cost falls toward the bottom of the curve while its output quality holds near the top.
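A minimal sketch of that gate, under stated assumptions: the difficulty score is supplied by the caller (in the cascade described above it would come from the orchestrating model), the guard returns a quality score between 0 and 1, and the threshold values and all function names are hypothetical. The stub models exist only so the example runs.

```python
from typing import Callable, Tuple

Model = Callable[[str], str]          # task -> output
Guard = Callable[[str, str], float]   # (task, output) -> quality score in [0, 1]


def cascade(task: str, difficulty: float, cheap: Model, strong: Model, guard: Guard,
            difficulty_threshold: float = 0.7,
            quality_floor: float = 0.8) -> Tuple[str, str]:
    """Return (output, route), where route records which path produced the output."""
    # Calls that clear the difficulty threshold go straight to the strong model.
    if difficulty >= difficulty_threshold:
        return strong(task), "strong"
    # Everything else is attempted by the cheap worker first.
    draft = cheap(task)
    # The guard holds a quality floor under the cheap worker's output.
    if guard(task, draft) >= quality_floor:
        return draft, "cheap"
    # Below the floor: escalate to the strong model instead of shipping the draft.
    return strong(task), "escalated"


# Stand-in components (assumptions for illustration, not real models).
def cheap_stub(task: str) -> str:
    return task.upper()


def strong_stub(task: str) -> str:
    return "[strong] " + task


def guard_stub(task: str, output: str) -> float:
    # Toy rule: pretend the cheap worker is only reliable on short tasks.
    return 1.0 if len(task) <= 20 else 0.0


routine = cascade("label this ticket", 0.1, cheap_stub, strong_stub, guard_stub)
hard = cascade("plan the migration", 0.9, cheap_stub, strong_stub, guard_stub)
tail = cascade("extract every clause from this contract", 0.2,
               cheap_stub, strong_stub, guard_stub)
```

The third call is the tail case: it looks routine by its difficulty score, the cheap worker's draft fails the guard, and the strong model is paid for only then. Escalating on a failed check rather than on price is what keeps the gate a quality-to-cost decision.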
The reason this is possible, and not a hopeful story about averages, is that the intelligence can be moved off the model and into the system around it. When structure, retrieval, and verification carry the accuracy, the model becomes a component rather than the source of the result. We have measured one of our own systems, the typed-graph memory we call Grove, with its underlying model swapped for a far smaller and cheaper one, and seen almost no loss in accuracy. A component you can substitute that cheaply is a component you can route. The capability is not lost when you stop sending every call to the frontier, because the capability was never only in the model.
A system is a few thousand model calls, and almost none of them need the best one.
Privacy is a placement question
The exposure axis looks even more tightly welded to capability than cost does. To get frontier capability you send your data to a third party, so the only way to keep data private appears to be keeping it local, which costs you the capability. Privacy and intelligence seem to be a straight either-or.
That welding is the same artifact as before, the single-model frame seen from a different side. Exposure is not a fact about which model is most capable. It is a fact about where the data flows and whose boundary it crosses. The capable model and the act of pooling your data into a shared system are two separate things that the default arrangement happens to bundle.
They can be unbundled. The same frontier model can be reached through an account the operator owns, so the call runs under the operator's own terms: the data is isolated to that account, not pooled into a shared system, and not used to train the next model. The data still travels to where the weights run. What changes is the boundary it crosses and the terms that govern it. Capability is borrowed under a contract the operator controls, rather than data being surrendered to live wherever the capability lives.
Privacy, then, is a placement question, the same kind of question as cost. Both look like properties of the model because the simplest way to use a model makes them so. Both turn out to be properties of how the model is placed in a system: cost decided by which calls reach the expensive model, exposure decided by whose account the calls run through. Neither is decided by how capable the model is.
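One way to picture the two placement decisions as independent is a small sketch like the following. The tier names, account labels, and the `place` function are hypothetical, introduced only for illustration; how a call is actually judged to need the frontier model or to carry sensitive data is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    model_tier: str   # "frontier" or "small": drives cost
    account: str      # "operator_owned" or "shared_default": drives exposure


def place(needs_frontier: bool, sensitive: bool) -> Placement:
    """Decide model tier and account separately; neither input affects the other output."""
    # Cost decision: which calls reach the expensive model.
    tier = "frontier" if needs_frontier else "small"
    # Exposure decision: whose account the call runs through.
    account = "operator_owned" if sensitive else "shared_default"
    return Placement(model_tier=tier, account=account)


# A hard call on sensitive data keeps both frontier capability and the operator's boundary.
hard_and_sensitive = place(needs_frontier=True, sensitive=True)

# A routine call on non-sensitive data takes the cheap tier and the default account.
routine_and_public = place(needs_frontier=False, sensitive=False)
```

The point of the sketch is only that the account is chosen without reference to the model tier. An operator could just as well route every call through its own account.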
Exposure is a question of where the data goes, not of how good the model is.
The corner the curve says is empty
Set the trade-off out as a plane, with capability on one axis and cost and exposure on the other. The corner it leaves empty, high capability at low cost and low exposure, is the one no single model can occupy. It is exactly where a system sits.
Routing pulls the cost left, because the expensive model is spent only on the calls that need it. Account isolation pulls the exposure left, because the data runs through a boundary the operator owns. Capability stays high, because the hard calls still reach the strong model and the rest are carried by structure. The corner is empty only on the axis of one decision. On the axis of a system, it is the obvious place to be.
This is the difference between buying intelligence and engineering with it. Buying it means accepting the point on the curve that your budget and your data policy permit. Engineering with it means treating capability, cost, and exposure as three separate decisions, made call by call, instead of one decision made once at the top.
The line
The trade-off is real. It is a property of the model, and no model will escape it, because capability will keep costing computation and that computation will keep running somewhere. It is not, however, a property of the system you build around the model.
Frontier models are necessary but not sufficient. Necessary, because the hard calls genuinely need them. Not sufficient, because a system that reaches for the strongest model on every call, through whatever account is most convenient, inherits the worst of the curve and calls it the cost of doing business. The capability is in the model. The leverage is in where you place it.