Field Note

Letting instances author their own tools

MCP 2026·05·10

Default arrangement

An instance, in our usage, is a running AI agent: a long-lived process, scoped to a single project or client, that takes work from an operator and ships results back. Each one carries the integrations it needs for its remit and nothing else. The lab runs five in production.

The default arrangement for an instance and its tools is one-way. The developer writes the tools, the instance uses them. The pattern that recurred across our deployments was different. As soon as an instance was running against a real workload, it surfaced gaps in its own tool surface that the developer had not anticipated. Closing those gaps inside a deployment cycle required letting the instance write its own tools.

Per-instance scaffold

Every instance carries the same baseline: a custom Slack server that exposes the operator-facing surface for that instance, a memory server scoped to the instance's own conversation log, and hooks for context compaction and tool-call logging.

On top of this baseline, instances pick up integrations as missions require them. CCL (the lab's CEO-facing AI partner) carries the integrations for the video work it runs. A client-specific instance carries the integrations for that client's stack. There is no central tool registry. Each instance ships with what its work requires, and nothing else.
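The scaffold can be read as a per-instance manifest: a fixed baseline plus whatever the instance's missions add, with no shared registry to draw from. The sketch below assumes that shape. The server and integration names (`slack`, `memory`, `video_pipeline`, `client_crm`) and the `InstanceManifest` type are hypothetical labels for illustration, not the lab's actual configuration.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical names for the baseline every instance carries.
BASELINE_SERVERS: FrozenSet[str] = frozenset({"slack", "memory"})
BASELINE_HOOKS: FrozenSet[str] = frozenset({"context_compaction", "tool_call_logging"})


@dataclass(frozen=True)
class InstanceManifest:
    """Hypothetical per-instance manifest: baseline plus mission integrations."""

    name: str
    mission_integrations: FrozenSet[str] = frozenset()

    def servers(self) -> FrozenSet[str]:
        # Baseline plus whatever this instance's missions need. Nothing is
        # pulled from a shared registry; the manifest is the whole surface.
        return BASELINE_SERVERS | self.mission_integrations

    def hooks(self) -> FrozenSet[str]:
        # Every instance gets the same hooks, regardless of mission.
        return BASELINE_HOOKS
```

With this shape, a CEO-facing instance and a client instance share only the baseline; an integration added for one never appears on the other.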

Two failure modes

The first instance allowed to author its own tools leaked its system prompt in a Slack message. The leak passed through three layers (the Worker, the Manager, and the per-instance Slack adapter) because none of them was designed to inspect outgoing content for that class of leak.

The second failure mode was budgetary. An instance that authored its own LiteLLM-backed tool inside a long-running mission burned a day's API budget in two hours.

Authoring tools is fine. Authoring tools without an output guard and a cost ceiling is not. The two together turn a class of bug into a logging problem.
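One way to picture "a class of bug into a logging problem" is a wrapper that every instance-authored tool must pass through: a cost check before the call and an output screen after it, both failing into a logged exception instead of a leaked message or a drained budget. This is a minimal sketch under that assumption. The `enclosed` decorator, `GuardrailTripped`, and the `screen`/`spent_usd` callables are hypothetical names, not the lab's code; the lab's actual guards live in a hook and at the gateway, as described under Guardrails.

```python
import logging
from functools import wraps
from typing import Callable

log = logging.getLogger("enclosure")


class GuardrailTripped(Exception):
    """Raised in place of a leaked message or an overspend."""


def enclosed(screen: Callable[[str], bool],
             spent_usd: Callable[[], float],
             ceiling_usd: float):
    """Hypothetical wrapper: cost ceiling before the call, output guard after."""

    def decorate(tool: Callable[..., str]) -> Callable[..., str]:
        @wraps(tool)
        def wrapper(*args, **kwargs) -> str:
            # Cost ceiling: refuse to run once the budget is already spent.
            if spent_usd() >= ceiling_usd:
                log.warning("cost ceiling reached; %s not run", tool.__name__)
                raise GuardrailTripped(f"{tool.__name__}: cost ceiling reached")
            result = tool(*args, **kwargs)
            # Output guard: screen the result before it travels onward.
            if screen(result):
                log.warning("output guard blocked result of %s", tool.__name__)
                raise GuardrailTripped(f"{tool.__name__}: output blocked")
            return result

        return wrapper

    return decorate
```

The instance remains free to author `tool`; what it cannot do is bypass the two checks, so a failure shows up as a log line and an exception rather than as a Slack message or an invoice.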

Guardrails

Output guard. A PreToolUse hook running a small Haiku check on every outgoing Slack message, screening for architecture leaks before the channel sees them. The check adds only a fraction of normal message latency. Shipped 2026·03·04. It has caught regressions that would otherwise have landed silently.
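The hook's decision logic is small. In Claude Code, a PreToolUse hook receives the pending call as JSON on stdin (including `tool_name` and `tool_input`), and exiting with code 2 blocks the call and returns stderr to the model. The sketch below assumes that contract. The Slack tool name `mcp__slack__post_message` and its `text` field are hypothetical, and `pattern_screen` is a crude stand-in for the Haiku check, included only to show where the model's verdict plugs in.

```python
import json
import re
import sys
from typing import Callable, Tuple

# Hypothetical tool name; MCP tools in Claude Code are named mcp__<server>__<tool>.
SLACK_POST_TOOL = "mcp__slack__post_message"

# Stand-in for the Haiku check: a pattern screen, NOT the real classifier.
_LEAK_PATTERNS = [r"system prompt", r"\bLiteLLM\b", r"\bPreToolUse\b"]


def pattern_screen(text: str) -> bool:
    """Return True if the text looks like an architecture leak."""
    return any(re.search(p, text, re.IGNORECASE) for p in _LEAK_PATTERNS)


def decide(event: dict,
           is_leak: Callable[[str], bool] = pattern_screen) -> Tuple[int, str]:
    """Map a PreToolUse event to (exit_code, reason). Exit code 2 blocks the call."""
    if event.get("tool_name") != SLACK_POST_TOOL:
        return 0, ""  # Only outgoing Slack messages are screened.
    text = str((event.get("tool_input") or {}).get("text", ""))
    if is_leak(text):
        return 2, "Blocked by output guard: message appears to reveal internal architecture."
    return 0, ""


def main() -> int:
    """Read the hook event from stdin and report the verdict."""
    code, reason = decide(json.load(sys.stdin))
    if reason:
        print(reason, file=sys.stderr)  # Shown to the model when the call is blocked.
    return code
```

The hook command's entry point would call `sys.exit(main())`. Passing the classifier in as `is_leak` keeps the blocking logic testable without a model call; in production that argument would be the small-model check.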

Per-tag cost ceilings. A budget of $10 per 24 hours for each tag, covering all instance tools and enforced at the LiteLLM gateway. The Manager flags a tag as it nears its cap, before the cap is hit.
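The accounting behind a per-tag ceiling with an early warning can be sketched as follows. This illustrates the logic only; it is not LiteLLM's implementation. It assumes a rolling 24-hour window and a hypothetical warning threshold at 80% of the cap, the point at which something like the Manager could surface the cap before it is hit.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Tuple

DAY_S = 24 * 3600


@dataclass
class TagBudget:
    limit_usd: float = 10.0
    window_s: float = DAY_S
    warn_fraction: float = 0.8  # Hypothetical early-warning threshold.
    _events: Deque[Tuple[float, float]] = field(default_factory=deque)

    def spent(self, now: float) -> float:
        # Drop spend that has aged out of the rolling window.
        while self._events and now - self._events[0][0] >= self.window_s:
            self._events.popleft()
        return sum(cost for _, cost in self._events)

    def record(self, cost_usd: float, now: float) -> None:
        self._events.append((now, cost_usd))

    def check(self, estimated_usd: float, now: float) -> str:
        """Return 'block', 'warn', or 'ok' for a call of the estimated cost."""
        projected = self.spent(now) + estimated_usd
        if projected > self.limit_usd:
            return "block"
        if projected >= self.warn_fraction * self.limit_usd:
            return "warn"
        return "ok"


class CostGate:
    """One independent budget per tag, created on first use."""

    def __init__(self, limit_usd: float = 10.0) -> None:
        self.limit_usd = limit_usd
        self._budgets: Dict[str, TagBudget] = {}

    def _budget(self, tag: str) -> TagBudget:
        if tag not in self._budgets:
            self._budgets[tag] = TagBudget(limit_usd=self.limit_usd)
        return self._budgets[tag]

    def check(self, tag: str, estimated_usd: float, now: float) -> str:
        return self._budget(tag).check(estimated_usd, now)

    def record(self, tag: str, cost_usd: float, now: float) -> None:
        self._budget(tag).record(cost_usd, now)
```

Checking before the call and recording after it means a runaway tool is stopped at the ceiling rather than discovered on the invoice; the "warn" state is the hook for surfacing the cap early.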

Where we are moving

Replacing Playwright MCP with CLI plus skills. The browser-automation MCP is the most fragile component in the stack: version drift, headless flakiness, and a permission surface we cannot harden. The replacement wraps the same primitives behind a skill the instance can call directly.
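A skill that wraps the Playwright CLI can address each of the three problems directly: pin the version, bound every run with a timeout, and restrict what the skill may open. The sketch below assumes the skill shells out to `npx playwright screenshot <url> <file>`, one of the Playwright CLI's commands, and that browsers are already installed. The pinned version, the host allowlist, and the function names are hypothetical choices for illustration, not the lab's replacement.

```python
import shutil
import subprocess
from typing import List
from urllib.parse import urlparse

# Hypothetical values: a pinned version counters drift, an allowlist narrows
# the permission surface, and a timeout bounds headless flakiness.
PLAYWRIGHT_VERSION = "1.44.0"
ALLOWED_HOSTS = {"example.com", "staging.example.com"}
TIMEOUT_S = 60


def screenshot_argv(url: str, out_path: str) -> List[str]:
    """Build the CLI invocation, refusing hosts outside the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"URL not allowed: {url}")
    return ["npx", "--yes", f"playwright@{PLAYWRIGHT_VERSION}",
            "screenshot", url, out_path]


def take_screenshot(url: str, out_path: str) -> str:
    """Run the pinned CLI once; raises on failure or timeout instead of hanging."""
    argv = screenshot_argv(url, out_path)
    if shutil.which("npx") is None:
        raise RuntimeError("npx not found on PATH")
    subprocess.run(argv, check=True, timeout=TIMEOUT_S, capture_output=True)
    return out_path
```

The permission check lives in code the lab controls, which is the point of the move: the surface the instance can reach is whatever `screenshot_argv` accepts, and nothing more.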

Restructuring service-specific MCPs from capability to outcome. One client integration used to expose 28 tools, one per capability. The instance reliably picked the wrong combination. The same surface now exposes 6 outcomes, each composed of the capabilities it actually needs. The error rate fell on the first day.
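The shift from capability to outcome moves composition out of the instance's hands and into code. The sketch below shows the shape with hypothetical capabilities (`lookup_customer`, `open_ticket`, `add_note`, `notify_owner`) and two hypothetical outcomes; it does not reproduce the client integration, whose 28 tools and 6 outcomes are not listed here.

```python
from typing import Callable, Dict, List

Context = dict
Capability = Callable[[Context], Context]


# Hypothetical capability-level steps: each was once its own exposed tool.
def lookup_customer(ctx: Context) -> Context:
    ctx["customer_id"] = f"cust:{ctx['email']}"
    return ctx


def open_ticket(ctx: Context) -> Context:
    ctx["ticket_id"] = f"t:{ctx['customer_id']}"
    return ctx


def add_note(ctx: Context) -> Context:
    ctx.setdefault("notes", []).append(ctx["summary"])
    return ctx


def notify_owner(ctx: Context) -> Context:
    ctx["notified"] = True
    return ctx


# Each outcome fixes the combination of capabilities it needs.
OUTCOMES: Dict[str, List[Capability]] = {
    "report_issue": [lookup_customer, open_ticket, add_note, notify_owner],
    "log_followup": [lookup_customer, add_note],
}


def exposed_tools() -> List[str]:
    """Only outcomes are exposed to the instance, never raw capabilities."""
    return sorted(OUTCOMES)


def run_outcome(name: str, **inputs) -> Context:
    """Run the fixed capability sequence for one outcome."""
    ctx: Context = dict(inputs)
    for step in OUTCOMES[name]:
        ctx = step(ctx)
    return ctx
```

The instance now makes one choice (which outcome) instead of assembling a sequence, so the "wrong combination" error cannot arise at the tool-selection level.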

The line is not between authoring and not authoring. It is between authoring with the right enclosures and authoring without them.

Contact

If something on this page is relevant to work you are running, write to us. The form is on the landing page. We reply within two working days.
