
Your agent doesn't need 500 tools. It needs 3.

How Conduit keeps an AI agent's context flat no matter how many MCP servers you connect.

If you use more than one MCP server, you have probably watched your agent get slower and dumber the more you add. It is not your imagination, and it is not the model. It is the tool list.

The hidden tax on every request

The Model Context Protocol works like this: a client connects to a server, calls tools/list, and gets back the full schema for every tool that server offers: its name, its description, and a JSON Schema for its arguments. The client hands all of that to the model so it knows what it can call.
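For reference, a tools/list exchange has roughly the shape below. The JSON-RPC envelope and the `name` / `description` / `inputSchema` fields follow the MCP specification; the `create_pull_request` tool and its arguments are made up for illustration and not copied from any real server.

```python
import json

# JSON-RPC request a client sends to list a server's tools.
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Shape of the reply. The single tool shown is a hypothetical example.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "create_pull_request",
                "description": "Open a pull request from one branch into another.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "head": {"type": "string"},
                        "base": {"type": "string"},
                    },
                    "required": ["title", "head", "base"],
                },
            }
        ]
    },
}

# Everything under "tools" is what the client passes on to the model.
schema_text = json.dumps(response["result"]["tools"])
print(len(schema_text), "characters for a single small tool")
```

Multiply that one entry by every tool on every connected server and you have the payload the rest of this post is about.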

That is fine for one server with five tools. It stops being fine fast.

Connect a GitHub server (a couple dozen tools), a Postgres server, a filesystem server, a browser server, a couple of SaaS connectors, and you are easily looking at two to four hundred tool schemas. Every one of them is injected into the context window on every single request, before the model has read a word of your actual question.

That tax shows up three ways:

  • Worse reasoning. A model choosing between 300 tools makes more mistakes than one choosing between 10. More surface area, more ways to pick wrong.
  • Higher cost and latency. Those schemas are input tokens. You pay for them on every turn, and they push you toward context limits.
  • It gets worse as you grow. The whole point of a tool ecosystem is to add tools. Here, adding value makes the experience worse.

The usual workaround is to manually turn servers on and off depending on what you are doing. That works, in the sense that flossing with a single strand works. Nobody keeps it up.

Conduit is a local gateway that sits between your AI clients and your MCP servers. Every client points at one Conduit instead of configuring servers individually. Because every tool call flows through it, Conduit can change what the agent sees.

In lazy discovery mode, Conduit’s tools/list does not return your 300 tools. It returns three:

  • conduit_status: what servers are connected and roughly what they cover.
  • conduit_search_tools: search across every tool on every connected server.
  • conduit_call_tool: invoke any tool by server and name, with arguments.

That is the entire surface the model loads. Three schemas, flat, forever. It does not matter whether you have connected three servers or thirty.

The agent’s flow becomes:

  1. It needs to do something (“open a PR for this branch”).
  2. It calls conduit_search_tools("create pull request").
  3. Conduit returns the handful of matching tools and their schemas, the GitHub one included, right when they are relevant.
  4. The agent calls conduit_call_tool("github", "create_pull_request", {...}).
  5. Conduit routes that to the real GitHub server and streams the result back.
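The flow above can be sketched in a few lines. This is a toy in-memory model and not Conduit's code: the `LazyGateway` class, its method names, the keyword-matching search, and the lambda handlers standing in for real MCP servers are all assumptions made for illustration.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], Any]


class LazyGateway:
    """Toy gateway: holds the full catalog, advertises only three tools."""

    def __init__(self) -> None:
        # server name -> tool name -> (description, handler)
        self.catalog: dict[str, dict[str, tuple[str, Handler]]] = {}

    def register(self, server: str, name: str, description: str,
                 handler: Handler) -> None:
        self.catalog.setdefault(server, {})[name] = (description, handler)

    def list_tools(self) -> list[str]:
        # The advertised surface stays the same however large the catalog is.
        return ["conduit_status", "conduit_search_tools", "conduit_call_tool"]

    def search_tools(self, query: str) -> list[dict[str, str]]:
        # Naive match: every query word must appear in the name or description.
        words = query.lower().split()
        hits = []
        for server, tools in self.catalog.items():
            for name, (description, _) in tools.items():
                haystack = f"{name.replace('_', ' ')} {description}".lower()
                if all(word in haystack for word in words):
                    hits.append({"server": server, "name": name,
                                 "description": description})
        return hits

    def call_tool(self, server: str, name: str, arguments: dict[str, Any]) -> Any:
        # Route the call to whichever handler owns the tool.
        _, handler = self.catalog[server][name]
        return handler(arguments)


gateway = LazyGateway()
gateway.register("github", "create_pull_request",
                 "Open a pull request for a branch",
                 lambda args: f"PR opened: {args['title']}")
gateway.register("postgres", "run_query",
                 "Run a read-only SQL query",
                 lambda args: [])

hits = gateway.search_tools("create pull request")
result = gateway.call_tool(hits[0]["server"], hits[0]["name"],
                           {"title": "Fix login bug"})
print(result)
```

A real gateway would use a better search than substring matching and would speak MCP to the downstream servers, but the division of labor is the same: the catalog lives in the gateway, and the model only ever sees the three entry points plus whatever a search returns.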

The full catalog still exists. The gateway holds it, indexes it, and routes to it. The model just pulls in tool definitions on demand instead of swallowing all of them up front. Context stays flat, the model picks from a short, relevant list, and you stop paying the tax.

The tradeoff, honestly

This is not free. Lazy discovery trades one thing for another: instead of every tool being immediately visible, the agent spends one extra step searching before it acts. On a task where the agent already knows exactly which tool it wants, that is a small round trip it did not use to make.

In practice that trade is well worth it once you pass a handful of servers, because the cost you remove (hundreds of schemas on every turn) is paid constantly, while the cost you add (a search call) is paid only when the agent actually needs a new capability. And good search makes the agent more capable, not less, because it can find tools across servers it would never have thought to enable by hand.
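A back-of-the-envelope model makes the asymmetry concrete. Every number here is an assumption chosen for illustration (150 tokens per schema, a 20-turn session, 4 searches returning 5 tools each), and the model is simplified: it counts search results once and ignores that they stay in the conversation history afterwards. Measure your own setup before trusting any of it.

```python
def eager_tokens(turns: int, tool_count: int, tokens_per_schema: int) -> int:
    # Classic mode: every schema is sent on every turn.
    return turns * tool_count * tokens_per_schema


def lazy_tokens(turns: int, searches: int, results_per_search: int,
                tokens_per_schema: int) -> int:
    # Lazy mode: three meta-tool schemas on every turn,
    # plus the schemas returned by each search.
    fixed = turns * 3 * tokens_per_schema
    on_demand = searches * results_per_search * tokens_per_schema
    return fixed + on_demand


# Assumed workload, not measured data.
TURNS = 20
TOKENS_PER_SCHEMA = 150

eager = eager_tokens(TURNS, tool_count=300, tokens_per_schema=TOKENS_PER_SCHEMA)
lazy = lazy_tokens(TURNS, searches=4, results_per_search=5,
                   tokens_per_schema=TOKENS_PER_SCHEMA)

print(f"eager: {eager} schema tokens per session")
print(f"lazy:  {lazy} schema tokens per session")
```

Under these assumptions the eager session spends 900,000 tokens on schemas and the lazy one 12,000. The exact ratio will move with your numbers; what stays is that the eager cost scales with the size of the catalog and the lazy cost does not.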

For small setups where you want the classic behavior, you leave lazy mode off and Conduit advertises the full list like any other server. It is a mode, not a religion.

Why local-first matters here

Conduit runs entirely on your machine. The gateway is a local process your clients spawn; there is no cloud, no account, nothing phones home. Server credentials live in your OS keychain and are injected at call time, never written into a client’s config file. The audit log of what was called, and when, stays on your disk.
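One plausible shape for "injected at call time" is sketched below. It is not Conduit's implementation: the `credential_ref` field, the `GITHUB_TOKEN` variable name, and the `build_call_env` helper are hypothetical, and a plain dict stands in for the OS keychain so the example runs anywhere.

```python
from typing import Callable

# The config a client sees holds only a reference, never the secret itself.
# Field names here are hypothetical.
server_config = {"name": "github", "credential_ref": "conduit/github-token"}


def build_call_env(config: dict[str, str],
                   lookup_secret: Callable[[str], str]) -> dict[str, str]:
    """Resolve the credential only when a call is about to be made."""
    token = lookup_secret(config["credential_ref"])
    return {"GITHUB_TOKEN": token}


# Stand-in for the OS keychain; a real gateway would query the platform store.
fake_keychain = {"conduit/github-token": "example-token"}

env = build_call_env(server_config, fake_keychain.__getitem__)
print(sorted(env))
```

The point of the indirection is that the secret exists in memory only for the duration of the call, and nothing a client can read on disk contains it.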

That is the right shape for something that sits in the middle of every tool call your agents make. The thing routing your credentials and your prompts should be software you run, not a service you trust.

Try it

Conduit is open source (MIT) and runs on Windows, macOS, and Linux. Set up each server once, point your clients at the gateway, and let the agent search instead of drown.

github.com/tsouth89/conduit
