MCP is not a library or a pattern. It's a protocol for connecting sealed language models to the real systems your application depends on.
The term "Model Context Protocol" sounds abstract. The implementation is concrete. An MCP server is a running process — a server your application connects to over a defined protocol, the same way it might connect to a database or a REST API.
This matters because it shapes how you think about deployment, reliability, and access control. MCP servers can run locally alongside your application, in a container in your infrastructure, or as remote services reachable over the network. They have a lifecycle. They can be started, stopped, versioned, and monitored. They can crash. They are software you operate, not configuration you set.
Some MCP servers are tiny — a 50-line script exposing a single function. Others are substantial services with auth, rate limiting, and persistence. The protocol is the same in both cases. The complexity lives in what the server does, not in how the LLM talks to it.
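The "50-line script" end of the spectrum can be sketched without any SDK. This is a toy, dependency-free dispatcher that handles dicts shaped like MCP's `tools/list` and `tools/call` JSON-RPC messages; a real server would use an official MCP SDK and speak the protocol over stdio or HTTP, and the `get_time_utc` tool is a made-up example.

```python
from datetime import datetime, timezone

# Toy registry: one tool, described the way an MCP server lists its tools.
TOOLS = {
    "get_time_utc": {
        "description": "Return the current UTC time as an ISO 8601 string.",
        "inputSchema": {"type": "object", "properties": {}},
    }
}

def handle(request: dict) -> dict:
    """Dispatch one MCP-style JSON-RPC request to a response dict."""
    method = request.get("method")
    if method == "tools/list":
        result = {"tools": [{"name": n, **meta} for n, meta in TOOLS.items()]}
    elif method == "tools/call":
        name = request["params"]["name"]
        if name != "get_time_utc":
            return {"jsonrpc": "2.0", "id": request.get("id"),
                    "error": {"code": -32602, "message": f"unknown tool {name}"}}
        text = datetime.now(timezone.utc).isoformat()
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": f"unknown method {method}"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}
```

The protocol surface stays this small whether the tool body is one line or a full service; only the work inside the handler grows.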
MCP was designed by Anthropic and released as an open standard. Any LLM provider can implement it — and increasingly they are. The same MCP server works with Claude, GPT-4o, Gemini, or any model that supports the protocol. You build the tool once, and it's available to any compliant model your application uses.
A language model, by itself, lives in a sealed box. You send it text. It returns text. It has no access to your database, your filesystem, the current time, live web content, or anything else that exists outside the payload you send it. This is not a flaw. It is the design. The model is stateless and isolated by definition.
MCP servers are what you build to bridge that gap. They sit between the LLM and your real systems, exposing specific capabilities in a format the model can reason about and invoke. The model doesn't get direct database access — it gets a tool called query_customers that your MCP server executes on its behalf, returning only what you choose to surface.
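A hypothetical query_customers handler might look like this: the model supplies only a name prefix and a limit, and the server chooses which columns ever leave the database. Everything here (the table, the field names, the limits) is illustrative, not part of MCP itself.

```python
import sqlite3

# Only these fields are ever surfaced to the model; email and payment
# data stay behind the boundary even though they live in the same table.
ALLOWED_FIELDS = ("id", "name", "plan")

def query_customers(db: sqlite3.Connection, name_prefix: str, limit: int = 10) -> list[dict]:
    """Tool handler: validate parameters, run a fixed query, filter the output."""
    if limit > 100:  # validate before touching the database
        raise ValueError("limit too large")
    rows = db.execute(
        "SELECT id, name, plan FROM customers WHERE name LIKE ? LIMIT ?",
        (name_prefix + "%", limit),
    ).fetchall()
    return [dict(zip(ALLOWED_FIELDS, row)) for row in rows]
```

The model never composes this SQL and never sees the connection; it sees a tool name, two parameters, and the filtered result.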
This design gives you control. The MCP server defines exactly what the model can and cannot do. It enforces access control, validates parameters, handles errors, and returns results in a form the model can use. The model has no knowledge of your underlying infrastructure — only the surface the MCP server chooses to expose.
The boundary is a feature. Your MCP server decides what the model can query, what it can write, and what it can never touch. You can give a model read access to your customer table without ever exposing your payment data. The model can only reach what the server explicitly permits.
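One way to make that boundary explicit is a permission table the server consults before any tool runs. This is a sketch of the idea, not an MCP feature; the tool names and permission labels are invented for illustration.

```python
# Explicit grants per tool. Anything absent from this table is unreachable:
# there is simply no tool through which the model can touch payment data.
PERMISSIONS = {
    "query_customers": {"read"},
    "update_customer_note": {"read", "write"},
}

def authorize(tool_name: str, needs: str) -> None:
    """Raise unless the named tool has been granted the needed capability."""
    granted = PERMISSIONS.get(tool_name, set())
    if needs not in granted:
        raise PermissionError(f"{tool_name} is not permitted to {needs}")
```

The check runs server-side on every call, so a model that hallucinates a write against a read-only tool gets an error back, not a mutation.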
Every MCP server exposes its capabilities in one or more of three forms. Understanding the difference matters because each has a different token profile and a different interaction pattern.
Tools: functions the model can call. The model sees a schema describing what the tool does and what parameters it accepts. It decides when to invoke it and with what arguments. Results come back as structured data the model can reason over.
Resources: data the model can read directly into its context window. Documents, database rows, file contents, configuration. Resources inflate input token count — they're pulled into the payload before the model processes anything.
Prompts: pre-built prompt templates the model can invoke. Server-defined patterns for common tasks — an MCP server might expose a summarise_ticket prompt that wraps a best-practice template your team has tuned.
In practice, most MCP integrations are predominantly tool-based. Resources and prompts are powerful but less commonly used. When people talk about "MCP calls," they almost always mean tool invocations.
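Concretely, what the model sees for a tool is roughly this: a name, a description, and a JSON Schema for the parameters. The field names follow the shape MCP uses when listing tools; the query_customers tool itself is hypothetical.

```python
# One tool, as exposed to the model. Every token of this schema is sent
# with every request while the tool is in scope.
query_customers_schema = {
    "name": "query_customers",
    "description": "Look up customers by name prefix. Read-only.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "name_prefix": {"type": "string"},
            "limit": {"type": "integer", "maximum": 100},
        },
        "required": ["name_prefix"],
    },
}
```

The description doubles as documentation for the model: it is the main signal the model uses to decide when this tool, rather than another, fits the request.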
The variety of what MCP servers bridge to is part of what makes the standard powerful. Any system that can be described as a set of typed functions is a candidate.
A database bridge exposes query, insert, update. The model writes SQL or calls named functions — your server validates, executes, and returns structured results. The model never sees your connection string.
A filesystem bridge exposes read_file, list_dir, write_file scoped to a directory. Common in coding agents. The model reads and writes files within the bounds the server permits.
A web search bridge exposes a search tool backed by a search API. Returns structured results — titles, URLs, snippets — the model can reason over and cite. Keeps live web knowledge in scope.
A communication bridge exposes send_email, post_message, create_ticket. Lets the model take actions in Slack, Gmail, Jira, or any communication platform your server fronts.
An API bridge wraps your own internal REST or GraphQL APIs. The model calls tools like get_order or cancel_subscription without knowing they're API calls underneath.
A code execution bridge exposes run_python, execute_bash, run_tests. The model writes code, submits it, gets back output. Used in agentic coding and data analysis workflows.
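Whatever the bridge fronts, the server-side shape is the same: a registry of typed functions. A toy registry sketch; the decorator and the stubbed tools are illustrative, not an SDK API.

```python
# Map tool names to plain Python functions. Any system you can express
# this way is a candidate for an MCP bridge.
REGISTRY = {}

def tool(fn):
    """Register a function under its own name as a callable tool."""
    REGISTRY[fn.__name__] = fn
    return fn

@tool
def get_order(order_id: str) -> dict:
    # Real bridge: an internal API call. Here, a stub.
    return {"order_id": order_id, "status": "shipped"}

@tool
def run_tests(target: str) -> str:
    # Real bridge: shells out to a sandboxed test runner. Here, a stub.
    return f"would run tests for {target}"

def call(name: str, **kwargs):
    """Dispatch a validated tool call to its handler."""
    return REGISTRY[name](**kwargs)
```

The registry is also the natural place to hang cross-cutting concerns like the parameter validation and permission checks described above.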
One server, any compliant model. Because MCP is an open standard, the database bridge you build today works whether your application routes to Claude, GPT-4o, or any other MCP-compatible model. The bridge doesn't care about the model. The model doesn't care about the bridge's implementation. The protocol handles the handshake.
When a model invokes an MCP tool, the interaction involves more steps — and more tokens — than a plain LLM completion. Understanding this flow explains the cost structure and why routing MCP calls differently from plain completions is worth doing.
Step 1: assemble the request. System prompt + user message + tool schemas for every MCP tool in scope. Each schema describes a tool's name, purpose, and parameter types. These schemas are tokens — they cost money on every call, even if no tool ends up being used.
Step 2: the model decides. The model reads the request and the available tool schemas. It either responds directly (no tool needed) or returns a structured tool-call response — a JSON object describing which tool to invoke and with what arguments. This output is itself tokens you pay for.
Step 3: your application routes the call. Your code receives the tool call, validates it, and sends it to the appropriate MCP server. The server executes the underlying operation — the database query, the API call, the file read — and returns the result to your app.
Step 4: the MCP server executes. The bridge executes against the real system. This is the part that has latency, can fail for real reasons (the database is slow, the API is down), and returns data of variable size. Large results — full documents, big query results — become large tokens on the next turn.
Step 5: the result goes back in. The tool result is appended to the conversation history and the full payload is sent back to the LLM. On multi-tool interactions, steps 2–4 repeat — each round compounding the conversation history and the token cost.
Step 6: the model responds. With the tool results in context, the model synthesises a final response. In complex agentic workflows, this synthesis step may itself be simpler than the tool invocation — the hard cognitive work was deciding which tools to call and parsing their results.
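The whole flow reduces to a loop in the application code. Here the provider call is replaced with a scripted `fake_model` so the loop is self-contained; a real client would send `messages` plus the tool schemas to an LLM API and parse its reply.

```python
def fake_model(messages, tools):
    """Stand-in for the LLM: asks for one tool, then synthesises."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "query_customers",
                              "arguments": {"name_prefix": "A"}}}
    return {"text": "Found 1 matching customer."}

def run(user_msg, tools, execute):
    # Assemble the request: user message plus tool schemas in scope.
    messages = [{"role": "user", "content": user_msg}]
    while True:
        reply = fake_model(messages, tools)          # model decides
        if "tool_call" not in reply:
            return reply["text"]                     # final synthesis
        call = reply["tool_call"]
        # Route the call to the MCP server and execute against the real system.
        result = execute(call["name"], call["arguments"])
        # Append the result and resend the grown payload on the next turn.
        messages.append({"role": "tool", "content": result})
```

Each pass through the loop resends everything accumulated so far, which is why long tool chains compound token cost rather than adding it linearly.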
Steps 3 and 4 are invisible to any LLM provider — including Tally. The actual tool execution happens entirely between your application and the MCP server. Tally sees the shape of the call: how many tools are in scope, what phase it's in, how much context has accumulated. It never sees your tool parameters or results.
You decide what your MCP servers expose. Tally decides which model is best suited to work with them — given the tool count, the complexity, the expected result size, and what you've taught Tally about your workloads over time.