The Basics

MCPs: Servers and Bridges

MCP is not a library or a design pattern. It's a protocol for connecting sealed language models to the real systems your application depends on.

Start here

An MCP is a server,
not a library.

The term "Model Context Protocol" sounds abstract. The implementation is concrete. An MCP server is a running process — a server your application connects to over a defined protocol, the same way it might connect to a database or a REST API.

This matters because it shapes how you think about deployment, reliability, and access control. MCP servers can run locally alongside your application, in a container in your infrastructure, or as remote services reachable over the network. They have a lifecycle. They can be started, stopped, versioned, and monitored. They can crash. They are software you operate, not configuration you set.

Some MCP servers are tiny — a 50-line script exposing a single function. Others are substantial services with auth, rate limiting, and persistence. The protocol is the same in both cases. The complexity lives in what the server does, not in how the LLM talks to it.
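To make "a 50-line script exposing a single function" concrete, here is a toy sketch of the core of such a server in plain Python. The method names echo MCP's tools/list and tools/call, but the wire format is heavily simplified — a real server speaks JSON-RPC 2.0 over stdio or HTTP via an MCP SDK. The get_time tool is a hypothetical example.

```python
import json
from datetime import datetime, timezone

# Toy MCP-style server core: one tool, two protocol-ish methods.
# This illustrates the shape of the exchange, not the real wire format.

TOOLS = [{
    "name": "get_time",
    "description": "Return the current UTC time as an ISO 8601 string.",
    "inputSchema": {"type": "object", "properties": {}},
}]

def handle(request: dict) -> dict:
    """Dispatch one decoded request to the matching handler."""
    if request["method"] == "tools/list":
        return {"tools": TOOLS}
    if request["method"] == "tools/call" and request["params"]["name"] == "get_time":
        return {"content": datetime.now(timezone.utc).isoformat()}
    return {"error": f"unsupported request: {json.dumps(request)}"}
```

A real server wraps `handle` in a loop that reads requests from stdin and writes responses to stdout — that loop, plus the handler, is the entire "tiny" case.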

The protocol Anthropic made open

MCP was designed by Anthropic and released as an open standard. Any LLM provider can implement it — and increasingly they are. The same MCP server works with Claude, GPT-4o, Gemini, or any model that supports the protocol. You build the tool once, and it's available to any compliant model your application uses.

The core concept

MCPs are bridges between
sealed models and real systems.

A language model, by itself, lives in a sealed box. You send it text. It returns text. It has no access to your database, your filesystem, the current time, live web content, or anything else that exists outside the payload you send it. This is not a flaw. It is the design. The model is stateless and isolated by definition.

MCP servers are what you build to bridge that gap. They sit between the LLM and your real systems, exposing specific capabilities in a format the model can reason about and invoke. The model doesn't get direct database access — it gets a tool called query_customers that your MCP server executes on its behalf, returning only what you choose to surface.
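The mediation idea can be sketched in a few lines. The tool name query_customers, the in-memory table, and the field list below are all hypothetical — the point is that the model supplies arguments, the server executes the lookup, and only the fields you choose ever come back.

```python
# Hypothetical query_customers tool handler. The model never touches
# the table directly; the server validates input, runs the lookup, and
# projects only the fields it has chosen to surface.

CUSTOMERS = [  # stand-in for a real customer table
    {"id": 1, "name": "Ada", "email": "ada@example.com", "card_number": "4111..."},
    {"id": 2, "name": "Linus", "email": "linus@example.com", "card_number": "5500..."},
]

EXPOSED_FIELDS = {"id", "name", "email"}  # card_number is never surfaced

def query_customers(name_contains: str) -> list:
    """Validate the model-supplied argument, query, and project safe fields."""
    if not isinstance(name_contains, str):
        raise ValueError("name_contains must be a string")
    rows = [c for c in CUSTOMERS if name_contains.lower() in c["name"].lower()]
    return [{k: v for k, v in row.items() if k in EXPOSED_FIELDS} for row in rows]
```

The projection step is where "returning only what you choose to surface" lives: sensitive columns simply never enter the model's context.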

The sealed world
Language Model: text in, text out, no external state.

Bridge
MCP Server: a running process.

The real world
Your Systems: databases, APIs, files, services.

This design gives you control. The MCP server defines exactly what the model can and cannot do. It enforces access control, validates parameters, handles errors, and returns results in a form the model can use. The model has no knowledge of your underlying infrastructure — only the surface the MCP server chooses to expose.

The boundary is a feature. Your MCP server decides what the model can query, what it can write, and what it can never touch. You can give a model read access to your customer table without ever exposing your payment data. The model can only reach what the server explicitly permits.

What MCP servers expose

Three kinds of capability:
tools, resources, and prompts.

Every MCP server exposes its capabilities in one or more of three forms. Understanding the difference matters because each has a different token profile and a different interaction pattern.

🔧

Tools

Functions the model can call. The model sees a schema describing what the tool does and what parameters it accepts. It decides when to invoke it and with what arguments. Results come back as structured data the model can reason over.

📄

Resources

Data the model can read directly into its context window. Documents, database rows, file contents, configuration. Resources inflate input token count — they're pulled into the payload before the model processes anything.

📋

Prompts

Pre-built prompt templates the model can invoke. Server-defined patterns for common tasks — an MCP server might expose a summarise_ticket prompt that wraps a best-practice template your team has tuned.

In practice, most MCP integrations are predominantly tool-based. Resources and prompts are powerful but less commonly used. When people talk about "MCP calls," they almost always mean tool invocations.
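What the model actually sees for a tool is small: a name, a description, and a JSON Schema for its parameters. The dict below mirrors the shape of an MCP tool definition for a hypothetical web_search tool — and every character of it costs input tokens on each request that includes it.

```python
# Shape of an MCP tool definition: name, description, and a JSON Schema
# ("inputSchema") describing the parameters. The tool itself is hypothetical.

search_tool = {
    "name": "web_search",
    "description": "Search the web and return titles, URLs, and snippets.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms."},
            "max_results": {"type": "integer", "minimum": 1, "maximum": 20},
        },
        "required": ["query"],
    },
}
```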

What bridges look like

MCP servers in the wild.

The variety of what MCP servers bridge to is part of what makes the standard powerful. Any system that can be described as a set of typed functions is a candidate.

Database bridge

Exposes query, insert, update. The model writes SQL or calls named functions — your server validates, executes, and returns structured results. The model never sees your connection string.

Filesystem bridge

Exposes read_file, list_dir, write_file scoped to a directory. Common in coding agents. The model reads and writes files within the bounds the server permits.
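The "bounds the server permits" part is a real check the bridge must make on every call. A minimal sketch, assuming a hypothetical permitted root directory: resolve the model-supplied path and refuse anything that escapes it, including `../` traversal.

```python
from pathlib import Path

# Scoping check a filesystem bridge runs before every read or write.
# ROOT is a hypothetical permitted directory.

ROOT = Path("/srv/agent-workspace")

def resolve_scoped(relative_path: str) -> Path:
    """Return an absolute path inside ROOT, or raise if it escapes."""
    candidate = (ROOT / relative_path).resolve()
    if not candidate.is_relative_to(ROOT.resolve()):
        raise PermissionError(f"{relative_path!r} escapes the permitted root")
    return candidate
```

Every tool the bridge exposes — read_file, list_dir, write_file — would route its path argument through a check like this before touching the disk.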

Web search bridge

Exposes a search tool backed by a search API. Returns structured results — titles, URLs, snippets — the model can reason over and cite. Keeps live web knowledge in scope.

Communication bridge

Exposes send_email, post_message, create_ticket. Lets the model take actions in Slack, Gmail, Jira, or any communication platform your server fronts.

Internal API bridge

Wraps your own internal REST or GraphQL APIs. The model calls tools like get_order or cancel_subscription without knowing they're API calls underneath.

Code execution bridge

Exposes run_python, execute_bash, run_tests. The model writes code, submits it, gets back output. Used in agentic coding and data analysis workflows.
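A run_python-style tool can be sketched as a subprocess call with a timeout: the model submits code, the bridge runs it in a fresh interpreter and returns stdout, stderr, and the exit status. This is an illustration only — a bare subprocess is not a security boundary, and real bridges add sandboxing (containers, resource limits, network isolation).

```python
import subprocess
import sys

# Sketch of a run_python-style tool: execute model-written code in a
# separate interpreter process and capture the result. Not sandboxed.

def run_python(code: str, timeout_s: float = 5.0) -> dict:
    """Run a snippet in a fresh interpreter; return output and status."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": "timed out", "exit_code": -1}
    return {"stdout": proc.stdout, "stderr": proc.stderr, "exit_code": proc.returncode}
```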

One server, any compliant model. Because MCP is an open standard, the database bridge you build today works whether your application routes to Claude, GPT-4o, or any other MCP-compatible model. The bridge doesn't care about the model. The model doesn't care about the bridge's implementation. The protocol handles the handshake.

How the pieces move

A tool call, step by step.

When a model invokes an MCP tool, the interaction involves more steps — and more tokens — than a plain LLM completion. Understanding this flow explains the cost structure and why routing MCP calls differently from plain completions is worth doing.

1

Your app sends the initial request

System prompt + user message + tool schemas for every MCP tool in scope. Each schema describes a tool's name, purpose, and parameter types. These schemas are tokens — they cost money on every call, even if no tool ends up being used.

2

The model decides what to do

The model reads the request and the available tool schemas. It either responds directly (no tool needed) or returns a structured tool-call response — a JSON object describing which tool to invoke and with what arguments. This output is itself tokens you pay for.

3

Your app routes the call to the MCP server

Your code receives the tool call, validates it, and sends it to the appropriate MCP server. The server executes the underlying operation — the database query, the API call, the file read — and returns the result to your app.

4

The MCP server does the real work

The bridge executes against the real system. This is the part that has latency, can fail for real reasons (the database is slow, the API is down), and returns data of variable size. Large results — full documents, big query results — become large tokens on the next turn.

5

Your app sends the result back to the model

The tool result is appended to the conversation history and the full payload is sent back to the LLM. On multi-tool interactions, steps 2–4 repeat — each round compounding the conversation history and the token cost.

6

The model generates the final answer

With the tool results in context, the model synthesises a final response. In complex agentic workflows, this synthesis step may itself be simpler than the tool invocation — the hard cognitive work was deciding which tools to call and parsing their results.
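The six steps above compress into a loop your application runs. In this sketch, fake_model stands in for the LLM API and get_order is a hypothetical MCP tool handler; the message and tool-call shapes are simplified and provider-neutral, not any vendor's exact format.

```python
# The tool-call loop, steps 1-6, with a stand-in model and one fake tool.

def fake_model(messages: list, tools: dict) -> dict:
    """Stand-in LLM: requests the tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_order", "arguments": {"order_id": 7}}}
    return {"text": "Order 7 has shipped."}

def get_order(order_id: int) -> dict:  # hypothetical MCP tool handler
    return {"order_id": order_id, "status": "shipped"}

TOOLS = {"get_order": get_order}

def run(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]    # step 1
    while True:
        reply = fake_model(messages, TOOLS)                   # step 2 (and 6)
        if "tool_call" not in reply:
            return reply["text"]                              # final answer
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])     # steps 3-4
        messages.append({"role": "tool", "content": result})  # step 5
```

Each pass through the loop appends to `messages`, which is why multi-tool interactions compound the conversation history — and the token bill — on every round.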

Steps 3 and 4 are invisible to any LLM provider — including Tally. The actual tool execution happens entirely between your application and the MCP server. Tally sees the shape of the call: how many tools are in scope, what phase it's in, how much context has accumulated. It never sees your tool parameters or results.

The bridge you build.
The model we route to it.

You decide what your MCP servers expose. Tally decides which model is best suited to work with them — given the tool count, the complexity, the expected result size, and what you've taught Tally about your workloads over time.

Ready to route your MCP workloads?

One API key. Instant access to intelligent model routing across all major providers.

Next up

Streaming