Tool calls have a unique cost profile that standard LLM routing doesn't account for. Tally understands the shape of MCP workloads and routes them accordingly.
New to MCPs? Read the MCP Primer first — it covers what MCP servers are, how they work as bridges between sealed models and your systems, and how tool calls flow. This page covers only Tally's routing behaviour for MCP workloads.
| Token source | Impact | Notes |
|---|---|---|
| Tool schemas | +200–1,000 tokens/call | Even if no tool is used. Every tool definition in scope costs tokens. |
| Tool invocation output | +100–400 tokens | The model's structured tool-call response before the actual tool runs. |
| Tool result (input to next turn) | +500–10,000+ tokens | Large for retrieval tools (documents, query results, file contents). |
| History compounding (multi-round) | Multiplies with each round | Each additional tool call resends the full prior conversation history. |
| Final synthesis | Normal LLM output cost | The final answer — often the smallest cost in a multi-tool interaction. |
The key insight: in a complex MCP workflow, the final synthesis step — where the model actually composes the answer — is often the cheapest part. The expensive parts are the retrieval, the tool schemas, and the compounding history. Routing the synthesis to a cheap model while preserving the retrieval quality can cut costs substantially without affecting output quality.
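To make the compounding concrete, here is a rough back-of-the-envelope sketch. All token counts and per-token prices below are hypothetical illustrations (not Tally's or any provider's actual rates); it compares a three-round tool interaction that uses one strong model throughout against the same interaction with only the final synthesis routed to a cheaper model.

```python
# Rough cost sketch for a 3-round MCP interaction.
# All numbers are hypothetical illustrations, not real provider rates.

STRONG_PRICE = 3.00 / 1_000_000  # $/input token (hypothetical)
CHEAP_PRICE = 0.25 / 1_000_000   # $/input token (hypothetical)

schemas = 800   # tool schemas resent with every call
result = 4_000  # tokens per tool result (e.g. a retrieval tool)
rounds = 3

# Each round resends the full prior history, so input grows round by round.
history = 0
input_tokens = []
for _ in range(rounds):
    input_tokens.append(schemas + history)
    history += result  # the next round carries this round's tool result

synthesis_input = schemas + history  # the final call sees everything

all_strong = sum(input_tokens + [synthesis_input]) * STRONG_PRICE
split = sum(input_tokens) * STRONG_PRICE + synthesis_input * CHEAP_PRICE

print(f"all-strong: ${all_strong:.4f}   synthesis-split: ${split:.4f}")
```

In this sketch the synthesis call is the largest single input (it carries all accumulated history), which is exactly why moving only that call to a cheap model cuts a disproportionate share of the bill.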
An MCP interaction involves two cognitively distinct phases, and they don't require the same model capability:
The model reads the request and tool schemas, decides which tools to call and with what parameters. This requires understanding tool schemas and translating intent into structured function calls.
Models with strong tool-use capability handle this phase better. Not all models support structured tool calls equally well — routing to an incapable model here causes hard failures.
Once the tool results are back, the model synthesises them into a final response. This is often much simpler than the invocation — if the data is good, the synthesis step is essentially formatting and summarising.
A cheaper model often handles synthesis well, especially when the tool results are clear and well-structured. Routing synthesis to Haiku when invocation used Sonnet can recover a significant share of the cost.
Tally understands this distinction and can route the two phases independently if your integration separates them. The semantic envelope carries `tool_phase: "invocation"` or `tool_phase: "synthesis"` to tell Tally which context it's operating in.
The standard semantic envelope works for MCP calls, but Tally recognises additional fields that capture the unique shape of tool-use workloads:
- **Tool complexity**: `simple` (read/fetch) → cheap model often fine; `complex` (multi-step, conditional logic) → stronger model preferred.
- **Result size**: `low` (short values), `medium` (a few paragraphs), `high` (full documents or large query results). Drives context length estimates.
- **`tool_phase`**: `"invocation"` — model decides which tools to call; `"synthesis"` — model synthesises tool results into a final response. When set, enables phase-specific model selection.
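Put together, an envelope for a synthesis-phase call might look like the following sketch. Only `tool_phase` is a confirmed field name; `tool_complexity` and `result_size` are illustrative placeholders for the complexity and result-size fields described above, not confirmed keys.

```json
{
  "tool_phase": "synthesis",
  "tool_complexity": "simple",
  "result_size": "high"
}
```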
Your MCP calls go directly from your application to your MCP server. Tally is never in that path. It does not sit between you and your tools. It does not see the parameters you send to tools. It does not see the results that come back.
What Tally sees: the shape of the call. How many tools are in scope. What phase the call is in. How long the context is. What happened when it was done — success or failure, token count, optional quality signal. That is it.
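As a sketch, the per-call metadata described above might look like this. Field names are illustrative, not Tally's actual schema; the values map directly to the list above (tools in scope, phase, context length, outcome, token count, optional quality signal).

```json
{
  "tools_in_scope": 6,
  "tool_phase": "invocation",
  "context_tokens": 9200,
  "outcome": {
    "status": "success",
    "total_tokens": 11400,
    "quality_signal": null
  }
}
```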
This is not a privacy stance born of legal caution. It is an architectural commitment: Tally has no business being in your critical path. We give you a recommendation before the call and learn from the outcome after it. Everything in between — your prompts, your tool parameters, your tool results, your data — belongs entirely to you and your providers.
You built something useful. Tally lets you charge for it — credit system, consumer portal, Stripe, payouts — without you writing a line of billing code. We take 10%. You keep 90% and stay focused on your tools.
Telemetry and consumer attribution are free, forever. The unique paid feature for MCP providers is monetisation — letting you charge your own consumers for access to your tools, with Tally handling the full billing lifecycle.
Full visibility into who is using your MCP server and how. No caps, no expiry.
Charge your own consumers for access to your MCP server. Consumers purchase credits through Tally. You set the rate. Tally takes 10% of gross credit purchases — no monthly fee, no setup cost. We only earn when you earn.
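The split is simple arithmetic: Tally takes 10% of gross credit purchases and you keep the rest. A minimal sketch (the function name is illustrative, not a Tally API):

```python
def provider_payout(gross_purchases: float, tally_fee: float = 0.10) -> float:
    """Provider's share of gross credit purchases after Tally's 10% fee."""
    return round(gross_purchases * (1 - tally_fee), 2)

print(provider_payout(1000.00))  # a consumer buys $1,000 of credits -> prints 900.0
```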
Also available: Routing Recommendations. MCP providers who also want model routing for their own LLM calls can use Tally's standard routing at $0.01 per recommendation. Separate from monetisation — you can use either or both. See Tally + LLMs for the full routing picture.
Same API. Full MCP awareness. Start free, scale as you need.