Tool Routing

Tally + MCPs

Tool calls have a unique cost profile that standard LLM routing doesn't account for. Tally understands the shape of MCP workloads and routes them accordingly.

New to MCPs? Read the MCP Primer first — it covers what MCP servers are, how they work as bridges between sealed models and your systems, and how tool calls flow. This page covers only Tally's routing behaviour for MCP workloads.

The cost profile

Why MCP calls are expensive,
and where the waste hides.

  • Tool schemas: +200–1,000 tokens per call. Counted even if no tool is used; every tool definition in scope costs tokens.
  • Tool invocation output: +100–400 tokens. The model's structured tool-call response before the actual tool runs.
  • Tool result (input to the next turn): +500–10,000+ tokens. Large for retrieval tools (documents, query results, file contents).
  • History compounding (multi-round): multiplies with each round. Each additional tool call resends the full prior conversation history.
  • Final synthesis: normal LLM output cost. The final answer, often the smallest cost in a multi-tool interaction.

The key insight: in a complex MCP workflow, the final synthesis step — where the model actually composes the answer — is often the cheapest part. The expensive parts are the retrieval, the tool schemas, and the compounding history. Routing the synthesis to a cheap model while preserving the retrieval quality can cut costs substantially without affecting output quality.
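The compounding effect is easy to see with back-of-envelope arithmetic. The sketch below uses illustrative midpoints of the ranges in the table above; real figures vary by provider and tool.

```python
# Back-of-envelope estimate of where tokens go in a multi-round MCP
# interaction. All per-item figures are illustrative midpoints of the
# ranges in the table above, not measured values.

def estimate_tokens(rounds: int,
                    schema_tokens: int = 600,      # tool schemas, resent every turn
                    invocation_tokens: int = 250,  # structured tool-call output
                    result_tokens: int = 3000,     # tool result fed into the next turn
                    synthesis_tokens: int = 400) -> tuple[int, float]:
    """Return (total_tokens, synthesis_share) for `rounds` tool calls."""
    history = 0
    total = 0
    for _ in range(rounds):
        # Each round resends the accumulated history plus the tool schemas,
        # then appends a tool-call output and a tool result to the history.
        total += history + schema_tokens + invocation_tokens
        history += invocation_tokens + result_tokens
    # Final turn: full history plus schemas in, synthesis out.
    total += history + schema_tokens + synthesis_tokens
    return total, synthesis_tokens / total

total, share = estimate_tokens(rounds=3)
```

Under these assumptions, three tool rounds put the final synthesis at under 2% of total tokens: the retrieval, schemas, and resent history dominate the bill.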

Two routing opportunities

Tool invocation vs synthesis:
they can go to different models.

An MCP interaction involves two cognitively distinct phases, and they don't require the same model capability:

🔧 Tool invocation phase

The model reads the request and tool schemas, decides which tools to call and with what parameters. This requires understanding tool schemas and translating intent into structured function calls.

Models with strong tool-use capability handle this phase better. Not all models support structured tool calls equally well — routing to an incapable model here causes hard failures.

e.g., "Search the database for Q4 sales data and calculate the YoY change" — requires correct tool selection and parameter generation.

📝 Synthesis phase

Once the tool results are back, the model synthesises them into a final response. This is often much simpler than the invocation — if the data is good, the synthesis step is essentially formatting and summarising.

A cheaper model often handles synthesis well, especially when the tool results are clear and well-structured. Routing synthesis to Haiku when invocation used Sonnet can recapture significant cost.

e.g., "Format these database results as a summary table" — straightforward once the data is present.

Tally understands this distinction and can route the two phases independently if your integration separates them. The semantic envelope carries tool_phase: "invocation" or tool_phase: "synthesis" to tell Tally which context it's operating in.
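If your integration separates the phases, the split might look like the sketch below. The tool_phase, tools_used, and tool_result_density fields come from the envelope described on this page; everything else about the request shape is a hypothetical placeholder.

```python
# Phase-split routing sketch. `tool_phase`, `tools_used`, and
# `tool_result_density` are envelope fields from this page; the rest of
# the request shape is a hypothetical placeholder.

invocation_envelope = {
    "task": "Search the database for Q4 sales data and calculate the YoY change",
    "tool_phase": "invocation",       # must produce correct structured tool calls
    "tools_used": 4,                  # four tool schemas in scope
}

synthesis_envelope = {
    "task": "Format these database results as a summary table",
    "tool_phase": "synthesis",        # tool results are already in hand
    "tool_result_density": "medium",  # a few paragraphs of query results
}

# Tally would typically steer the first envelope toward a model with strong
# tool-use support and the second toward a cheaper model.
```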

The envelope, extended

MCP-specific shape signals.

The standard semantic envelope works for MCP calls, but Tally recognises additional fields that capture the unique shape of tool-use workloads:

  • tools_used: the number of MCP tools available in scope. Higher values increase input token cost (tool schemas) and bias toward models with strong tool-use support.
  • tool_complexity: how complex the tools being called are. simple (read/fetch) means a cheap model is often fine; complex (multi-step, conditional logic) favours a stronger model.
  • expected_tool_rounds: how many tool-call rounds are anticipated, from 1 (a single fetch) to 3+ (iterative retrieval or multi-step reasoning). High round counts make context compounding significant.
  • tool_result_density: how large the tool results are expected to be: low (short values), medium (a few paragraphs), or high (full documents or large query results). Drives context length estimates.
  • tool_phase: "invocation" (the model decides which tools to call) or "synthesis" (the model synthesises tool results into a final response). When set, enables phase-specific model selection.

In practice

What Tally recommends for common MCP workloads.

Simple retrieval + format
  • Pattern: 1 tool call, short result
  • Example: Look up a customer record and format it as JSON
  • Recommendation: Haiku / Flash

Multi-source research
  • Pattern: 3–5 tool calls, medium results
  • Example: Query multiple data sources and synthesise findings
  • Recommendation: Sonnet / GPT-4o

Complex reasoning over retrieved context
  • Pattern: Large tool results, synthesis is the hard part
  • Example: Analyse a codebase retrieved via file tools, identify architecture issues
  • Recommendation: Sonnet / Opus

Iterative tool use (agentic)
  • Pattern: 5+ rounds, dynamic tool selection
  • Example: Autonomous agent iterating over search + read + write loops
  • Recommendation: Opus / o1

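The four patterns above can be mirrored in a hard-coded decision sketch. Tally's actual policy is learned from outcomes rather than fixed rules; this only illustrates how the envelope fields map to tiers, and the tier names are invented labels.

```python
# Hard-coded illustration of the four workload patterns above. Tally's
# actual policy is learned from outcomes; the tier names are invented.

def recommend_tier(expected_tool_rounds: int, tool_result_density: str) -> str:
    if expected_tool_rounds >= 5:
        return "frontier"   # e.g. Opus / o1: iterative agentic loops
    if tool_result_density == "high":
        return "strong"     # e.g. Sonnet / Opus: synthesis over large results
    if expected_tool_rounds >= 3:
        return "mid"        # e.g. Sonnet / GPT-4o: multi-source research
    return "cheap"          # e.g. Haiku / Flash: single fetch and format
```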
The non-negotiable

Tally never touches your MCP calls.

👁 A witness. Not a proxy. Not a relay.

Your MCP calls go directly from your application to your MCP server. Tally is never in that path. It does not sit between you and your tools. It does not see the parameters you send to tools. It does not see the results that come back.

What Tally sees: the shape of the call. How many tools are in scope. What phase the call is in. How long the context is. What happened when it was done — success or failure, token count, optional quality signal. That is it.
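Concretely, a shape-only outcome record might look like the sketch below. The field names are invented for illustration, not a documented payload; the point is what is absent.

```python
# Illustrative shape-only outcome record. Field names are invented for this
# sketch; note what is absent: no prompt text, no tool parameters, no tool
# results.

outcome = {
    "tools_in_scope": 3,        # how many tool schemas were available
    "tool_phase": "synthesis",  # which phase the call was in
    "context_tokens": 8200,     # how long the context was
    "output_tokens": 350,
    "status": "success",        # success or failure
    "quality_signal": None,     # optional caller-provided rating
}
```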

This is not a privacy stance born of legal caution. It is an architectural commitment: Tally has no business being in your critical path. We give you a recommendation before the call and learn from the outcome after it. Everything in between — your prompts, your tool parameters, your tool results, your data — belongs entirely to you and your providers.

Build the tool. Tally handles the billing.

You built something useful. Tally lets you charge for it — credit system, consumer portal, Stripe, payouts — without you writing a line of billing code. We take 10%. You keep 90% and stay focused on your tools.

Paid Features

MCP Monetisation.

Telemetry and consumer attribution are free, forever. The unique paid feature for MCP providers is monetisation — letting you charge your own consumers for access to your tools, with Tally handling the full billing lifecycle.

Always free

Telemetry & Attribution

Full visibility into who is using your MCP server and how. No caps, no expiry.

  • Full call ledger — every tool invocation, token count, outcome
  • Per-consumer call counts and token volumes
  • Phase-level breakdown — invocation vs synthesis
  • Complete audit trail across multi-agent call chains
  • Consumer identity attribution when Tally-registered callers connect
  • MCP monetisation is the one exception: paid only, as described above

Also available: Routing Recommendations. MCP providers who also want model routing for their own LLM calls can use Tally's standard routing at $0.01 per recommendation. Separate from monetisation — you can use either or both. See Tally + LLMs for the full routing picture.

Route your MCP workloads smarter.

Same API. Full MCP awareness. Start free, scale as you need.

Next section

Pricing

See exactly what we charge — and why it is embarrassingly simple.