Tool Routing

Tally + MCPs

Tool calls have a unique cost profile that standard LLM routing doesn't account for. Tally understands the shape of MCP workloads and routes them accordingly.

New to MCPs? Read the MCP Primer first — it covers what MCP servers are, how they work as bridges between sealed models and your systems, and how tool calls flow. This page covers only Tally's routing behaviour for MCP workloads.

The cost profile

Why MCP calls are expensive,
and where the waste hides.

  • Tool schemas: +200–1,000 tokens per call. Counted even if no tool is used; every tool definition in scope costs tokens.
  • Tool invocation output: +100–400 tokens. The model's structured tool-call response before the actual tool runs.
  • Tool result (input to the next turn): +500–10,000+ tokens. Large for retrieval tools (documents, query results, file contents).
  • History compounding (multi-round): multiplies with each round. Each additional tool call resends the full prior conversation history.
  • Final synthesis: normal LLM output cost. The final answer, often the smallest cost in a multi-tool interaction.

The key insight: in a complex MCP workflow, the final synthesis step — where the model actually composes the answer — is often the cheapest part. The expensive parts are the retrieval, the tool schemas, and the compounding history. Routing the synthesis to a cheap model while preserving the retrieval quality can cut costs substantially without affecting output quality.
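The compounding effect is easy to see with back-of-envelope arithmetic. The sketch below uses illustrative midpoints of the ranges in the table above; real figures vary by provider and tool.

```python
# Back-of-envelope estimate of where tokens go in a multi-round MCP
# interaction. All per-item figures are illustrative midpoints of the
# ranges in the table above, not measured values.

def estimate_tokens(rounds: int,
                    schema_tokens: int = 600,      # tool schemas, resent every turn
                    invocation_tokens: int = 250,  # structured tool-call output
                    result_tokens: int = 3000,     # tool result fed into the next turn
                    synthesis_tokens: int = 400) -> tuple[int, float]:
    """Return (total_tokens, synthesis_share) for `rounds` tool calls."""
    history = 0
    total = 0
    for _ in range(rounds):
        # Each round resends the accumulated history plus the tool schemas,
        # then appends a tool-call output and a tool result to the history.
        total += history + schema_tokens + invocation_tokens
        history += invocation_tokens + result_tokens
    # Final turn: full history plus schemas in, synthesis out.
    total += history + schema_tokens + synthesis_tokens
    return total, synthesis_tokens / total

total, share = estimate_tokens(rounds=3)
```

Under these assumptions, three tool rounds put the final synthesis at under 2% of total tokens: the retrieval, schemas, and resent history dominate the bill.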

Two routing opportunities

Tool invocation vs synthesis:
they can go to different models.

An MCP interaction involves two cognitively distinct phases, and they don't require the same model capability:

🔧 Tool invocation phase

The model reads the request and tool schemas, decides which tools to call and with what parameters. This requires understanding tool schemas and translating intent into structured function calls.

Models with strong tool-use capability handle this phase better. Not all models support structured tool calls equally well — routing to an incapable model here causes hard failures.

e.g., "Search the database for Q4 sales data and calculate the YoY change" — requires correct tool selection and parameter generation.

📝 Synthesis phase

Once the tool results are back, the model synthesises them into a final response. This is often much simpler than the invocation — if the data is good, the synthesis step is essentially formatting and summarising.

A cheaper model often handles synthesis well, especially when the tool results are clear and well-structured. Routing synthesis to Haiku when invocation used Sonnet can recapture significant cost.

e.g., "Format these database results as a summary table" — straightforward once the data is present.

Tally understands this distinction and can route the two phases independently if your integration separates them. The semantic envelope carries tool_phase: "invocation" or tool_phase: "synthesis" to tell Tally which context it's operating in.
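If your integration separates the phases, the split might look like the sketch below. The tool_phase, tools_used, and tool_result_density fields come from the envelope described on this page; everything else about the request shape is a hypothetical placeholder.

```python
# Phase-split routing sketch. `tool_phase`, `tools_used`, and
# `tool_result_density` are envelope fields from this page; the rest of
# the request shape is a hypothetical placeholder.

invocation_envelope = {
    "task": "Search the database for Q4 sales data and calculate the YoY change",
    "tool_phase": "invocation",       # must produce correct structured tool calls
    "tools_used": 4,                  # four tool schemas in scope
}

synthesis_envelope = {
    "task": "Format these database results as a summary table",
    "tool_phase": "synthesis",        # tool results are already in hand
    "tool_result_density": "medium",  # a few paragraphs of query results
}

# Tally would typically steer the first envelope toward a model with strong
# tool-use support and the second toward a cheaper model.
```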

The envelope, extended

MCP-specific shape signals.

The standard semantic envelope works for MCP calls, but Tally recognises additional fields that capture the unique shape of tool-use workloads:

  • tools_used: the number of MCP tools available in scope. Higher values increase input token cost (tool schemas) and bias toward models with strong tool-use support.
  • tool_complexity: how complex the tools being called are. simple (read/fetch) means a cheap model is often fine; complex (multi-step, conditional logic) favours a stronger model.
  • expected_tool_rounds: how many tool-call rounds are anticipated, from 1 (a single fetch) to 3+ (iterative retrieval or multi-step reasoning). High round counts make context compounding significant.
  • tool_result_density: how large the tool results are expected to be: low (short values), medium (a few paragraphs), or high (full documents or large query results). Drives context length estimates.
  • tool_phase: "invocation" (the model decides which tools to call) or "synthesis" (the model synthesises tool results into a final response). When set, enables phase-specific model selection.

In practice

What Tally recommends for common MCP workloads.

Simple retrieval + format
  • Pattern: 1 tool call, short result
  • Example: Look up a customer record and format it as JSON
  • Recommendation: Haiku / Flash

Multi-source research
  • Pattern: 3–5 tool calls, medium results
  • Example: Query multiple data sources and synthesise findings
  • Recommendation: Sonnet / GPT-4o

Complex reasoning over retrieved context
  • Pattern: Large tool results, synthesis is the hard part
  • Example: Analyse a codebase retrieved via file tools, identify architecture issues
  • Recommendation: Sonnet / Opus

Iterative tool use (agentic)
  • Pattern: 5+ rounds, dynamic tool selection
  • Example: Autonomous agent iterating over search + read + write loops
  • Recommendation: Opus / o1

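The four patterns above can be mirrored in a hard-coded decision sketch. Tally's actual policy is learned from outcomes rather than fixed rules; this only illustrates how the envelope fields map to tiers, and the tier names are invented labels.

```python
# Hard-coded illustration of the four workload patterns above. Tally's
# actual policy is learned from outcomes; the tier names are invented.

def recommend_tier(expected_tool_rounds: int, tool_result_density: str) -> str:
    if expected_tool_rounds >= 5:
        return "frontier"   # e.g. Opus / o1: iterative agentic loops
    if tool_result_density == "high":
        return "strong"     # e.g. Sonnet / Opus: synthesis over large results
    if expected_tool_rounds >= 3:
        return "mid"        # e.g. Sonnet / GPT-4o: multi-source research
    return "cheap"          # e.g. Haiku / Flash: single fetch and format
```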
The non-negotiable

Tally never touches your MCP calls.

👁 A witness. Not a proxy. Not a relay.

Your MCP calls go directly from your application to your MCP server. Tally is never in that path. It does not sit between you and your tools. It does not see the parameters you send to tools. It does not see the results that come back.

What Tally sees: the shape of the call. How many tools are in scope. What phase the call is in. How long the context is. What happened when it was done — success or failure, token count, optional quality signal. That is it.
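Concretely, a shape-only outcome record might look like the sketch below. The field names are invented for illustration, not a documented payload; the point is what is absent.

```python
# Illustrative shape-only outcome record. Field names are invented for this
# sketch; note what is absent: no prompt text, no tool parameters, no tool
# results.

outcome = {
    "tools_in_scope": 3,        # how many tool schemas were available
    "tool_phase": "synthesis",  # which phase the call was in
    "context_tokens": 8200,     # how long the context was
    "output_tokens": 350,
    "status": "success",        # success or failure
    "quality_signal": None,     # optional caller-provided rating
}
```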

This is not a privacy stance born of legal caution. It is an architectural commitment: Tally has no business being in your critical path. We give you a recommendation before the call and learn from the outcome after it. Everything in between — your prompts, your tool parameters, your tool results, your data — belongs entirely to you and your providers.

Build the tool. Tally handles the billing.

You built something useful. Tally lets you charge for it — credit system, consumer portal, Stripe, payouts — without you writing a line of billing code. We take 10%. You keep 90% and stay focused on your tools.

Paid Features

MCP Monetisation.

Telemetry and consumer attribution are free, forever. The unique paid feature for MCP providers is monetisation — letting you charge your own consumers for access to your tools, with Tally handling the full billing lifecycle.

Always free

Telemetry & Attribution

Full visibility into who is using your MCP server and how. No caps, no expiry.

  • Full call ledger — every tool invocation, token count, outcome
  • Per-consumer call counts and token volumes
  • Phase-level breakdown — invocation vs synthesis
  • Complete audit trail across multi-agent call chains
  • Consumer identity attribution when Tally-registered callers connect
  • MCP monetisation is the one exception: paid only, as described above

Also available: Routing Recommendations. MCP providers who also want model routing for their own LLM calls can use Tally's standard routing at $0.01 per recommendation. Separate from monetisation — you can use either or both. See Tally + LLMs for the full routing picture.

Route your MCP workloads smarter.

Same API. Full MCP awareness. Start free, scale as you need.

Next section

Pricing

See exactly what we charge — and why it is embarrassingly simple.