For LLM Consumers

You're calling LLMs.
Tally picks the right one.

One API call before your LLM call. Tally tells you which model to use. You save ~71% on cost without touching quality.

The problem

One model for everything
leaves money on the table.

Most teams pick a capable mid-tier model — Sonnet, GPT-4o, something sensible — and route everything through it. It works. But it's wasteful: your simple tasks are being handled by a model built for hard ones.

The cheap models (Haiku, Flash, GPT-4o mini) are genuinely good at most tasks. The expensive models earn their cost on a minority of calls. The problem is knowing which call is which.

Most teams discover that 60–75% of their real workload is handled at equal quality by their cheapest model. Tally surfaces this with hard data from your actual calls — not estimates.
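A quick back-of-envelope check shows why the share matters. The prices below are illustrative placeholders, not quotes from any provider, and the 70% figure is just an example from the range above:

```javascript
// Blended $/M tokens when a share of traffic moves to a cheaper model.
// All numbers here are hypothetical for illustration.
function blendedCostPerMTok(cheapShare, cheapPrice, premiumPrice) {
  return cheapShare * cheapPrice + (1 - cheapShare) * premiumPrice;
}

const premium = 3.0;  // hypothetical mid-tier model, $/M tokens
const cheap = 0.25;   // hypothetical cheap model, $/M tokens

const blended = blendedCostPerMTok(0.7, cheap, premium);
const savings = 1 - blended / premium; // fraction saved vs. all-premium
```

With 70% of calls on the cheap model, the blended rate falls to roughly a third of the all-premium rate. The exact savings depends on your real traffic mix and prices, which is what the telemetry measures.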

How it works

Three steps. Nothing moves
except a recommendation.

1

Describe the shape of the call

Before calling your LLM, call tally.route(envelope) with a lightweight description of the task — type, context size, tools in scope, time sensitivity. Your prompt never leaves your app. Tally sees the shape, not the content.

2

Tally returns a recommendation

You get back a model name, a confidence score, and a streaming recommendation. Use it or ignore it — your call. Tally is never in the path to your LLM provider. It makes a suggestion before the call and learns from the outcome after.

3

Report the outcome

After your LLM call, fire tally.telemetry() with the result — model used, token counts, success or failure, optional quality score. This is the signal that makes the routing smarter over time. Fire-and-forget. Non-blocking.

// 1. Get a recommendation
const rec = await tally.route({
  taskType:            'code',
  structureType:       'multi-step',
  complexityScore:     0.7,
  contextLength:       tokens,
  determinismRequired: false,
});

// 2. Call your LLM provider directly
const model = rec.sampled ? rec.recommended_model : 'claude-haiku-4-5';
const result = await anthropic.messages.create({ model, ... });

// 3. Report the outcome (non-blocking)
tally.telemetry({ model_used: model, tokens_input, tokens_output, success: true });
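A common pattern is to act on the recommendation only above a confidence threshold and otherwise keep your own default. A sketch of that logic — `sampled` and `recommended_model` match the fields in the snippet above, but the `confidence` field name, the threshold, and the fallback model are assumptions for illustration:

```javascript
// Choose a model from a Tally-style recommendation object.
// `sampled` and `recommended_model` match the example above;
// `confidence` is assumed and may be named differently in the real response.
function pickModel(rec, { fallback = 'claude-haiku-4-5', minConfidence = 0.6 } = {}) {
  if (!rec || !rec.sampled) return fallback;                   // no recommendation this call
  if ((rec.confidence ?? 0) < minConfidence) return fallback;  // bandit still calibrating
  return rec.recommended_model;
}
```

Because Tally is never in the request path, a threshold like this is purely local policy: loosen it as confidence scores rise, tighten it for traffic you can't afford to misroute.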

What you get

Everything the routing engine knows,
delivered per call.

🎯

Model recommendation

The specific model Tally recommends for this call shape, drawn from your configured provider pool.

📊

Confidence score

How confident the bandit is in this recommendation. High confidence = strong signal. Low = still calibrating.

⚡

Streaming recommendation

Whether to stream the response or wait for the full completion — based on task type and time sensitivity.

🛡

Quality floor enforcement

Set a minimum quality threshold. Tally will never recommend a model that has fallen below your bar for this task type.

📈

Continuous improvement

The bandit learns from every telemetry event you send. Routing gets smarter as your call history grows.

🌐

Network signal

Benefit from crowd-sourced patterns across the Tally network — especially useful when you're starting out.
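For intuition about the "continuous improvement" loop, here is a toy epsilon-greedy bandit over a model pool. This is not Tally's actual algorithm — just a minimal illustration of how per-model success rates, learned from outcomes like the ones you report via telemetry, can drive routing:

```javascript
// Toy epsilon-greedy bandit: an illustration, NOT Tally's algorithm.
// Telemetry-style outcomes update per-model success rates; recommendations
// mostly exploit the best-known model and occasionally explore others.
class ToyBandit {
  constructor(models, epsilon = 0.1) {
    this.epsilon = epsilon;
    this.stats = new Map(models.map((m) => [m, { wins: 0, trials: 0 }]));
  }

  // Feed in one outcome, as a telemetry event would.
  record(model, success) {
    const s = this.stats.get(model);
    s.trials += 1;
    if (success) s.wins += 1;
  }

  recommend() {
    const models = [...this.stats.keys()];
    if (Math.random() < this.epsilon) {
      // Explore: try a random model.
      return models[Math.floor(Math.random() * models.length)];
    }
    // Exploit: highest observed success rate (untried models get a 0.5 prior).
    const rate = (m) => {
      const s = this.stats.get(m);
      return s.trials ? s.wins / s.trials : 0.5;
    };
    return models.reduce((best, m) => (rate(m) > rate(best) ? m : best));
  }
}
```

Real systems condition on the call shape (task type, context size, and so on) rather than keeping one global table, which is why the envelope you send to tally.route matters.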

Paid Features

Billing.

Telemetry is free, forever. You pay for routing recommendations.

Always free

Telemetry

Every call recorded. Full log, cost breakdown, quality history. No caps, no expiry.

  • Full call log — model, tokens, latency, outcome
  • Cost breakdown per call and rolled up
  • Quality history — slugs and scores you report back
  • Export at any time
  • Routing recommendations — 10% sample on free

Free accounts receive a recommendation on 10% of calls — enough to see the value, not enough to rely on it. No credit card required to start.

Start routing smarter.

Free to start. One cent per recommendation when you're ready to go all in.