For LLM Consumers

You're calling LLMs.
Tally picks the right one.

One API call before your LLM call. Tally tells you which model to use. You save ~71% on cost without touching quality.

The problem

One model for everything
leaves money on the table.

Most teams pick a capable mid-tier model — Sonnet, GPT-4o, something sensible — and route everything through it. It works. But it's wasteful: your simple tasks are being handled by a model built for hard ones.

The cheap models (Haiku, Flash, GPT-4o mini) are genuinely good at most tasks. The expensive models earn their cost on a minority of calls. The problem is knowing which call is which.

Most teams discover that 60–75% of their real workload is handled at equal quality by their cheapest model. Tally surfaces this with hard data from your actual calls — not estimates.
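A quick back-of-envelope check shows why the share matters. The prices below are illustrative placeholders, not quotes from any provider, and the 70% figure is just an example from the range above:

```javascript
// Blended $/M tokens when a share of traffic moves to a cheaper model.
// All numbers here are hypothetical for illustration.
function blendedCostPerMTok(cheapShare, cheapPrice, premiumPrice) {
  return cheapShare * cheapPrice + (1 - cheapShare) * premiumPrice;
}

const premium = 3.0;  // hypothetical mid-tier model, $/M tokens
const cheap = 0.25;   // hypothetical cheap model, $/M tokens

const blended = blendedCostPerMTok(0.7, cheap, premium);
const savings = 1 - blended / premium; // fraction saved vs. all-premium
```

With 70% of calls on the cheap model, the blended rate falls to roughly a third of the all-premium rate. The exact savings depends on your real traffic mix and prices, which is what the telemetry measures.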

How it works

Three steps. Nothing moves
except a recommendation.

1

Describe the shape of the call

Before calling your LLM, call tally.route(envelope) with a lightweight description of the task — type, context size, tools in scope, time sensitivity. Your prompt never leaves your app. Tally sees the shape, not the content.

2

Tally returns a recommendation

You get back a model name, a confidence score, and a streaming recommendation. Use it or ignore it — your call. Tally is never in the path to your LLM provider. It makes a suggestion before the call and learns from the outcome after.

3

Report the outcome

After your LLM call, fire tally.telemetry() with the result — model used, token counts, success or failure, optional quality score. This is the signal that makes the routing smarter over time. Fire-and-forget. Non-blocking.

// 1. Get a recommendation
const rec = await tally.route({
  taskType:            'code',
  structureType:       'multi-step',
  complexityScore:     0.7,
  contextLength:       tokens,
  determinismRequired: false,
});

// 2. Call your LLM provider directly
const model = rec.sampled ? rec.recommended_model : 'claude-haiku-4-5';
const result = await anthropic.messages.create({ model, ... });

// 3. Report the outcome (non-blocking)
tally.telemetry({ model_used: model, tokens_input, tokens_output, success: true });
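A common pattern is to act on the recommendation only above a confidence threshold and otherwise keep your own default. A sketch of that logic — `sampled` and `recommended_model` match the fields in the snippet above, but the `confidence` field name, the threshold, and the fallback model are assumptions for illustration:

```javascript
// Choose a model from a Tally-style recommendation object.
// `sampled` and `recommended_model` match the example above;
// `confidence` is assumed and may be named differently in the real response.
function pickModel(rec, { fallback = 'claude-haiku-4-5', minConfidence = 0.6 } = {}) {
  if (!rec || !rec.sampled) return fallback;                   // no recommendation this call
  if ((rec.confidence ?? 0) < minConfidence) return fallback;  // bandit still calibrating
  return rec.recommended_model;
}
```

Because Tally is never in the request path, a threshold like this is purely local policy: loosen it as confidence scores rise, tighten it for traffic you can't afford to misroute.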

What you get

Everything the routing engine knows,
delivered per call.

🎯

Model recommendation

The specific model Tally recommends for this call shape, drawn from your configured provider pool.

📊

Confidence score

How confident the bandit is in this recommendation. High confidence = strong signal. Low = still calibrating.

⚡

Streaming recommendation

Whether to stream the response or wait for the full completion — based on task type and time sensitivity.

🛡

Quality floor enforcement

Set a minimum quality threshold. Tally will never recommend a model that has fallen below your bar for this task type.

📈

Continuous improvement

The bandit learns from every telemetry event you send. Routing gets smarter as your call history grows.

🌐

Network signal

Benefit from crowd-sourced patterns across the Tally network — especially useful when you're starting out.
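For intuition about the "continuous improvement" loop, here is a toy epsilon-greedy bandit over a model pool. This is not Tally's actual algorithm — just a minimal illustration of how per-model success rates, learned from outcomes like the ones you report via telemetry, can drive routing:

```javascript
// Toy epsilon-greedy bandit: an illustration, NOT Tally's algorithm.
// Telemetry-style outcomes update per-model success rates; recommendations
// mostly exploit the best-known model and occasionally explore others.
class ToyBandit {
  constructor(models, epsilon = 0.1) {
    this.epsilon = epsilon;
    this.stats = new Map(models.map((m) => [m, { wins: 0, trials: 0 }]));
  }

  // Feed in one outcome, as a telemetry event would.
  record(model, success) {
    const s = this.stats.get(model);
    s.trials += 1;
    if (success) s.wins += 1;
  }

  recommend() {
    const models = [...this.stats.keys()];
    if (Math.random() < this.epsilon) {
      // Explore: try a random model.
      return models[Math.floor(Math.random() * models.length)];
    }
    // Exploit: highest observed success rate (untried models get a 0.5 prior).
    const rate = (m) => {
      const s = this.stats.get(m);
      return s.trials ? s.wins / s.trials : 0.5;
    };
    return models.reduce((best, m) => (rate(m) > rate(best) ? m : best));
  }
}
```

Real systems condition on the call shape (task type, context size, and so on) rather than keeping one global table, which is why the envelope you send to tally.route matters.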

Paid Features

Billing.

Telemetry is free, forever. You pay for routing recommendations.

Always free

Telemetry

Every call recorded. Full log, cost breakdown, quality history. No caps, no expiry.

  • Full call log — model, tokens, latency, outcome
  • Cost breakdown per call and rolled up
  • Quality history — slugs and scores you report back
  • Export at any time
  • Routing recommendations — 10% sample on free

Free accounts receive a recommendation on 10% of calls — enough to see the value, not enough to rely on it. No credit card required to start.

Start routing smarter.

Free to start. One cent per recommendation when you're ready to go all in.