One API call before your LLM call. Tally tells you which model to use. You save ~71% on cost without touching quality.
Most teams pick a capable mid-tier model — Sonnet, GPT-4o, something sensible — and route everything through it. It works. But it's wasteful: your simple tasks are being handled by a model built for hard ones.
The cheap models (Haiku, Flash, GPT-4o mini) are genuinely good at most tasks. The expensive models earn their cost on a minority of calls. The problem is knowing which call is which.
Most teams discover that 60–75% of their real workload is handled at equal quality by their cheapest model. Tally surfaces this with hard data from your actual calls — not estimates.
Before calling your LLM, call tally.route(envelope) with a lightweight description of the task — type, context size, tools in scope, time sensitivity. Your prompt never leaves your app. Tally sees the shape, not the content.
You get back a model name, a confidence score, and a streaming recommendation. Use it or ignore it — your call. Tally is never in the path to your LLM provider. It makes a suggestion before the call and learns from the outcome after.
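Because the recommendation is advisory, you can guard the lookup with a timeout so it never delays your LLM call. A minimal sketch — the `tally` object here is a stand-in stub for the real client, and `pickModel` / `DEFAULT_MODEL` are illustrative names, not part of any SDK:

```typescript
// Sketch: treat the recommendation as advisory. If route() is slow or
// errors, fall back to a default model so the LLM call is never blocked.

type RouteEnvelope = { taskType: string; contextLength: number };
type RouteRec = { recommended_model: string; confidence: number };

// Stand-in stub; the real client calls the Tally API.
const tally = {
  async route(_env: RouteEnvelope): Promise<RouteRec> {
    return { recommended_model: 'claude-haiku-4-5', confidence: 0.9 };
  },
};

const DEFAULT_MODEL = 'claude-sonnet-4-5'; // your sensible mid-tier default

async function pickModel(env: RouteEnvelope, timeoutMs = 150): Promise<string> {
  const timeout = new Promise<null>((resolve) =>
    setTimeout(() => resolve(null), timeoutMs),
  );
  // Whichever settles first wins; a failed route() also yields the default.
  const rec = await Promise.race([tally.route(env).catch(() => null), timeout]);
  return rec ? rec.recommended_model : DEFAULT_MODEL;
}
```

If tally.route() answers in time, you use its pick; otherwise the request proceeds on your default model, exactly as it would without Tally.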
After your LLM call, fire tally.telemetry() with the result — model used, token counts, success or failure, optional quality score. This signal is what makes the routing smarter over time. Fire-and-forget. Non-blocking.
// 1. Get a recommendation
const rec = await tally.route({
  taskType: 'code',
  structureType: 'multi-step',
  complexityScore: 0.7,
  contextLength: tokens,
  determinismRequired: false,
});

// 2. Call your LLM provider directly
const model = rec.sampled ? rec.recommended_model : 'claude-haiku-4-5';
const result = await anthropic.messages.create({ model, ... });

// 3. Report the outcome (non-blocking)
tally.telemetry({
  model_used: model,
  tokens_input,
  tokens_output,
  success: true,
});
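The "non-blocking" comment in step 3 deserves one detail: don't await the telemetry call, and swallow its errors, so a telemetry hiccup can never delay or break the user-facing request. A sketch of that pattern — the `tally` object is a stand-in stub, and `reportOutcome` is an illustrative helper name:

```typescript
// Sketch: telemetry off the hot path. The call is deliberately not awaited,
// and errors are swallowed, so a telemetry failure can never affect the
// user-facing request.

type TelemetryEvent = {
  model_used: string;
  tokens_input: number;
  tokens_output: number;
  success: boolean;
  quality_score?: number; // optional, if you score outputs yourself
};

// Stand-in stub; the real client POSTs to the Tally API.
const tally = {
  async telemetry(_event: TelemetryEvent): Promise<void> {},
};

function reportOutcome(event: TelemetryEvent): void {
  // `void` marks the promise as intentionally unawaited; the .catch()
  // prevents an unhandled rejection if the network call fails.
  void tally.telemetry(event).catch(() => {});
}
```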
The specific model Tally recommends for this call shape, drawn from your configured provider pool.
How confident the bandit is in this recommendation. High confidence = strong signal. Low = still calibrating.
Whether to stream the response or wait for the full completion — based on task type and time sensitivity.
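The fields above can be written down as a TypeScript type. `recommended_model` and `sampled` match the snippet earlier on this page; `confidence` and `stream` are illustrative field names, not a formal SDK contract:

```typescript
// Sketch of the route() response shape, based on the fields described above.

interface RouteRecommendation {
  recommended_model: string; // a model from your configured provider pool
  confidence: number;        // 0-1: how calibrated the bandit is for this shape
  stream: boolean;           // whether to stream the response
  sampled: boolean;          // free tier: did this call get a recommendation?
}

// Hypothetical policy: stream only when the bandit is reasonably confident.
function shouldStream(rec: RouteRecommendation): boolean {
  return rec.stream && rec.confidence > 0.5;
}
```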
Set a minimum quality threshold. Tally will never recommend a model that has fallen below your bar for this task type.
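Tally enforces the floor server-side, but conceptually it is just a filter over the candidate pool: a model drops out for a task type once its rolling quality dips below your bar. An illustrative sketch — all names here are hypothetical:

```typescript
// Sketch: the quality floor as a filter over candidate models.

type QualityHistory = Record<string, number>; // model -> rolling quality score

function eligibleModels(history: QualityHistory, minQuality: number): string[] {
  return Object.entries(history)
    .filter(([, quality]) => quality >= minQuality)
    .map(([model]) => model);
}
```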
The bandit learns from every telemetry event you send. Routing gets smarter as your call history grows.
Benefit from crowd-sourced patterns across the Tally network — especially useful when you're starting out.
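The learning loop described above has the shape of a multi-armed bandit: each telemetry event is one observation, and routing shifts toward the models that keep succeeding. Tally's actual bandit is more sophisticated, but a minimal epsilon-greedy version over a model pool shows the mechanism:

```typescript
// Illustrative only: a minimal epsilon-greedy bandit over a model pool,
// updated from success/failure telemetry.

class ModelBandit {
  private stats = new Map<string, { wins: number; trials: number }>();

  constructor(models: string[], private epsilon = 0.1) {
    for (const m of models) this.stats.set(m, { wins: 0, trials: 0 });
  }

  // Observed success rate; untried models are treated optimistically.
  private rate(model: string): number {
    const s = this.stats.get(model)!;
    return s.trials === 0 ? 1 : s.wins / s.trials;
  }

  // Mostly exploit the best observed model, occasionally explore.
  recommend(): string {
    const models = [...this.stats.keys()];
    if (Math.random() < this.epsilon) {
      return models[Math.floor(Math.random() * models.length)];
    }
    return models.reduce((best, m) => (this.rate(m) > this.rate(best) ? m : best));
  }

  // One telemetry event becomes one observation.
  update(model: string, success: boolean): void {
    const s = this.stats.get(model)!;
    s.trials += 1;
    if (success) s.wins += 1;
  }
}
```

With more telemetry, the success-rate estimates tighten and the exploit branch picks the cheapest model that keeps succeeding for that call shape.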
Telemetry is free, forever. You pay for routing recommendations.
Every call recorded. Full log, cost breakdown, quality history. No caps, no expiry.
$0.01 per recommendation. One cent.
Founding pricing — subject to revision.
Free accounts receive a recommendation on 10% of calls — enough to see the value, not enough to rely on it. No credit card required to start.
Free to start. One cent per recommendation when you're ready to go all in.