Tally
Intelligent AI Routing

Stop overpaying
for AI models

Tally sits between your application and your LLM providers. It learns which model is best for each type of task — then routes automatically, cutting costs without sacrificing quality.

tally route · live
ROUTE code-debug · context:long
model claude-haiku-3-5
savings ↓ 71% vs sonnet
reason high exploit confidence
 
ROUTE architecture-design · tools:6
model claude-sonnet-4-5
reason high complexity — exploring
 
ROUTE data-analysis · structure:json
model claude-haiku-3-5
savings ↓ 71%
 
─────────────────────────────
calls 2,847 today
saved $4.23 vs always-sonnet
quality 97.4% success rate
Why Tally

The smart layer between
your app and the LLMs

Most teams pick one model and use it for everything. Tally learns the shape of each task and routes to the most cost-effective model that will still get the job done.

💰

Cut costs without guesswork

Tally's multi-armed bandit learns which model handles each task type well — then exploits that knowledge to save you money on every call.
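Tally's production algorithm isn't published here, but the exploit/explore tradeoff it describes can be sketched with a minimal epsilon-greedy bandit. Everything below (class name, reward signal, model names) is illustrative, not Tally's actual implementation:

```typescript
// Minimal epsilon-greedy bandit over candidate models.
// reward is the running sum of success signals (1 = success, 0 = failure).
type Arm = { model: string; pulls: number; reward: number };

class EpsilonGreedyRouter {
  constructor(private arms: Arm[], private epsilon = 0.1) {}

  // Usually exploit the best-known arm; occasionally explore to keep learning.
  route(): string {
    const untried = this.arms.filter((a) => a.pulls === 0);
    if (untried.length > 0 || Math.random() < this.epsilon) {
      const pool = untried.length > 0 ? untried : this.arms;
      return pool[Math.floor(Math.random() * pool.length)].model;
    }
    // Exploit: highest average reward so far.
    const best = this.arms.reduce((a, b) =>
      a.reward / a.pulls >= b.reward / b.pulls ? a : b
    );
    return best.model;
  }

  // Feed an outcome back into the bandit's estimates.
  telemetry(model: string, reward: number): void {
    const arm = this.arms.find((a) => a.model === model);
    if (!arm) return;
    arm.pulls += 1;
    arm.reward += reward;
  }
}
```

With epsilon set to 0 the router always exploits; the epsilon slice is what keeps it re-checking cheaper models whose quality may have shifted.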

🎯

Quality never compromised

Routing decisions are driven by real success signals. If a cheaper model starts underperforming, Tally detects it and adjusts automatically.

📊

Full observability

Every call is tagged with semantic metadata — task type, complexity, tools, structure. See exactly where your AI budget is going.
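A tagged call record might look like the following. The field names are assumptions drawn from the dimensions named above (task type, complexity, tools, structure, context length), not Tally's documented schema:

```typescript
// Hypothetical shape of a tagged call record; field names are
// illustrative, based on the metadata dimensions described above.
interface CallRecord {
  taskType: string;                 // e.g. 'code-debug', 'data-analysis'
  complexity: "low" | "medium" | "high";
  tools: number;                    // tools available to the call
  structure?: "json" | "text";      // expected output structure
  contextLength: "short" | "long";
  modelUsed: string;
  costUsd: number;
  outcome: "success" | "failure";
}

const record: CallRecord = {
  taskType: "code-debug",
  complexity: "low",
  tools: 0,
  contextLength: "long",
  modelUsed: "claude-haiku-3-5",
  costUsd: 0.0003,
  outcome: "success",
};
```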

🧠

Gets smarter over time

Each telemetry event feeds the bandit. The longer Tally runs on your workload, the more precisely it can exploit model strengths.

🔌

Drop-in SDK

Two API calls — route() before and telemetry() after. No infrastructure changes, no proxy servers, no rewrites.

🏢

Team-aware billing

Organize by org, set per-team token budgets, and track which products or users are driving costs. Multi-org support built in.
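As a sketch only, a per-team budget configuration could look like this. Tally's actual billing API is not shown on this page, so the org name, field names, and numbers are hypothetical:

```typescript
// Hypothetical org/team budget configuration (illustrative only).
const orgConfig = {
  org: "example-org",
  teams: [
    { name: "search", monthlyTokenBudget: 50_000_000 },
    { name: "support-bot", monthlyTokenBudget: 20_000_000 },
  ],
};

// Total token budget across all teams in the org.
const totalBudget = orgConfig.teams.reduce(
  (sum, t) => sum + t.monthlyTokenBudget,
  0
);
```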

How It Works

Three steps to smarter routing

Tally wraps your existing LLM calls. No infrastructure changes required.

1

Describe the task

Before each LLM call, build a semantic envelope describing the task — its type, complexity, structure, tools needed, and context length. Takes one line of code.

2

Ask Tally which model

Call route() with the envelope and your available models. Tally's bandit returns the recommended model — either exploiting what it knows or exploring to keep learning.

3

Report the outcome

After the LLM responds, fire telemetry() with the result — tokens used, success/fail, quality score. Tally updates its model and the routing gets smarter.

example.ts
import { TallyClient, buildEnvelope } from '@tally/sdk'

const tally = new TallyClient({
  apiKey: process.env.TALLY_API_KEY
})

// 1. Describe the task
const envelope = buildEnvelope({
  taskType: 'code-debug',
  contextLength: 'long'
})

// 2. Get a route recommendation
const models = ['claude-haiku-3-5', 'claude-sonnet-4-5']
const { recommended_model } =
  await tally.route(envelope, models)

// 3. Call the LLM, then report the outcome
const result = await callLLM(recommended_model)
tally.telemetry({
  model_used: recommended_model,
  outcome: 'success',
  ntok: result.tokens
})
Full technical walkthrough →
71%
average cost reduction on routable tasks
6
supported model providers
<5ms
typical route decision latency
gets smarter with every call
Live Demo

Watch the bandit learn

The harness generates realistic workloads — code debugging, architecture design, data analysis, content writing — and streams live routing decisions as Tally learns which model handles each scenario best.

Watch exploration vs. exploitation play out in real time. See cost savings accumulate with every correctly routed call.

Run the harness →
harness · live diagnostics
Events: 412 · Rate: 5/s · Mode: random
 
[412] haiku code-debug $0.0003 exploit
[411] sonnet arch-design $0.0021 explore
[410] haiku data-analysis $0.0002 exploit
[409] haiku code-review $0.0004 exploit
[408] gpt4o creative-writing $0.0018 explore
 
Model distribution (last 100):
haiku ████████████████████ 67%
sonnet ████████ 24%
gpt-4o ███ 9%
 
Total cost: $0.2847 vs $0.9103 baseline
Savings: $0.6256 (68.7%)

Ready to stop overpaying for AI?

Accounts are now open. Get started for free.