SDK & API Reference

Everything you need to integrate Tally into your application.

Installation

The Tally SDK is available as an npm package. It works in Node.js 18+ and with any modern bundler (Vite, esbuild, Rollup).

bash
npm install @tally/sdk

The SDK has zero runtime dependencies. The only requirement is an API key, which you get automatically when you create your account. The source is available on GitHub.

Quickstart

Add two calls to your existing LLM integration: route() before the LLM call, and telemetry() after.

typescript
import Anthropic from '@anthropic-ai/sdk'
import { TallyClient, buildEnvelope } from '@tally/sdk'

const anthropic = new Anthropic()
const tally = new TallyClient({ apiKey: process.env.TALLY_API_KEY! })

async function askAI(userMessage: string): Promise<string> {
  // 1. Describe this task to Tally
  const envelope = buildEnvelope({
    taskType:      'qa-simple',
    contextLength: 'short',
  })

  // 2. Ask which model to use
  const availableModels = [
    'claude-haiku-3-5-20251001',
    'claude-sonnet-4-5-20251001',
  ]
  const { recommended_model, exploration_flag } =
    await tally.route(envelope, availableModels)

  // 3. Call the recommended model
  const t0 = Date.now()
  const response = await anthropic.messages.create({
    model:      recommended_model,
    max_tokens: 1024,
    messages:   [{ role: 'user', content: userMessage }],
  })

  const content    = response.content[0].type === 'text'
    ? response.content[0].text : ''
  const ntokInput  = response.usage.input_tokens
  const ntokOutput = response.usage.output_tokens

  // 4. Report outcome back to Tally (fire and forget)
  tally.telemetry({
    semantic_envelope:  envelope,
    model_used:         recommended_model,
    recommended_model,
    outcome:            'success',
    ntok_input:         ntokInput,
    ntok_output:        ntokOutput,
  })

  return content
}

Your API Key

After signing up, your API key is available in the Portal under your account settings. It looks like:

text
tly_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Set it as an environment variable. Never commit it to source control.

bash
export TALLY_API_KEY=tly_your_key_here

TallyClient

The main entry point. Instantiate once per application.

typescript
import { TallyClient } from '@tally/sdk'

const tally = new TallyClient({
  apiKey:          process.env.TALLY_API_KEY!,   // required
  endpoint:        'https://api.tallyy.org',      // optional override
  calibrationMode: 'off',                         // 'off' | 'standard' | 'aggressive'
  sdkVersion:      '1.0.0',                       // optional — enables 'certified' trust level
  onError:         (err) => console.error(err),   // optional error handler
})

Constructor options

apiKey (string, required): Your Tally API key.
endpoint (string, default 'https://api.tallyy.org'): Override the Tally API endpoint.
calibrationMode (CalibrationMode, default 'off'): Warm-up mode for new installations. Use 'standard' for the first week.
sdkVersion (string, default undefined): Set to enable the 'certified' trust level for telemetry.
onError (function, optional): Handler invoked when the SDK encounters an error.

buildEnvelope()

Constructs a Semantic Envelope describing the shape of the task. This is what Tally reasons over — not the content of your prompt.

typescript
import { buildEnvelope } from '@tally/sdk'

const envelope = buildEnvelope({
  taskType:         'code-debug',          // see Task Types below
  structureType:    'code',                // 'prose' | 'code' | 'json' | 'list' | 'mixed'
  contextLength:    'long',                // 'short' | 'medium' | 'long' | 'very-long'
  estimatedTokens:  '2k-8k',              // optional token bucket hint
  toolsDescriptors: [{ type: 'search' }], // optional tool list
  timeSensitive:    false,                 // optional urgency hint
})

Only taskType is required. All other fields are optional hints that improve routing accuracy.
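If you do not know the right bucket up front, a small helper can derive contextLength from prompt size. This is only a sketch: the SDK does not publish exact bucket boundaries, so the thresholds and the 4-characters-per-token heuristic below are illustrative assumptions to tune for your workload.

```typescript
type ContextLength = 'short' | 'medium' | 'long' | 'very-long'

// Illustrative thresholds only; the SDK does not prescribe exact boundaries.
// Rough rule of thumb: ~4 characters per token for English prose.
function pickContextLength(promptChars: number): ContextLength {
  const approxTokens = Math.ceil(promptChars / 4)
  if (approxTokens < 1_000) return 'short'
  if (approxTokens < 8_000) return 'medium'
  if (approxTokens < 32_000) return 'long'
  return 'very-long'
}
```

Then: `buildEnvelope({ taskType: 'qa-simple', contextLength: pickContextLength(prompt.length) })`.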

route()

Asks Tally which model to use for this envelope. Returns the recommended model and an exploration flag.

typescript
const result = await tally.route(envelope, availableModels, {
  // Optional hints
  intentHint:             'The user is debugging a React hook',
  userSkillBand:          'expert',
  explicitTimeSensitivity: 'low',
})

console.log(result.recommended_model)  // 'claude-haiku-3-5-20251001'
console.log(result.exploration_flag)   // false — exploiting known best

The availableModels array is the set of models your account has access to. Tally only recommends from this list.
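Because route() is a network call, you may want a local fallback when it fails. The wrapper below is a sketch, not part of the SDK: it is typed structurally so any client exposing a route() method works, and the fallback policy (first model in your pool) is an assumption to adapt.

```typescript
type RouteResult = { recommended_model: string; exploration_flag: boolean }
type Router = {
  route: (envelope: unknown, models: string[]) => Promise<RouteResult>
}

// Sketch: fall back to the first model in the pool if routing fails.
// The fallback policy is illustrative, not an SDK behaviour.
async function routeWithFallback(
  router: Router,
  envelope: unknown,
  availableModels: string[],
): Promise<string> {
  try {
    const { recommended_model } = await router.route(envelope, availableModels)
    return recommended_model
  } catch {
    return availableModels[0]
  }
}
```

Use it as `routeWithFallback(tally, envelope, availableModels)` in place of the bare tally.route() call.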

telemetry()

Reports the outcome of an LLM call. Fire-and-forget — non-blocking. The SDK retries on failure.

typescript
tally.telemetry({
  // Envelope — same one you passed to route()
  semantic_envelope:  envelope,

  // What model was used
  model_used:         'claude-haiku-3-5-20251001',

  // What Tally recommended (for adoption tracking)
  recommended_model:  'claude-haiku-3-5-20251001',

  // Outcome
  outcome:            'success',   // 'success' | 'fail'

  // Token counts (for cost estimation)
  ntok_input:         4200,
  ntok_output:        380,

  // Optional quality signal [0, 1]
  quality_score:      0.92,

  // Optional session context (enables re-ask tracking)
  session_id:          'sess_abc123',
  conversation_turn_index: 3,
})

Task Types

The taskType field in your envelope tells Tally what kind of work is happening. Choose the closest match.

code-debug, code-review, code-generation, architecture-design, data-analysis, summarisation, qa-simple, creative-writing, task-planning, research-synthesis, document-generation, classification
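In practice it helps to centralise this choice in one place rather than scattering string literals through your code. The mapping below is purely illustrative: the feature names are hypothetical, and only the task type values come from the list above.

```typescript
type TaskType =
  | 'code-debug' | 'code-review' | 'code-generation' | 'architecture-design'
  | 'data-analysis' | 'summarisation' | 'qa-simple' | 'creative-writing'
  | 'task-planning' | 'research-synthesis' | 'document-generation' | 'classification'

// Hypothetical mapping from this app's features to the closest task type.
const TASK_TYPE_BY_FEATURE: Record<string, TaskType> = {
  'chat-help':      'qa-simple',
  'bug-triage':     'code-debug',
  'pr-summary':     'summarisation',
  'report-builder': 'document-generation',
}

// Defaulting to 'qa-simple' for unknown features is a judgment call.
function taskTypeFor(feature: string): TaskType {
  return TASK_TYPE_BY_FEATURE[feature] ?? 'qa-simple'
}
```

A union type like this also lets the compiler catch typos in task type strings at build time.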

Calibration Mode

When you first deploy Tally, it has no data about your workload. Calibration mode forces a higher exploration rate so the bandit can build an initial model quickly.

off

Normal operation. Use for established workloads where the bandit has enough data.

standard

Recommended for first 1–2 weeks. Balanced exploration to build the model faster.

aggressive

High exploration rate. Use when launching with a completely new model pool.
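One way to operationalise this guidance is to derive the mode from deployment age. The helper below is a sketch: the 14-day cutoff follows the "first 1–2 weeks" advice above and is not an SDK rule, and `newModelPool` mirrors the advice to use 'aggressive' when launching a completely new model pool.

```typescript
type CalibrationMode = 'off' | 'standard' | 'aggressive'

// Sketch: choose a calibration mode from deployment age.
// The 14-day cutoff is an illustrative reading of the guidance above.
function calibrationModeFor(
  daysSinceLaunch: number,
  newModelPool: boolean,
): CalibrationMode {
  if (newModelPool) return 'aggressive'
  if (daysSinceLaunch <= 14) return 'standard'
  return 'off'
}
```

For example: `new TallyClient({ apiKey, calibrationMode: calibrationModeFor(daysSinceLaunch, false) })`.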

Organisations

Every Tally account comes with a personal org. You can create additional team orgs and invite collaborators. API keys are scoped to orgs, so you can track costs per team or product.

Manage orgs and invite team members from the Portal under Account → Organisations.

Test Harness

The harness is a CLI tool that generates realistic synthetic workloads and drives them through Tally's routing engine. Use it to:

  • Warm up Tally's model with calibration data before going live
  • Verify your API key and endpoint connectivity
  • Watch the bandit algorithm in action with live diagnostics
  • Benchmark routing latency

bash
# Install harness globally
npm install -g @tally/harness

# Run 100 events, random scenario mix, 5 events/sec
TALLY_API_KEY=tly_xxx tally-harness

# Run 500 events, code-debug only, faster rate
TALLY_API_KEY=tly_xxx tally-harness --count 500 --rate 20 --scenario code-debug

# See available scenarios
tally-harness --list

# Calibration warmup (standard mode, burst pacing)
TALLY_API_KEY=tly_xxx tally-harness --count 2000 --calibration standard --burst

Or try the browser-based simulation — no API key required.

Inspector

The Inspector is the admin observability dashboard. It shows:

  • Clusters — how your real workload maps to task type clusters
  • Looms — per-client usage, cost, and success rates
  • Live feed — streaming telemetry events in real time
  • Model distribution — which models are winning for which task types
  • Cost trends — daily and hourly cost breakdown

Inspector access requires admin credentials. Contact team@tallyy.org for access.
