SDK & API Reference

Everything you need to integrate Tally into your application.

Installation

The Tally SDK is available as an npm package. It works in Node.js 18+ and with any modern bundler (Vite, esbuild, Rollup).

bash
npm install @tally/sdk

The SDK has zero runtime dependencies. The only requirement is an API key, which you get automatically when you create your account. The source is available on GitHub.

Quickstart

Add two calls to your existing LLM integration: route() before the LLM call, and telemetry() after.

typescript
import Anthropic from '@anthropic-ai/sdk'
import { TallyClient, buildEnvelope } from '@tally/sdk'

const anthropic = new Anthropic()
const tally = new TallyClient({ apiKey: process.env.TALLY_API_KEY! })

async function askAI(userMessage: string): Promise<string> {
  // 1. Describe this task to Tally
  const envelope = buildEnvelope({
    taskType:      'qa-simple',
    contextLength: 'short',
  })

  // 2. Ask which model to use
  const availableModels = [
    'claude-haiku-3-5-20251001',
    'claude-sonnet-4-5-20251001',
  ]
  const { recommended_model, exploration_flag } =
    await tally.route(envelope, availableModels)

  // 3. Call the recommended model
  const t0 = Date.now()
  const response = await anthropic.messages.create({
    model:      recommended_model,
    max_tokens: 1024,
    messages:   [{ role: 'user', content: userMessage }],
  })

  const content    = response.content[0].type === 'text'
    ? response.content[0].text : ''
  const ntokInput  = response.usage.input_tokens
  const ntokOutput = response.usage.output_tokens

  // 4. Report outcome back to Tally (fire and forget)
  tally.telemetry({
    semantic_envelope:  envelope,
    model_used:         recommended_model,
    recommended_model,
    outcome:            'success',
    ntok_input:         ntokInput,
    ntok_output:        ntokOutput,
  })

  return content
}

Your API Key

After signing up, your API key is available in the Portal under your account settings. It looks like:

text
tly_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Set it as an environment variable. Never commit it to source control.

bash
export TALLY_API_KEY=tly_your_key_here

TallyClient

The main entry point. Instantiate once per application.

typescript
import { TallyClient } from '@tally/sdk'

const tally = new TallyClient({
  apiKey:          process.env.TALLY_API_KEY!,   // required
  endpoint:        'https://api.tallyy.org',      // optional override
  calibrationMode: 'off',                         // 'off' | 'standard' | 'aggressive'
  sdkVersion:      '1.0.0',                       // optional — enables 'certified' trust level
  onError:         (err) => console.error(err),   // optional error handler
})

Constructor options

apiKey (string, required): Your Tally API key.
endpoint (string, default 'https://api.tallyy.org'): Override the Tally API endpoint.
calibrationMode (CalibrationMode, default 'off'): Warm-up mode for new installations. Use 'standard' for the first week.
sdkVersion (string, default undefined): Set to enable the 'certified' trust level for telemetry.
onError (function, optional): Handler invoked when the SDK encounters an error.

buildEnvelope()

Constructs a Semantic Envelope describing the shape of the task. This is what Tally reasons over — not the content of your prompt.

typescript
import { buildEnvelope } from '@tally/sdk'

const envelope = buildEnvelope({
  taskType:         'code-debug',          // see Task Types below
  structureType:    'code',                // 'prose' | 'code' | 'json' | 'list' | 'mixed'
  contextLength:    'long',                // 'short' | 'medium' | 'long' | 'very-long'
  estimatedTokens:  '2k-8k',              // optional token bucket hint
  toolsDescriptors: [{ type: 'search' }], // optional tool list
  timeSensitive:    false,                 // optional urgency hint
})

Only taskType is required. All other fields are optional hints that improve routing accuracy.
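If you do not know the right bucket up front, a small helper can derive contextLength from prompt size. This is only a sketch: the SDK does not publish exact bucket boundaries, so the thresholds and the 4-characters-per-token heuristic below are illustrative assumptions to tune for your workload.

```typescript
type ContextLength = 'short' | 'medium' | 'long' | 'very-long'

// Illustrative thresholds only; the SDK does not prescribe exact boundaries.
// Rough rule of thumb: ~4 characters per token for English prose.
function pickContextLength(promptChars: number): ContextLength {
  const approxTokens = Math.ceil(promptChars / 4)
  if (approxTokens < 1_000) return 'short'
  if (approxTokens < 8_000) return 'medium'
  if (approxTokens < 32_000) return 'long'
  return 'very-long'
}
```

Then: `buildEnvelope({ taskType: 'qa-simple', contextLength: pickContextLength(prompt.length) })`.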

route()

Asks Tally which model to use for this envelope. Returns the recommended model and an exploration flag.

typescript
const result = await tally.route(envelope, availableModels, {
  // Optional hints
  intentHint:             'The user is debugging a React hook',
  userSkillBand:          'expert',
  explicitTimeSensitivity: 'low',
})

console.log(result.recommended_model)  // 'claude-haiku-3-5-20251001'
console.log(result.exploration_flag)   // false — exploiting known best

The availableModels array is the set of models your account has access to. Tally only recommends from this list.
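Because route() is a network call, you may want a local fallback when it fails. The wrapper below is a sketch, not part of the SDK: it is typed structurally so any client exposing a route() method works, and the fallback policy (first model in your pool) is an assumption to adapt.

```typescript
type RouteResult = { recommended_model: string; exploration_flag: boolean }
type Router = {
  route: (envelope: unknown, models: string[]) => Promise<RouteResult>
}

// Sketch: fall back to the first model in the pool if routing fails.
// The fallback policy is illustrative, not an SDK behaviour.
async function routeWithFallback(
  router: Router,
  envelope: unknown,
  availableModels: string[],
): Promise<string> {
  try {
    const { recommended_model } = await router.route(envelope, availableModels)
    return recommended_model
  } catch {
    return availableModels[0]
  }
}
```

Use it as `routeWithFallback(tally, envelope, availableModels)` in place of the bare tally.route() call.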

telemetry()

Reports the outcome of an LLM call. Fire-and-forget — non-blocking. The SDK retries on failure.

typescript
tally.telemetry({
  // Envelope — same one you passed to route()
  semantic_envelope:  envelope,

  // What model was used
  model_used:         'claude-haiku-3-5-20251001',

  // What Tally recommended (for adoption tracking)
  recommended_model:  'claude-haiku-3-5-20251001',

  // Outcome
  outcome:            'success',   // 'success' | 'fail'

  // Token counts (for cost estimation)
  ntok_input:         4200,
  ntok_output:        380,

  // Optional quality signal [0, 1]
  quality_score:      0.92,

  // Optional session context (enables re-ask tracking)
  session_id:          'sess_abc123',
  conversation_turn_index: 3,
})

Task Types

The taskType field in your envelope tells Tally what kind of work is happening. Choose the closest match.

code-debug, code-review, code-generation, architecture-design, data-analysis, summarisation, qa-simple, creative-writing, task-planning, research-synthesis, document-generation, classification
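In practice it helps to centralise this choice in one place rather than scattering string literals through your code. The mapping below is purely illustrative: the feature names are hypothetical, and only the task type values come from the list above.

```typescript
type TaskType =
  | 'code-debug' | 'code-review' | 'code-generation' | 'architecture-design'
  | 'data-analysis' | 'summarisation' | 'qa-simple' | 'creative-writing'
  | 'task-planning' | 'research-synthesis' | 'document-generation' | 'classification'

// Hypothetical mapping from this app's features to the closest task type.
const TASK_TYPE_BY_FEATURE: Record<string, TaskType> = {
  'chat-help':      'qa-simple',
  'bug-triage':     'code-debug',
  'pr-summary':     'summarisation',
  'report-builder': 'document-generation',
}

// Defaulting to 'qa-simple' for unknown features is a judgment call.
function taskTypeFor(feature: string): TaskType {
  return TASK_TYPE_BY_FEATURE[feature] ?? 'qa-simple'
}
```

A union type like this also lets the compiler catch typos in task type strings at build time.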

Calibration Mode

When you first deploy Tally, it has no data about your workload. Calibration mode forces a higher exploration rate so the bandit can build an initial model quickly.

off

Normal operation. Use for established workloads where the bandit has enough data.

standard

Recommended for first 1–2 weeks. Balanced exploration to build the model faster.

aggressive

High exploration rate. Use when launching with a completely new model pool.
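One way to operationalise this guidance is to derive the mode from deployment age. The helper below is a sketch: the 14-day cutoff follows the "first 1–2 weeks" advice above and is not an SDK rule, and `newModelPool` mirrors the advice to use 'aggressive' when launching a completely new model pool.

```typescript
type CalibrationMode = 'off' | 'standard' | 'aggressive'

// Sketch: choose a calibration mode from deployment age.
// The 14-day cutoff is an illustrative reading of the guidance above.
function calibrationModeFor(
  daysSinceLaunch: number,
  newModelPool: boolean,
): CalibrationMode {
  if (newModelPool) return 'aggressive'
  if (daysSinceLaunch <= 14) return 'standard'
  return 'off'
}
```

For example: `new TallyClient({ apiKey, calibrationMode: calibrationModeFor(daysSinceLaunch, false) })`.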

Organisations

Every Tally account comes with a personal org. You can create additional team orgs and invite collaborators. API keys are scoped to orgs, so you can track costs per team or product.

Manage orgs and invite team members from the Portal under Account → Organisations.

Test Harness

The harness is a CLI tool that generates realistic synthetic workloads and drives them through Tally's routing engine. Use it to:

  • Warm up Tally's model with calibration data before going live
  • Verify your API key and endpoint connectivity
  • Watch the bandit algorithm in action with live diagnostics
  • Benchmark routing latency

bash
# Install harness globally
npm install -g @tally/harness

# Run 100 events, random scenario mix, 5 events/sec
TALLY_API_KEY=tly_xxx tally-harness

# Run 500 events, code-debug only, faster rate
TALLY_API_KEY=tly_xxx tally-harness --count 500 --rate 20 --scenario code-debug

# See available scenarios
tally-harness --list

# Calibration warmup (standard mode, burst pacing)
TALLY_API_KEY=tly_xxx tally-harness --count 2000 --calibration standard --burst

Or try the browser-based simulation — no API key required.

Inspector

The Inspector is the admin observability dashboard. It shows:

  • Clusters — how your real workload maps to task type clusters
  • Looms — per-client usage, cost, and success rates
  • Live feed — streaming telemetry events in real time
  • Model distribution — which models are winning for which task types
  • Cost trends — daily and hourly cost breakdown

Inspector access requires admin credentials. Contact team@tallyy.org for access.
