AI Token Cost Calculator: How to Estimate and Budget LLM Spending

Keito Team
6 April 2026 · 11 min read

Calculate AI token costs for your firm. Pricing formulas, provider comparisons, and how to estimate LLM spending per task, client, and project.

AI Agent Cost & Billing

An AI token cost calculator estimates what your firm will spend on large language model requests by multiplying token counts by per-token prices for each model — then adding tool call and infrastructure costs.

A single complex AI agent task can consume anywhere from 1,000 to 500,000 tokens. Most professional services firms have no idea how to estimate that cost before it hits their API bill. The result: budget surprises, mispriced client work, and finance teams scrambling to reconcile invoices they cannot map to any project.

Understanding token pricing and knowing how to estimate costs per task, per project, and per client is non-negotiable for any firm deploying AI agents in billable work. This guide gives you the formulas, the benchmarks, and a practical framework for building your own AI token cost calculator.

Key Takeaway: Use the formula (input tokens x input price) + (output tokens x output price) + tool costs to estimate AI spend before it hits your bill.

How Does AI Token Pricing Work?

Tokens are the basic unit of AI language model pricing. A token is roughly four characters or 0.75 words. The sentence “How much does this cost?” contains about seven tokens.

Every request to a language model consumes tokens in two categories: input tokens (what you send) and output tokens (what the model returns). Some models add a third category: reasoning tokens, consumed when the model “thinks” before responding.
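Before you have access to a real tokenizer, the four-characters-per-token rule gives a serviceable first estimate. A minimal sketch (the helper name is ours, and the heuristic is approximate, not a substitute for your provider's tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic."""
    return max(1, round(len(text) / 4))

# "How much does this cost?" is 24 characters, so the heuristic says 6 tokens;
# real tokenizers report about 7 for this sentence. Close enough for budgeting.
print(estimate_tokens("How much does this cost?"))  # → 6
```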

Why Do Output Tokens Cost More Than Input Tokens?

Output tokens typically cost three to five times more than input tokens. This reflects the compute required. Generating new text demands more GPU processing than reading existing text. For reasoning models, the multiplier is even steeper — five to twenty times the base rate.

What About Context Window Costs?

Larger prompts with system instructions, few-shot examples, and retrieved context cost more per request. A 2,000-token prompt costs more than a 200-token prompt, even if the output is the same length. Firms running retrieval-augmented generation (RAG) pipelines often send 10,000 to 50,000 tokens of context per request.

Some providers offer prompt caching — where repeated system instructions are charged at a reduced rate. This can cut input costs by 50–90% for workflows that reuse the same system prompt across many requests.

What Does Token Pricing Look Like Across Provider Tiers?

Provider pricing varies by model capability. Rather than comparing individual products, it helps to think in tiers. The table below shows approximate 2026 pricing per one million tokens across four tiers.

| Model Tier | Capability Level | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Reasoning Multiplier |
|---|---|---|---|---|
| Tier 1 — Frontier | Highest reasoning, complex tasks | £10–£15 | £30–£75 | 5–20x |
| Tier 2 — Standard | Strong general-purpose | £2.50–£5.00 | £10–£15 | 2–5x |
| Tier 3 — Lightweight | Fast, simple tasks | £0.20–£0.80 | £0.80–£4.00 | N/A |
| Tier 4 — Open-source hosted | Variable, self-managed | £0.10–£0.50 | £0.30–£2.00 | N/A |

Key observations:

  • Frontier models cost 20–50x more than lightweight models per token. The performance gap does not always justify the price gap.
  • Reasoning models in Tier 1 can consume 5–20x more tokens than standard models for the same task, because they generate internal reasoning chains.
  • Managed cloud platform markups (running models through a cloud provider’s AI service) add 10–30% over direct API pricing.

The right model tier depends on the task. Simple classification does not need a frontier model. Complex multi-step research does. Matching model capability to task complexity is the single biggest lever for controlling AI token costs.

For a deeper look at overall AI agent economics, see our guide on how much AI agents cost.

How Do You Estimate AI Costs Per Task?

The core formula for any AI token cost calculator is:

Total task cost = (input tokens x input price) + (output tokens x output price) + tool call costs

For reasoning models, add:

+ (reasoning tokens x reasoning price)
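Expressed as code, the formula is a few lines. This is a sketch: the function name is ours, and prices are quoted per million tokens to match the tables in this guide:

```python
def task_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m,
              tool_costs=0.0, reasoning_tokens=0, reasoning_price_per_m=0.0):
    """Estimate one task's cost in pounds; prices are per 1M tokens."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m
            + reasoning_tokens / 1_000_000 * reasoning_price_per_m
            + tool_costs)

# 5,000 input and 2,000 output tokens on a £3/£12 Tier 2 model,
# plus £0.01 of tool calls: the legal research scenario worked through below.
print(round(task_cost(5_000, 2_000, 3.00, 12.00, tool_costs=0.01), 3))  # → 0.049
```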

Typical Token Counts by Task Type

Different tasks consume wildly different amounts of tokens. Here are benchmarks based on professional services workloads:

| Task Type | Typical Token Range | Estimated Cost (Tier 2 Model) | Estimated Cost (Tier 1 Model) |
|---|---|---|---|
| Simple Q&A or classification | 500–2,000 | £0.01–£0.03 | £0.05–£0.15 |
| Document summarisation (single doc) | 5,000–50,000 | £0.05–£0.50 | £0.25–£5.00 |
| Deep research with multiple sources | 50,000–200,000 | £0.50–£3.00 | £5.00–£30.00 |
| Code generation with context | 10,000–100,000 | £0.10–£1.50 | £1.00–£15.00 |
| Multi-agent orchestration | 100,000–500,000+ | £1.00–£8.00 | £10.00–£75.00+ |

Worked Example 1: Legal Research Agent

A legal research agent receives a 5,000-token prompt (including system instructions and retrieved case law). It generates a 2,000-token summary. Using a Tier 2 model priced at £3.00/1M input and £12.00/1M output:

  • Input cost: 5,000 / 1,000,000 x £3.00 = £0.015
  • Output cost: 2,000 / 1,000,000 x £12.00 = £0.024
  • Tool calls (two database lookups at £0.005 each): £0.01
  • Total task cost: £0.049

Run that task 500 times per month across client matters, and you are looking at roughly £25/month. Manageable — but switch to a Tier 1 reasoning model, and the same workload could cost £250–£500/month.

Worked Example 2: Multi-Agent Code Review

A code review workflow uses three agents in sequence: a scanner agent reads the code (40,000 input tokens), a reviewer agent analyses it and produces findings (15,000 output tokens including reasoning), and a reporter agent formats the output (3,000 input + 2,000 output tokens). Using Tier 2 pricing:

  • Scanner input: 40,000 / 1,000,000 x £3.00 = £0.12
  • Reviewer output: 15,000 / 1,000,000 x £12.00 = £0.18
  • Reporter input: 3,000 / 1,000,000 x £3.00 = £0.009
  • Reporter output: 2,000 / 1,000,000 x £12.00 = £0.024
  • Tool calls (linter, test runner): £0.05
  • Total task cost: £0.38

At 200 code reviews per month, that is £76/month. A developer spending 30 minutes per review at £75/hour costs £37.50 per review, or £7,500/month for the same 200 reviews. The AI token cost calculator reveals a cost reduction of roughly 99% — before accounting for human oversight time.

Worked Example 3: Document Summarisation at Scale

A consulting firm summarises 50 client reports per week. Each report averages 30,000 tokens of input and generates a 3,000-token summary. Using Tier 3 pricing (£0.50/1M input, £2.00/1M output):

  • Input per report: 30,000 / 1,000,000 x £0.50 = £0.015
  • Output per report: 3,000 / 1,000,000 x £2.00 = £0.006
  • Cost per report: £0.021
  • Monthly cost (200 reports): £4.20

The same work using a Tier 1 model would cost roughly £90/month — twenty times more for a task that does not require frontier reasoning. Model selection is the biggest lever.

For detailed cost breakdowns by component, see our piece on AI agent cost breakdowns: tokens and inference.

The Hidden Cost of Reasoning Models

Reasoning models deserve special attention in any AI token cost calculator. These models generate internal reasoning chains — sometimes called “thinking tokens” — before producing their final output. The reasoning tokens are billed but not always visible in the response.

A standard model might use 1,000 output tokens to answer a question. A reasoning model might use 8,000 reasoning tokens plus 1,000 output tokens. At Tier 1 output pricing of £50/1M tokens, those extra reasoning tokens add £0.40 per request. Over thousands of requests, this adds up fast.
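The arithmetic behind that £0.40 figure is worth making explicit, since the reasoning tokens often never appear in the response you see:

```python
# Extra cost of 8,000 hidden reasoning tokens billed at £50 per 1M output tokens.
reasoning_tokens = 8_000
price_per_m = 50.00
extra_cost = reasoning_tokens / 1_000_000 * price_per_m
print(f"£{extra_cost:.2f} per request")  # → £0.40 per request
```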

Use reasoning models only when the task genuinely requires multi-step logic. For straightforward extraction, classification, or summarisation, standard models deliver equivalent quality at a fraction of the cost.

How Do You Build a Cost Estimation Framework for Your Firm?

Moving from ad-hoc calculations to a repeatable estimation framework takes five steps.

Step 1: Catalogue Your Agent Workflows

List every AI agent workflow in use across your firm. Note the task type, frequency, and model tier for each. Common categories include:

  • Research and analysis agents
  • Document drafting and review agents
  • Code generation and testing agents
  • Data extraction and classification agents
  • Client communication agents

Step 2: Measure Average Token Consumption

Sample 50 to 100 completed tasks per workflow. Record input tokens, output tokens, and any reasoning tokens. Calculate the mean and the 90th percentile. The 90th percentile matters because cost overruns come from outliers, not averages.
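Both numbers come straight out of the standard library: statistics.quantiles with n=10 returns nine cut points, the last of which is the 90th percentile. The sample data below is hypothetical:

```python
import statistics

def consumption_profile(token_counts):
    """Return (mean, 90th percentile) for a sample of per-task token counts."""
    mean = statistics.mean(token_counts)
    p90 = statistics.quantiles(token_counts, n=10)[-1]  # last cut point = p90
    return mean, p90

# Hypothetical total-token samples for one workflow, with one outlier task.
sample = [4_200, 5_100, 4_800, 6_000, 5_500, 4_900, 12_000, 5_200, 4_700, 5_300]
mean, p90 = consumption_profile(sample)
print(round(mean), round(p90))  # → 5770 11400
```

Note how far the p90 sits above the mean here, pulled up by a single 12,000-token outlier. That gap is exactly why budgeting on the mean alone understates your exposure.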

Step 3: Apply Provider Pricing

Multiply your measured token counts by the per-token rates for your current provider and model. Use the formula above. Do this per workflow.

Build a pricing lookup table that maps each model to its input, output, and reasoning token rates. Update it quarterly — provider pricing changes frequently, and rates have generally fallen 30–50% per year over the past three years.
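A minimal version of that lookup table in Python. The tier names, rates, and workflow_cost helper are illustrative, drawn from the figures in this guide rather than any provider's published prices:

```python
# Illustrative per-1M-token rates in pounds: (input, output, reasoning).
# A reasoning rate of None means the model bills no separate reasoning tokens.
PRICING = {
    "tier1-frontier":    (12.50, 50.00, 50.00),
    "tier2-standard":    (3.00, 12.00, None),
    "tier3-lightweight": (0.50, 2.00, None),
}

def workflow_cost(model, input_tokens, output_tokens, reasoning_tokens=0):
    """Cost of one task in pounds, using the pricing lookup table."""
    in_rate, out_rate, reason_rate = PRICING[model]
    cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    if reason_rate is not None:
        cost += reasoning_tokens / 1e6 * reason_rate
    return cost

print(round(workflow_cost("tier2-standard", 5_000, 2_000), 3))  # → 0.039
```

Keeping the rates in one table means a quarterly price update is a one-line change rather than a hunt through every workflow's code.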

Step 4: Multiply by Expected Volume

Estimate how many tasks of each type you will run per month, per project, or per client. Multiply unit costs by volume. This gives you a monthly cost estimate per workflow.

Be conservative with volume estimates. Firms that deploy AI agents often see usage grow 3–5x within the first six months as teams find new applications. Build growth into your forecast.

Step 5: Add Infrastructure Overhead

Raw token costs are not the whole bill. Add a 15–25% buffer on top of your token estimate to cover:

  • Embedding generation and vector database queries
  • Tool call fees (web searches, API calls, code execution)
  • Retry and error handling (failed requests that still consume tokens)
  • Monitoring and logging costs

Your final estimate should look like:

Monthly AI cost = sum of (unit cost per task x monthly task volume) x 1.20 overhead factor
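The five steps collapse into one function. The workflow figures below are hypothetical unit costs and volumes, and the 1.20 factor is the mid-range overhead buffer:

```python
def monthly_estimate(workflows, overhead_factor=1.20):
    """Sum of (unit cost per task x monthly task volume), plus overhead buffer."""
    return sum(unit_cost * volume for unit_cost, volume in workflows) * overhead_factor

# Hypothetical firm: research (£0.049 x 500 tasks), code review (£0.38 x 200),
# summarisation (£0.021 x 200).
workflows = [(0.049, 500), (0.38, 200), (0.021, 200)]
print(round(monthly_estimate(workflows), 2))  # → 125.64
```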

Why Are Estimates Alone Not Enough?

Estimates are a starting point. They are not a control mechanism.

Actual AI agent costs regularly exceed estimates by two to three times. Why? Because agents behave unpredictably. A research agent might make three tool calls on one task and twenty on the next. A coding agent might solve a problem in one pass or retry fifteen times.

Estimation tells you what to budget. Live tracking tells you what you are actually spending. The gap between the two is where firms lose money.

Use your estimates as baselines for budget alerts. When actual spend exceeds the estimate by 50%, trigger an investigation. When it exceeds 100%, pause the workflow and review.
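Those two thresholds translate directly into a simple alert rule (the function name and return strings here are ours):

```python
def budget_alert(estimated, actual):
    """Map the overrun against estimate to one of three actions."""
    overrun = (actual - estimated) / estimated
    if overrun > 1.00:          # actual more than double the estimate
        return "pause workflow and review"
    if overrun > 0.50:          # actual more than 50% over the estimate
        return "trigger investigation"
    return "ok"

print(budget_alert(100.0, 130.0))  # → ok
print(budget_alert(100.0, 180.0))  # → trigger investigation
print(budget_alert(100.0, 250.0))  # → pause workflow and review
```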

The Estimation-to-Tracking Pipeline

The ideal approach combines both:

  1. Before a project: Use your AI token cost calculator estimates to set the AI budget and agree pricing with the client.
  2. During the project: Track actual token costs in real time against the estimated budget. Flag deviations early.
  3. After the project: Compare actual versus estimated costs. Feed the variance back into your estimation model to improve accuracy next time.

This feedback loop gets tighter over time. Firms that have been tracking for six months produce estimates that are accurate to within 20%. Firms guessing without historical data are routinely off by 100% or more.

For guidance on moving from estimation to real-time monitoring, see our guide on AI agent cost tracking for professional services.

How Can You Reduce AI Token Costs Without Sacrificing Quality?

An AI token cost calculator is not just for measurement — it also reveals where you are overspending.

Match Model Tier to Task Complexity

The single biggest cost lever. A firm using a Tier 1 frontier model for email classification is spending 20–50x more than necessary. Route simple tasks to Tier 3 or Tier 4 models. Reserve Tier 1 for tasks that genuinely require advanced reasoning.

Use Prompt Caching

If your agents reuse the same system prompt across many requests, prompt caching can reduce input token costs by 50–90%. Check whether your provider supports cached prompts and enable it for high-volume workflows.
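To see what caching is worth for a specific workflow, compare input spend with and without the discount. The sketch below assumes a 70% discount, in the middle of the 50–90% range quoted above:

```python
def caching_savings(prompt_tokens, requests_per_month, input_price_per_m,
                    cache_discount=0.70):
    """Monthly input spend on a repeated system prompt, uncached vs cached."""
    uncached = prompt_tokens * requests_per_month / 1e6 * input_price_per_m
    return uncached, uncached * (1 - cache_discount)

# A 5,000-token system prompt sent 10,000 times a month at £3/1M input.
uncached, cached = caching_savings(5_000, 10_000, 3.00)
print(f"£{uncached:.2f} uncached vs £{cached:.2f} cached")  # → £150.00 uncached vs £45.00 cached
```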

Trim System Prompts

System prompt bloat is common. Prompts grow as teams add edge case handling and formatting rules. A 5,000-token system prompt sent 10,000 times per month costs £150 at Tier 2 input pricing. Regularly audit and trim system prompts. Remove instructions that the model follows by default.

Set Token Limits on Output

Many tasks do not need long responses. Setting a maximum output token count prevents the model from generating unnecessarily verbose answers. A 500-token limit on a classification task prevents the model from producing a 2,000-token explanation when a one-word answer will do.

Batch Low-Priority Requests

Some providers offer batch processing at reduced rates — typically 50% of real-time pricing. If your workflow can tolerate a delay of minutes or hours, batch processing halves your token costs for those tasks.

Key Takeaway: Estimates set the budget. Live tracking catches the overruns. You need both.

Frequently Asked Questions

How do you calculate AI token costs?

Use the formula: (input tokens x input price per token) + (output tokens x output price per token) + tool call costs. For reasoning models, add reasoning tokens at their specific rate. Apply per-million-token pricing from your provider, then add 15–25% for infrastructure overhead.

How much does one million tokens cost?

It depends on the model tier. Lightweight models cost £0.20–£0.80 per million input tokens. Standard models cost £2.50–£5.00. Frontier reasoning models cost £10–£15 for input and £30–£75 for output. Output tokens always cost more than input tokens.

What is the difference between input and output token pricing?

Input tokens are what you send to the model — prompts, context, instructions. Output tokens are what the model generates. Output tokens cost three to five times more because text generation requires more compute than text comprehension. Reasoning tokens, used by “thinking” models, cost even more.

How many tokens does a typical AI agent task use?

It varies widely. Simple classification uses 500–2,000 tokens. Document summarisation uses 5,000–50,000. Deep research with multiple sources uses 50,000–200,000. Multi-agent workflows can exceed 500,000 tokens per task.

Which model tier is most cost-effective per token?

Lightweight models (Tier 3) are cheapest per token but cannot handle complex reasoning. Standard models (Tier 2) offer the best balance of capability and cost for most professional services tasks. Frontier models (Tier 1) should be reserved for tasks that genuinely require advanced reasoning — using them for simple tasks wastes money.

How do I estimate AI costs for a new client project?

Catalogue the agent workflows the project will use. Measure average token consumption per task type from past projects. Multiply by expected task volume. Add 15–25% for infrastructure and overhead. Compare the total against the project budget and billing rate to confirm profitability.


Keito tracks token costs automatically — per task, per client, per project. No spreadsheets required. Start tracking AI costs today.

Know exactly what your AI agents cost

Real-time cost tracking, client billing, and profitability analysis.