Why AI Agent Costs Are Unpredictable: How to Control Consumption-Based Spending

Keito Team
6 April 2026 · 10 min read

AI agent costs are unpredictable by design. Learn why consumption-based pricing causes cost spikes and how professional services firms can control spending.

AI Agent Cost & Billing

AI agent costs are unpredictable because agents use consumption-based pricing, follow non-deterministic reasoning paths, and trigger variable numbers of tool calls — making every task a different price.

A consulting firm ran the same research agent on two similar client briefs. One cost £0.40. The other cost £14.80. Same agent, same type of task, a 37x cost difference. Nobody noticed until the monthly bill arrived. This is not a bug. It is the nature of how AI agents work.

The unpredictability problem is real and growing. Gartner projects that 40% of agentic AI projects will be cancelled by 2027, largely due to cost escalation. Research from 0g.ai shows that inference costs consume 60–80% of total AI spend. For professional services firms billing clients for AI-assisted work, uncontrolled costs erode margins and destroy trust. The firms that survive will be those that learn to manage consumption-based spending without crippling their agents.

Why Are AI Agent Costs Inherently Unpredictable?

Six factors combine to make AI agent costs impossible to predict from one task to the next.

Non-deterministic execution. The same prompt can produce different reasoning paths each time. One run might take 800 tokens. The next might take 4,000. The model decides how to think, and you pay for every token of that thinking.

Agentic reasoning multiplier. A simple chatbot makes one inference per request. An AI agent makes five to twenty. It plans, reasons, acts, observes, re-plans, and acts again. Each inference step costs money. A task that requires three planning cycles costs roughly three times as much as a task that needs one.

Recursive loops. Agent A calls Agent B, which calls Agent A again. Multi-agent systems can enter cycles where each agent triggers the next, and costs multiply with every loop. Without loop detection, a single task can burn through hundreds of inferences before anyone intervenes.

Variable tool usage. Some tasks trigger two tool calls. Others trigger twenty. A research agent that finds its answer in the first source costs far less than one that searches fifteen sources, reads nine documents, and cross-references three databases. The agent decides how many tools to use based on the task.

Context window inflation. As conversations grow longer, each subsequent request sends more tokens. A 2,000-token context costs a fraction of a 120,000-token context. Agents that carry forward full conversation histories see their per-request costs climb with every step.

Model selection variance. Many agent frameworks auto-route between models — cheap models for simple tasks, expensive models for hard ones. This adds another layer of unpredictability. You do not know which model will handle which request until it happens.
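Of these factors, context window inflation is the easiest to quantify. The sketch below shows how carrying forward full history makes cumulative cost grow quadratically with conversation length; the per-token price is a pure assumption, not any provider's real rate.

```python
# Illustrative only: the token price is an assumption, not a real model rate.
PRICE_PER_1K_INPUT_TOKENS = 0.002  # £ per 1,000 input tokens (hypothetical)

def conversation_cost(turns: int, tokens_per_turn: int) -> float:
    """Total input cost when every turn resends the full history so far."""
    billed_tokens = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn   # the history grows each turn
        billed_tokens += context     # and the whole history is billed again
    return billed_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

short_run = conversation_cost(turns=5, tokens_per_turn=400)   # £0.012
long_run = conversation_cost(turns=50, tokens_per_turn=400)   # £1.02
```

Ten times the turns produces eighty-five times the cost, because each new turn pays again for everything that came before it.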

How Does Consumption-Based Pricing Make Things Worse?

Traditional SaaS tools charge a fixed monthly fee. You know the cost before the month starts. AI agents work differently.

Consumption-based pricing means you pay per token, per inference, or per action. There is no spending ceiling. Major platform providers have adopted this model: per-message pricing from one provider, per-action pricing from another. Even packaged enterprise AI products now use consumption-based billing underneath.

This creates a paradox. Successful agents cost more than failed ones. An agent that produces a thorough 3,000-word analysis consumes far more tokens than one that gives up after two sentences. The better the output, the higher the bill.

The pricing model also makes budgeting nearly impossible. A fixed SaaS fee is a known line item. Consumption-based AI spend varies month to month, project to project, and task to task. Finance teams cannot forecast it reliably without historical data — and most firms do not have enough history yet.

Professional services firms are particularly exposed. Their work demands complex reasoning — legal analysis, strategic research, financial modelling. These are exactly the tasks that consume the most tokens and trigger the most tool calls. A single complex research brief can cost more than a hundred routine classification tasks. Tracking these costs at the client level is essential for maintaining margins.

What Do Real-World Cost Spikes Look Like?

Cost spikes follow predictable patterns, even if individual costs do not.

The tangent problem. A research agent receives a brief to analyse three competitors. It follows a thread into adjacent markets and produces a fifteen-page analysis nobody asked for. The task that should have cost £1.20 costs £8.50. The extra output might even be useful — but nobody authorised the spending.

The retry loop. A coding agent tries to fix a failing test. It modifies the code, runs the test, sees it fail, modifies again. Twenty iterations later, the test still fails. The agent spent £6 on a problem that needed a human decision, not another retry.

Context duplication. In a multi-agent workflow, each agent re-reads the full context from the previous step. Three agents in a chain means three full context reads. The same information gets processed and billed three times.

Uncontrolled external calls. An agent triggers a paid data source — a premium API, a licensed database — without human approval. The AI inference cost was £0.30. The external API call was £45. Nobody built a gate to check before making expensive calls.

The common pattern across all these scenarios: 5% of tasks consume 80% of the budget. Most agent runs are cheap. A small number are extremely expensive. Without real-time cost monitoring, those outliers hide in the noise until the bill arrives.

How Can Firms Control Unpredictable AI Agent Costs?

Seven strategies move AI spending from chaotic to manageable.

Budget guardrails

Set hard and soft spending caps at every level: per task, per project, per client, per department. A soft cap triggers an alert. A hard cap halts the agent. Setting budgets at the project and client level prevents any single task from blowing a project’s economics.
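A minimal sketch of layered caps is shown below. The threshold values and the halt-via-exception behaviour are assumptions for illustration, not any platform's actual API.

```python
class BudgetExceeded(Exception):
    """Raised when a hard cap is hit; the orchestrator should halt the agent."""

class Budget:
    def __init__(self, name: str, soft_cap: float, hard_cap: float):
        self.name, self.soft_cap, self.hard_cap = name, soft_cap, hard_cap
        self.spent = 0.0
        self.alerted = False

    def record(self, cost: float) -> None:
        self.spent += cost
        if self.spent >= self.hard_cap:  # hard cap: halt
            raise BudgetExceeded(
                f"{self.name}: £{self.spent:.2f} >= hard cap £{self.hard_cap:.2f}"
            )
        if self.spent >= self.soft_cap and not self.alerted:  # soft cap: alert once
            self.alerted = True
            print(f"ALERT {self.name}: soft cap £{self.soft_cap:.2f} reached")

# Layered caps (example values): every task spend is recorded at every level.
levels = [
    Budget("task", soft_cap=2.0, hard_cap=5.0),
    Budget("project", soft_cap=150.0, hard_cap=200.0),
    Budget("client", soft_cap=800.0, hard_cap=1000.0),
]

def record_spend(cost: float) -> None:
    for budget in levels:
        budget.record(cost)
```

The key design choice is that the same spend event hits all levels at once, so a task that is individually cheap still counts toward its project and client ceilings.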

Token limits per request

Cap the maximum tokens an agent can consume in a single invocation. This prevents runaway reasoning without eliminating the agent’s ability to think. A 4,000-token cap on output stops a research agent from writing a novel when you asked for a summary.
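In code, the cap is a one-line clamp applied before any request reaches the provider. The cap value and the `send` callable are placeholders for whatever your stack uses.

```python
MAX_OUTPUT_TOKENS = 4_000  # firm-wide per-invocation cap (example value)

def clamp_max_tokens(requested: int) -> int:
    """Enforce the cap regardless of what the caller asked for."""
    return min(requested, MAX_OUTPUT_TOKENS)

def call_model(prompt: str, requested_max_tokens: int, send) -> str:
    """`send` stands in for whichever model client your stack uses;
    the cap is enforced here, in one place, not in each caller."""
    return send(prompt, max_tokens=clamp_max_tokens(requested_max_tokens))
```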

Loop detection

Monitor inference counts per task. If an agent exceeds a threshold — say, fifteen inferences on a single task — pause it and flag it for human review. Most productive agent runs complete in three to eight inferences. Anything beyond fifteen is likely a loop.
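A counter wrapped around each inference call is enough to implement this. The fifteen-inference threshold below mirrors the rule of thumb above; tune it to your own task data.

```python
MAX_INFERENCES_PER_TASK = 15  # threshold from the rule of thumb above

class LoopDetected(Exception):
    """Raised so the orchestrator can pause the task for human review."""

class InferenceCounter:
    def __init__(self, limit: int = MAX_INFERENCES_PER_TASK):
        self.limit = limit
        self.count = 0

    def tick(self) -> int:
        """Call once per inference; raises when the task exceeds the limit."""
        self.count += 1
        if self.count > self.limit:
            raise LoopDetected(
                f"{self.count} inferences on one task; pausing for review"
            )
        return self.count
```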

Model routing

Use cheaper models for routine tasks. Reserve expensive reasoning models for complex work. A classification task does not need the same model as a legal analysis. Routing logic based on task type can cut costs by 40–60% without affecting output quality on simple tasks.
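The routing logic itself can be a simple lookup. Model names and per-1K-token prices below are illustrative placeholders; substitute your provider's real tiers and rates.

```python
# Hypothetical model tiers and prices -- substitute your provider's real rates.
MODELS = {
    "cheap": {"name": "small-model", "price_per_1k": 0.0004},
    "premium": {"name": "reasoning-model", "price_per_1k": 0.01},
}

SIMPLE_TASKS = {"classification", "extraction", "summary"}

def route(task_type: str) -> dict:
    """Send routine task types to the cheap tier, everything else to premium."""
    tier = "cheap" if task_type in SIMPLE_TASKS else "premium"
    return MODELS[tier]
```

Even this crude by-type routing captures most of the saving; finer-grained approaches score the individual request rather than its category.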

Task scoping

Give agents narrower instructions. “Analyse these three competitors on pricing and market share” costs far less than “Research the competitive market.” Specific instructions reduce exploration, tangents, and unnecessary tool calls.

Human approval gates

Require human approval before agents exceed cost thresholds or invoke expensive external tools. A £5 approval gate catches the 5% of tasks that would otherwise consume 80% of the budget.
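A sketch of such a gate, using the £5 threshold from the example above; the queue-for-review behaviour is one possible design, not a prescribed one.

```python
from dataclasses import dataclass, field

APPROVAL_THRESHOLD = 5.0  # £ per task, as in the example above

@dataclass
class ApprovalGate:
    threshold: float = APPROVAL_THRESHOLD
    pending: list = field(default_factory=list)

    def check(self, task_id: str, estimated_cost: float,
              external_paid_call: bool = False) -> bool:
        """Return True if the task may proceed immediately;
        otherwise queue it for human review."""
        if estimated_cost < self.threshold and not external_paid_call:
            return True
        self.pending.append((task_id, estimated_cost))
        return False
```

Note that any paid external call is gated regardless of its estimated cost, which closes the £45 API-call scenario described earlier.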

Batch processing

Aggregate tasks and run them in batches rather than real-time. Batch processing allows you to use lower-cost compute tiers, and it makes spending more predictable. A firm that runs 200 research tasks per day knows its batch cost within a tight range.
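The "tight range" claim can be made concrete. The sketch below estimates a day's batch cost with a rough ±2-sigma band; the 50% batch discount and blended token price are assumptions, not quoted rates.

```python
import math

# Assumed figures: the batch-tier discount and blended price are illustrative.
BATCH_DISCOUNT = 0.5
PRICE_PER_1K_TOKENS = 0.004  # £, hypothetical blended rate

def daily_batch_cost(tasks_per_day: int, mean_tokens: float,
                     std_tokens: float) -> tuple:
    """Rough ±2-sigma cost band for a day's batch run."""
    rate = PRICE_PER_1K_TOKENS * BATCH_DISCOUNT / 1000
    expected = tasks_per_day * mean_tokens * rate
    # Variance of a sum of independent tasks scales with N,
    # so the day-level sigma scales with sqrt(N).
    sigma = math.sqrt(tasks_per_day) * std_tokens * rate
    return expected - 2 * sigma, expected + 2 * sigma

low, high = daily_batch_cost(200, mean_tokens=3_000, std_tokens=1_500)
```

With 200 tasks a day, per-task variability averages out and the band is only a few percent wide — this is why batching makes spend forecastable even when individual tasks are not.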

How Do You Build Predictability into AI Operations?

Cost control strategies reduce spikes. Predictability requires data.

Establish cost baselines. Measure the average cost per task type over at least one hundred executions. A research task might average £1.40 with a standard deviation of £0.80. A coding task might average £0.60 with a standard deviation of £0.30. These baselines become your reference points.

Set anomaly detection thresholds. Flag any task that exceeds three times the baseline cost. These outliers warrant investigation. Was the cost justified by a genuinely complex task, or did the agent misbehave?
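The baseline and the threshold together fit in a few lines. This is a minimal sketch using the 3x-baseline rule above; real deployments would segment baselines by task type and model.

```python
import statistics

ANOMALY_MULTIPLIER = 3.0  # flag tasks above 3x the baseline mean, as above

def baseline(costs: list) -> tuple:
    """Mean and sample standard deviation over historical task costs
    (aim for 100+ executions per task type)."""
    return statistics.mean(costs), statistics.stdev(costs)

def is_anomaly(cost: float, baseline_mean: float) -> bool:
    """Flag a task cost that exceeds the multiplier times the baseline."""
    return cost > ANOMALY_MULTIPLIER * baseline_mean
```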

Create cost forecasting models. Based on the project scope and expected task mix, forecast the AI spend for each project before work begins. A project with fifty research tasks and thirty coding tasks has a forecastable cost range. Share that forecast with project managers and clients.
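A simple forecast sums per-task baselines over the planned mix. The baseline figures below are the illustrative averages from this section; the independence assumption behind the ±2-sigma range is a simplification.

```python
import math

# Illustrative per-task baselines from the section above.
BASELINES = {
    "research": {"mean": 1.40, "std": 0.80},
    "coding": {"mean": 0.60, "std": 0.30},
}

def forecast(task_mix: dict) -> tuple:
    """Expected project cost with a rough ±2-sigma range,
    assuming task costs are independent."""
    expected = sum(n * BASELINES[t]["mean"] for t, n in task_mix.items())
    variance = sum(n * BASELINES[t]["std"] ** 2 for t, n in task_mix.items())
    band = 2 * math.sqrt(variance)
    return expected, expected - band, expected + band

# Fifty research tasks and thirty coding tasks, as in the example above:
expected, low, high = forecast({"research": 50, "coding": 30})
```

The range, not the point estimate, is what belongs in the client conversation: it sets expectations for consumption-based variance before the work starts.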

Report on cost variance weekly. Compare planned spend to actual spend. Identify the root cause of any variance. Was it a spike in a specific task type? A misconfigured agent? A client project that required more complex reasoning than expected?

Connect costs to outcomes. The most important metric is not cost per task — it is cost per outcome. An agent that costs £4 per research brief but saves two hours of analyst time is cheap. An agent that costs £0.50 per task but produces work that needs heavy revision is expensive. Track both sides.

What Does the Industry Data Say About AI Agent Cost Overruns?

The numbers tell a consistent story. AI agent costs are real, growing, and frequently underestimated.

Gartner’s projection that 40% of agentic AI projects will be cancelled by 2027 is driven primarily by cost escalation. Firms launch pilot programmes, see initial results, and then discover that scaling from ten agent tasks per day to ten thousand creates cost curves they did not anticipate.

Research from 0g.ai shows that inference — the GPU processing that runs the model — accounts for 60–80% of total AI spend. This is the single largest cost component, and it is the most variable. A request that triggers a long reasoning chain consumes far more inference than a simple classification does. Firms cannot control how the model reasons. They can only control how often it reasons and for how long.

The agentic reasoning multiplier makes this worse. A standard chatbot interaction is one inference. An agentic request — where the agent plans, acts, observes, and re-plans — triggers five to twenty inferences per request. Each inference carries its own token cost, compute cost, and latency. The compound effect turns a £0.02 inference into a £0.40 task, or a £4 task, depending on complexity.

Firms that treat these costs as rounding errors during pilot programmes face sticker shock at scale. The answer is not to avoid AI agents. It is to build cost awareness into operations from day one.

Key Takeaway

AI agent costs will never be as fixed as a SaaS subscription — but with guardrails, baselines, and real-time monitoring, they can be managed and forecast.

Frequently Asked Questions

Why are AI agent costs unpredictable?

AI agent costs are unpredictable because agents use consumption-based pricing and follow non-deterministic reasoning paths. The same task can trigger different numbers of inferences, tool calls, and tokens each time it runs. Factors include recursive loops, variable tool usage, context window growth, and model routing decisions.

How do you control AI agent spending?

Control AI agent spending with budget guardrails (hard and soft caps per task, project, and client), token limits per request, loop detection to halt recursive cycles, model routing to use cheaper models for routine work, task scoping to reduce exploration, and human approval gates for high-cost actions.

What causes AI agent cost spikes?

Cost spikes are typically caused by agents entering retry loops, following tangents beyond their brief, duplicating context across multi-agent workflows, or triggering expensive external API calls without approval. In most firms, 5% of tasks consume 80% of the AI budget.

How does consumption-based pricing affect AI agent costs?

Consumption-based pricing means firms pay per token, per inference, or per action — with no spending ceiling. Unlike fixed SaaS subscriptions, costs scale with usage and complexity. Successful agents that produce thorough outputs cost more than those that produce minimal results, creating a paradox where better performance means higher bills.

What is the best way to make AI agent costs predictable?

Build predictability through data. Measure average costs per task type over one hundred or more executions to establish baselines. Set anomaly detection thresholds at three times the baseline. Create cost forecasting models based on project scope and task mix. Report on planned versus actual spend weekly, with root cause analysis for variances.

How much of total AI spend goes to inference costs?

Research from 0g.ai shows that inference costs — the GPU processing needed to run AI models — consume 60–80% of total AI spend. This makes inference the single largest cost component for firms running AI agents, and the primary driver of unpredictable consumption-based bills.


Keito gives professional services firms guardrails, alerts, and visibility to control unpredictable AI agent spending. Learn how Keito works.

Know exactly what your AI agents cost

Real-time cost tracking, client billing, and profitability analysis.