Supervised and autonomous AI agents need different tracking approaches because they operate at different levels of independence, consume costs differently, and create different billing challenges for professional services firms.
A supervised coding agent that pauses for human approval on every commit has a different cost profile from an autonomous agent that ships features end to end while the team sleeps. The supervised agent’s time includes human review. The autonomous agent’s time is purely execution. The cost split is different. The billing model is different. The audit requirements are different. Most firms currently operate at supervised levels (levels 2–3 on a five-level autonomy spectrum), but the shift toward higher autonomy is accelerating. Getting your tracking right now means you will not have to rebuild it later.
Key Takeaway: Match your tracking approach to your agent’s autonomy level. Supervised agents need dual logging. Autonomous agents need independent timesheets.
What Are the Autonomy Levels for AI Agents?
AI agents sit on a spectrum from fully human-controlled to fully independent. Five levels describe the range, and each has distinct tracking implications.
| Level | Name | How It Works | Tracking Approach |
|---|---|---|---|
| 1 | Suggestion-based | Agent suggests, human decides and acts | Track as human time with AI tooling |
| 2 | Task-based supervised | Agent completes a task, human reviews before delivery | Log agent time and review time separately |
| 3 | Workflow-based supervised | Agent executes multi-step workflows with checkpoints | Dual logging at each checkpoint |
| 4 | Autonomous with guardrails | Agent operates independently within defined boundaries | Full agent timesheets with exception alerts |
| 5 | Fully autonomous | Agent plans, executes, and delivers without human involvement | Independent agent timesheets, post-facto review |
Level 1 agents are autocomplete tools. They suggest code completions, sentence endings, or data entries. The human does the real work. Tracking is simple — it is human time, aided by a tool. No separate agent logging is needed.
Level 2 agents complete discrete tasks — drafting a document, extracting data, generating a summary. A human reviews every output before it goes anywhere. Most professional services firms deploying AI agents today operate here. Tracking must capture both the agent’s execution time and the human’s review time.
Level 3 agents handle multi-step workflows. They might research a topic, draft a report, format it, and prepare it for delivery — pausing at defined checkpoints for human approval. Each checkpoint creates a tracking boundary. The agent’s time accumulates across steps; human review time adds at each pause.
Level 4 agents work independently within guardrails. They can execute, make decisions, and take actions without per-task approval, but boundaries constrain what they can do — spending limits, scope restrictions, escalation triggers. These agents need full timesheets that record every action, with alerts when they approach or breach guardrails.
Level 5 agents plan, execute, and deliver without human involvement. The human reviews outcomes after the fact, if at all. These agents require the most detailed tracking because there is no human in the loop to observe what happened in real time.
Most firms are moving from Level 2 toward Level 3 and 4 deployments, according to industry reports from developer tooling and AI infrastructure companies. The tracking requirements escalate with each level.
How Does Time Tracking Differ Between Supervised and Autonomous Agents?
The core difference is whether human time is part of the equation.
Supervised Agent Time Tracking
For supervised agents (levels 2–3), time tracking must capture two distinct windows: agent execution time and human review time. Both contribute to the total time spent on a task.
Consider a practical example. A coding agent takes 20 minutes to generate a module. A senior developer spends 15 minutes reviewing the output, testing edge cases, and approving it. The total task time is 35 minutes — 20 minutes of agent work and 15 minutes of human oversight.
The challenge is separating these cleanly. In practice, human and agent work often interleave. A developer might start reviewing while the agent is still running, request changes mid-execution, or pause review to handle something else. The tracking system needs to handle overlapping windows without double-counting.
The approach that works: log agent execution as a discrete event with start and end timestamps. Log human review as a separate event linked to the same task ID. If they overlap, both run concurrently — total elapsed time is the span from the earliest start to the latest end, not the sum of the two windows.
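As a minimal sketch of this logging model (the `TimeEntry` structure and field names are this example’s own, not any particular product’s schema):

```python
from dataclasses import dataclass

@dataclass
class TimeEntry:
    task_id: str
    actor: str    # "agent" or "human"
    start: float  # minutes from an arbitrary epoch
    end: float

def task_elapsed_minutes(entries: list[TimeEntry], task_id: str) -> float:
    """Elapsed time for a task: earliest start to latest end across all
    linked entries, so overlapping windows are not double-counted."""
    linked = [e for e in entries if e.task_id == task_id]
    if not linked:
        return 0.0
    return max(e.end for e in linked) - min(e.start for e in linked)

entries = [
    TimeEntry("TASK-1", "agent", start=0, end=20),   # 20-minute agent run
    TimeEntry("TASK-1", "human", start=15, end=30),  # review starts mid-run
]
# Agent and human entries sum to 35 minutes, but only 30 minutes elapsed
print(task_elapsed_minutes(entries, "TASK-1"))  # 30.0
```

Linking both entries to one task ID is what lets you report agent time, human time, and elapsed time from the same records without double-counting the overlap.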
Autonomous Agent Time Tracking
For autonomous agents (levels 4–5), time is purely agent execution time — plus any post-facto human review, which happens separately and often much later.
A research agent runs for 8 minutes at 3am, producing a market analysis. A consultant reviews the output at 9am for 5 minutes. These are two separate time entries: 8 minutes of agent time logged overnight, 5 minutes of human review logged the next morning.
Autonomous agents can also run in parallel. One agent might handle three client tasks simultaneously. Time tracking needs to record each task independently, even though they share the same clock period. Wall-clock time and compute time diverge — the agent used 24 minutes of compute across three tasks, but only 8 minutes of wall-clock time elapsed.
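The compute-versus-wall-clock split above can be sketched in a few lines (a simplified model that assumes the tasks belong to one continuous agent session):

```python
def session_times(runs: list[tuple[float, float]]) -> tuple[float, float]:
    """runs: (start, end) minute pairs for tasks an agent ran in one session.
    Compute time bills every task; wall-clock time is the session span."""
    compute = sum(end - start for start, end in runs)
    wall = max(end for _, end in runs) - min(start for start, _ in runs)
    return compute, wall

# Three 8-minute client tasks running in parallel
print(session_times([(0, 8), (0, 8), (0, 8)]))  # (24, 8)
```

Recording both numbers matters because compute time drives cost attribution per client, while wall-clock time is what shows up on a project timeline.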
For firms already tracking AI agent time, the shift from supervised to autonomous tracking means moving from a “human plus agent” model to an “agent plus optional human” model.
The Blended Time Problem
The hardest tracking scenario is when human and agent work interleave continuously on the same task. A consultant dictates instructions while an agent executes them in real time. The consultant adjusts direction mid-task. The agent adapts. Who worked for how long?
The practical answer is to track three values: total elapsed time, agent active time within that window, and human active time within that window. The sum of agent and human time may exceed elapsed time (because they ran concurrently) or fall below it (because of idle gaps). All three numbers matter for different purposes — elapsed time for project management, agent time for AI cost attribution, human time for staff billing.
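A sketch of the three-value model, assuming each actor’s activity is recorded as (start, end) intervals in minutes:

```python
def merge_minutes(intervals: list[tuple[float, float]]) -> float:
    """Total minutes covered by a set of (start, end) intervals,
    counting any overlap only once."""
    total, cur_start, cur_end = 0.0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

def blended_time(agent, human):
    """Returns (elapsed, agent_active, human_active) in minutes."""
    return merge_minutes(agent + human), merge_minutes(agent), merge_minutes(human)

# Consultant directs at minutes 0-10 and 25-30; agent executes at 5-25
elapsed, agent_min, human_min = blended_time([(5, 25)], [(0, 10), (25, 30)])
print(elapsed, agent_min, human_min)  # 30.0 20.0 15.0
```

Note that agent time plus human time (35 minutes) exceeds elapsed time (30 minutes) here because of the concurrent stretch, which is exactly the property the three separate values are meant to capture.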
How Does Cost Tracking Differ?
Supervised and autonomous agents have inverse cost profiles. Understanding this split is essential for accurate pricing and profitability analysis.
Supervised Agent Costs
Supervised agents tend to have lower AI cost per task but higher total cost when human oversight is included. The agent does less independent reasoning because humans handle the decision-making. Fewer tokens, simpler tool calls, shorter execution. But every task also consumes human time for review and approval.
In a typical supervised deployment, human oversight accounts for 40–60% of the total task cost. The agent’s compute cost is modest. The expensive part is the senior professional who checks every output.
Autonomous Agent Costs
Autonomous agents have higher AI cost per task but potentially lower total cost if human review is minimal. The agent consumes more tokens because it handles planning, reasoning, self-correction, and multi-step execution independently. Tool calls multiply. Context windows fill up. But human involvement is limited to exception handling and periodic review.
The risk with autonomous agents is cost spikes from recursive loops, over-elaborate outputs, and unnecessary tool calls. An agent that enters a retry loop can burn through its token budget rapidly. Cost guardrails — maximum token spend per task, per hour, per client — are essential at autonomy levels 4 and 5.
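A guardrail of this kind can be as simple as a running spend counter checked before each agent step. This is a hypothetical sketch (the `TokenBudget` class and its limits are illustrative, not a real library API):

```python
class TokenBudget:
    """Illustrative guardrail: cap token spend per task and per client,
    so a retry loop is halted before it burns the budget."""

    def __init__(self, per_task_limit: int, per_client_limit: int):
        self.per_task_limit = per_task_limit
        self.per_client_limit = per_client_limit
        self.task_spend: dict[str, int] = {}
        self.client_spend: dict[str, int] = {}

    def record(self, task_id: str, client_id: str, tokens: int) -> None:
        self.task_spend[task_id] = self.task_spend.get(task_id, 0) + tokens
        self.client_spend[client_id] = self.client_spend.get(client_id, 0) + tokens

    def should_halt(self, task_id: str, client_id: str) -> bool:
        return (self.task_spend.get(task_id, 0) >= self.per_task_limit
                or self.client_spend.get(client_id, 0) >= self.per_client_limit)

budget = TokenBudget(per_task_limit=50_000, per_client_limit=200_000)
budget.record("TASK-1", "acme", 48_000)
print(budget.should_halt("TASK-1", "acme"))  # False
budget.record("TASK-1", "acme", 5_000)       # retry loop pushes past the cap
print(budget.should_halt("TASK-1", "acme"))  # True
```

In practice the halt check would also fire an alert, since a breached budget at level 4 or 5 is usually a symptom worth investigating, not just a cost event.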
Break-Even Analysis
The central question for any firm is: at what autonomy level does the total cost (agent compute plus human oversight) become lower than purely human work?
| Cost Component | Supervised (L2–3) | Autonomous (L4–5) | Fully Human |
|---|---|---|---|
| AI compute | Low (£2–5/task) | Medium-High (£5–15/task) | None |
| Human oversight | High (15–30 min/task) | Low (2–5 min/task) | Full (30–60 min/task) |
| Total cost profile | Moderate | Lower at scale | Highest |
| Risk of cost spikes | Low | Medium-High | Low |
The break-even depends on task complexity, model pricing, and staff hourly rates. For routine, repeatable tasks, autonomous agents reach cost parity quickly. For complex, judgement-heavy work, supervised agents remain more cost-effective because the risk of autonomous errors outweighs the human oversight savings.
Tracking cost per task across autonomy levels gives firms the data to make this decision per task type rather than as a blanket policy.
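The break-even arithmetic is straightforward once cost per task is tracked. The figures below are illustrative midpoints from the table above, with an assumed £120/hour staff rate:

```python
def cost_per_task(ai_cost: float, oversight_minutes: float, hourly_rate: float) -> float:
    """Total task cost in GBP: AI compute plus human oversight at the staff rate."""
    return ai_cost + oversight_minutes / 60 * hourly_rate

RATE = 120  # assumed senior staff rate, GBP/hour

supervised = cost_per_task(ai_cost=3.50, oversight_minutes=20, hourly_rate=RATE)
autonomous = cost_per_task(ai_cost=10.00, oversight_minutes=4, hourly_rate=RATE)
human_only = cost_per_task(ai_cost=0.00, oversight_minutes=45, hourly_rate=RATE)

print(f"supervised £{supervised:.2f}, autonomous £{autonomous:.2f}, human £{human_only:.2f}")
# supervised £43.50, autonomous £18.00, human £90.00
```

On these assumed inputs the autonomous agent wins comfortably, but the ranking flips as oversight minutes climb for complex work, which is why the decision is best made per task type from tracked data.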
How Does Agent Autonomy Affect Billing?
Billing models must reflect the autonomy level. A supervised agent’s output is fundamentally different from an autonomous agent’s output in terms of accountability, effort, and client perception.
Billing Supervised Agent Work
Supervised agent work is typically billed as human time. The professional did the work; the agent assisted. Clients understand this model because it fits the existing framework of “expert with tools.”
Pricing usually follows standard hourly rates, sometimes with an efficiency discount. A task that previously took a consultant two hours might take 45 minutes with agent assistance. Some firms bill the full two hours (value-based). Others bill 45 minutes at a standard rate (time-based). A few bill a flat per-task fee. The ethical clarity is that a human is accountable for the output — they reviewed it, approved it, and take responsibility.
Billing Autonomous Agent Work
Autonomous agent work requires new billing models. Billing hourly rates for work that took an agent three minutes of compute does not hold up to client scrutiny. Equally, billing only the compute cost undervalues the output.
Three models are emerging. Agent-time billing charges for the agent’s active execution time at a rate lower than human hourly rates. Per-task billing charges a fixed fee per completed task regardless of how long the agent took. Value-based billing charges based on the outcome — the deliverable’s worth to the client, not the effort expended.
Disclosure requirements increase with autonomy. Clients expect to know when AI did the work. At supervised levels, AI involvement is incidental — “our team used AI-assisted tools.” At autonomous levels, AI is the primary worker — “this deliverable was produced by an AI agent and reviewed by a senior consultant.” Transparency builds trust. Hiding autonomous AI involvement risks the relationship.
How Do You Choose the Right Tracking Approach?
Start by auditing your current agent deployment. Map each agent to an autonomy level. Then match tracking granularity to that level.
Levels 1–2: Track as enhancement to human time. The agent is a tool. Log its usage for cost tracking and reporting, but the primary time entry is the human’s. Simple logging covers it. No separate agent timesheet needed.
Level 3: Track agent and human time separately. Dual logging at each workflow checkpoint. Both time entries link to the same task. This gives you the data to calculate blended costs and identify where human review is the bottleneck versus where agent execution is the bottleneck.
Levels 4–5: Track agent time as independent work. Full agent timesheets with every action logged. Human review time is a separate, occasional entry — not a constant companion to every agent task. This is where agent timesheets become essential rather than optional.
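The level-to-granularity mapping above can live as a small lookup in your tracking configuration (the mode names here are this sketch’s own labels, not a standard):

```python
# Illustrative mapping of autonomy level to tracking mode
TRACKING_MODE = {
    1: "human_time_with_tool_log",
    2: "human_time_with_tool_log",
    3: "dual_logging",
    4: "agent_timesheet",
    5: "agent_timesheet",
}

def tracking_mode(level: int) -> str:
    """Return the tracking granularity required for an agent's autonomy level."""
    if level not in TRACKING_MODE:
        raise ValueError(f"unknown autonomy level: {level}")
    return TRACKING_MODE[level]

print(tracking_mode(3))  # dual_logging
```

Keeping this as explicit configuration also makes migration visible: promoting an agent from level 3 to level 4 is a one-line change that forces the tracking system to switch modes with it.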
Start with your highest-autonomy agents. They need tracking most urgently because they operate with less human oversight and present the greatest audit risk. Once tracking is working for autonomous agents, extending it to supervised agents is straightforward — you are adding human review logging to an existing agent tracking system, not building from scratch.
Plan for migration. As agents gain autonomy over time — moving from Level 2 to Level 3, or Level 3 to Level 4 — your tracking requirements increase. A tracking system designed only for Level 2 will break when you move to Level 4. Build for the level you are heading toward, not just the level you are at today.
Keito tracks AI agents at every autonomy level — from supervised copilots to fully autonomous workers — adapting time and cost attribution as your deployment evolves.
Frequently Asked Questions
What is the difference between supervised and autonomous AI agents?
Supervised AI agents require human review and approval before their outputs are used. Autonomous agents operate independently within defined boundaries, making decisions and taking actions without per-task human oversight. The distinction sits on a five-level spectrum from suggestion-based to fully autonomous.
How do you track time for autonomous AI agents?
Log agent execution time per task with start and end timestamps, tokens consumed, and tools called. Track wall-clock time and compute time separately, since autonomous agents may run tasks in parallel. Link each entry to a client, project, and billing code for attribution.
Should supervised AI agent time be billed as human time?
In most cases, yes. Supervised agent work is typically billed as human time because the professional reviewed, approved, and takes accountability for the output. Some firms apply an efficiency discount to reflect the agent’s contribution, while others bill at full value-based rates.
What are the autonomy levels of AI agents?
Five levels describe the spectrum. Level 1: suggestion-based (agent suggests, human acts). Level 2: task-based supervised (agent completes, human reviews). Level 3: workflow-based supervised (multi-step with checkpoints). Level 4: autonomous with guardrails (independent within boundaries). Level 5: fully autonomous (no human involvement during execution).
How does agent autonomy affect billing?
Supervised agent work fits existing hourly billing models. Autonomous agent work requires new approaches — agent-time billing, per-task pricing, or value-based billing. Disclosure requirements also increase with autonomy. Clients expect transparency about the level of AI involvement in their deliverables.
What tracking data do autonomous agents need?
Autonomous agents need full activity logs: invocation data, execution timestamps, token consumption, tool calls, input and output records, cost data, error logs, and client attribution. They also need guardrail monitoring — alerts when the agent approaches spending limits, scope boundaries, or escalation triggers.
How do you handle the transition from supervised to autonomous agents?
Build your tracking system for the autonomy level you are heading toward, not just where you are today. Start with dual logging (agent time plus human review time) at supervised levels. As agents gain autonomy, the human review component shrinks and the agent timesheet becomes the primary record.