How to Track Developer Productivity Without Surveillance or Micromanagement

Keito Team
17 April 2026 · 10 min read

Track developer productivity without surveillance. Learn output-based metrics and a privacy-respecting framework your engineering team will actually accept.

Developer Workflows

To track developer productivity without surveillance, measure outputs (PRs merged, deploys shipped, incidents resolved) at the team level rather than watching individuals. Use DORA and SPACE metrics sourced from git, CI, and incident tools — and give developers visibility into their own data.

Your engineers have started jiggling their mice before taking a bathroom break. Activity lights are green all day, keystroke counts are up, and yet the last three releases slipped. That is not a productivity problem. That is a measurement problem caused by watching the wrong signals. This guide covers why keystroke and screenshot tools destroy trust without producing output, which output-based signals actually correlate with shipping software, and how to roll out a privacy-respecting framework that your team helps design rather than resents.

Why does surveillance-based tracking backfire for engineering teams?

Surveillance tools measure presence, not outcomes. A keystroke count rewards typing, not thinking. A screenshot every ten minutes rewards keeping an IDE open, not solving the problem behind the ticket. The deeper issue is that the signal they capture has almost nothing to do with how software actually gets built.

Research published in 2024 across US workplaces found that 41% of monitored professionals said surveillance made them less productive, with stress levels jumping from 28% in low-monitoring environments to 45% in heavily monitored ones. A separate Gartner survey cited in 2024 reporting found that 46% of tech workers said they would quit if their employer began tracking keystrokes or taking screenshots. When your most portable workers respond to a tool by updating their CV, the tool is costing you more than it saves.

Goodhart’s Law, applied to your git history

There is a name for what happens when you turn a metric into a target: Goodhart’s Law. Economist Charles Goodhart captured it in one sentence — when a measure becomes a target, it ceases to be a good measure. Software engineering is unusually vulnerable to this effect because almost every output signal can be gamed cheaply.

Measure lines of code and you get bloated pull requests full of copy-paste. Measure story points completed and developers quietly inflate the estimates until velocity looks exceptional while delivery stays flat. Measure commits per day and you get five commits where one would have done. None of the gaming requires malice. People respond to the incentives the system creates.

Screenshot and keystroke tools sit awkwardly under UK GDPR and the EU equivalent. Any processing of personal data at this intensity triggers a requirement to run a Data Protection Impact Assessment, show a lawful basis, and apply data minimisation. Emerging state-level rules in the US (New York’s electronic monitoring notice law is one current example) add further notification duties. A tool that cannot justify itself against these tests is not a productivity measure — it is a liability waiting to be written up.

Which output-based metrics actually measure developer productivity?

The honest answer: no single metric does. The research consensus across the last decade points to a small number of outcome signals that, taken together, reflect real engineering effectiveness. Two frameworks dominate the literature — and they are complementary rather than competing.

DORA: four signals that track delivery performance

Google’s DevOps Research and Assessment group published four core metrics after multi-year industry research:

  • Deployment frequency — how often code reaches production
  • Lead time for changes — first commit to production deploy
  • Change failure rate — the share of deploys that cause an incident
  • Time to restore service — how quickly you recover when things break

These four are cheap to collect from git providers, CI pipelines, and incident management tools. They are also resistant to the worst forms of gaming because the two speed metrics pull in opposite directions from the two stability metrics. You cannot ship faster by shipping worse code without the failure rate giving you away.
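As a sketch of how cheaply these four can be computed, assuming hypothetical event records exported from a CI pipeline and an incident tracker (the field names are illustrative, not any vendor's schema):

```python
from datetime import datetime

# Illustrative deploy and incident events; real records would come from
# your CI tool and incident tracker, under whatever schema they export.
deploys = [
    {"at": datetime(2026, 4, 1, 10), "first_commit": datetime(2026, 3, 30, 9),  "caused_incident": False},
    {"at": datetime(2026, 4, 2, 15), "first_commit": datetime(2026, 4, 1, 11),  "caused_incident": True},
    {"at": datetime(2026, 4, 6, 9),  "first_commit": datetime(2026, 4, 3, 16),  "caused_incident": False},
    {"at": datetime(2026, 4, 8, 14), "first_commit": datetime(2026, 4, 7, 10),  "caused_incident": False},
]
incidents = [
    {"opened": datetime(2026, 4, 2, 15, 30), "resolved": datetime(2026, 4, 2, 17, 0)},
]

def dora_summary(deploys, incidents):
    """Compute the four DORA metrics over the window covered by the events,
    at team level; no individual is named anywhere in the output."""
    span_days = (max(d["at"] for d in deploys) - min(d["at"] for d in deploys)).days or 1
    deploys_per_week = len(deploys) / span_days * 7
    lead_times = [(d["at"] - d["first_commit"]).total_seconds() / 3600 for d in deploys]
    change_failure_rate = sum(d["caused_incident"] for d in deploys) / len(deploys)
    restore_hours = [(i["resolved"] - i["opened"]).total_seconds() / 3600 for i in incidents]
    return {
        "deploys_per_week": round(deploys_per_week, 1),
        "lead_time_hours": round(sum(lead_times) / len(lead_times), 1),
        "change_failure_rate": round(change_failure_rate, 2),
        "time_to_restore_hours": round(sum(restore_hours) / len(restore_hours), 1),
    }
```

Note how the counter-balancing shows up directly in the numbers: pushing deploys_per_week up by shipping riskier changes raises change_failure_rate in the same summary.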

SPACE: five dimensions that keep DORA honest

The SPACE framework, developed by researchers at GitHub and Microsoft Research alongside Dr. Nicole Forsgren (one of the original DORA authors), covers five dimensions: Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow. SPACE explicitly warns against using a single metric or judging individuals on any of these signals alone.

The point of pairing DORA with SPACE is simple. DORA tells you what the delivery system is producing. SPACE tells you why — are we burning people out, are handoffs clean, are reviews timely? A high change failure rate plus a dip in satisfaction scores is a different problem from a high failure rate plus thriving engineers, and the fix differs accordingly.

The metrics we would avoid

Three signals look tempting and cause more harm than they prevent. Lines of code penalises careful engineers who delete more than they add. Individual PR counts punish reviewers and ignore pairing. Hours at the keyboard is pure theatre — any serious engineer will tell you the breakthrough on a nasty bug often comes on a walk, not at a screen.

How do you build a non-invasive productivity framework?

Here is the rollout we use with teams moving off surveillance tooling. Each step is cheap, reversible, and designed so developers are collaborators rather than subjects.

Step 1: Agree three to five team-level metrics

Gather engineering leadership and two or three working engineers. Pick three to five signals — typically two from DORA, one qualitative (developer satisfaction), and one that reflects your specific context (incident response load, on-call fairness, or PR review turnaround). Make it explicit that these are team measurements, not individual scorecards.

Step 2: Source metrics from tools you already pay for

Every signal on the list above can be pulled from systems already in place. Git providers expose PR and review timestamps. CI tools record deploy events. Incident tools track mean time to restore. Calendar tools show meeting load. No new data capture agent is needed — and that is the point.
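As an illustration, one such signal, PR review turnaround, can be computed from the two timestamps any git provider already exposes. The record shape below is an assumption for the sketch, not a real API response:

```python
from datetime import datetime
from statistics import median

# Illustrative PR records: when each PR was opened and when the first
# review arrived. A git provider's API exposes both out of the box.
prs = [
    {"opened": datetime(2026, 4, 1, 9),  "first_review": datetime(2026, 4, 1, 13)},
    {"opened": datetime(2026, 4, 2, 14), "first_review": datetime(2026, 4, 3, 10)},
    {"opened": datetime(2026, 4, 3, 11), "first_review": datetime(2026, 4, 3, 12)},
]

def review_turnaround_hours(prs):
    """Median hours from PR opened to first review, reported team-wide.
    Median rather than mean, so one stuck PR does not skew the signal."""
    waits = [(p["first_review"] - p["opened"]).total_seconds() / 3600 for p in prs]
    return median(waits)
```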

Step 3: Build a shared dashboard the team owns

The dashboard should be visible to everyone, not locked behind manager-only access. When engineers see the same numbers their managers see, conversations shift from defence to diagnosis. Review the dashboard in retrospectives, not in one-to-ones.

Step 4: Track trends, not absolute numbers

A deployment frequency of 3.2 per week means nothing in isolation. A drift from 5 per week to 2 per week over a quarter means a lot. Good frameworks care about direction and variance, not league-table positions.
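That trend-over-level idea can be sketched in a few lines, assuming illustrative weekly deploy counts:

```python
from statistics import mean

# Weekly deployment counts over a quarter (illustrative numbers showing
# exactly the kind of slow drift a single snapshot would miss).
weekly_deploys = [5, 5, 4, 5, 4, 4, 3, 3, 3, 2, 2, 2]

def drift(series, window=4):
    """Change in the recent window's mean versus the preceding window.
    The sign and size of the result matter; the absolute level does not."""
    recent = mean(series[-window:])
    previous = mean(series[-2 * window:-window])
    return recent - previous
```

A negative drift on deployment frequency is the prompt for a retrospective question ("what changed this quarter?"), not for a league table.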

Step 5: Pair numbers with developer experience surveys

Run a short quarterly survey — five to seven questions — on flow, tooling friction, meeting load, and confidence in releases. The data complements DORA beautifully and is the cheapest signal of trouble you will ever buy. For teams that already map git commits to billable time, the same data pipe can surface cycle-time trends without a second integration.

How does activity-based time tracking differ from surveillance?

This is the distinction that trips most conversations up. Activity-based time tracking and surveillance look superficially similar — both involve software capturing signals about work — but the design choices are opposite, and the difference is visible in the data they collect.

Surveillance captures content. Screenshots of your screen. Keystrokes you typed. Which application was active every minute. The unit of analysis is the person, and the data is held for managers to judge.

Activity-based tracking captures events. A commit was pushed at 14:32 to the billing-refactor branch. A PR was reviewed at 15:05. A meeting appeared on the calendar from 11:00 to 11:30. The unit of analysis is the work, not the worker. Screen content never leaves the device because it is never read.

The practical tell is simple: look at who sees the data first. In a surveillance system, managers review developers. In a well-designed activity-based timesheet, the developer sees and approves their own draft before anyone else sees it. That one reversal of access changes how the tool feels in daily use. It also aligns the system with UK GDPR’s fair processing and transparency principles rather than stretching them.
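A minimal sketch of that access reversal, with hypothetical names and a deliberately simplified data model:

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical event-based timesheet draft. Events reference the work
# (a branch, a review), never screen content; the developer approves the
# draft before anyone else can read it.
@dataclass
class TimesheetDraft:
    developer: str
    entries: list = field(default_factory=list)
    approved: bool = False

    def add_event(self, at: datetime, kind: str, ref: str):
        self.entries.append({"at": at, "kind": kind, "ref": ref})

    def visible_to(self, viewer: str) -> bool:
        # The reversal: an unapproved draft is visible only to its owner.
        return viewer == self.developer or self.approved

draft = TimesheetDraft("alice")
draft.add_event(datetime(2026, 4, 17, 14, 32), "commit", "billing-refactor")
draft.add_event(datetime(2026, 4, 17, 15, 5), "review", "pr-review")
```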

What developers should be able to see

A fair system gives every developer:

  • The full list of signals collected about them
  • A visible audit trail showing who has accessed their data
  • The right to edit auto-generated entries before approval
  • A clear retention policy — and a delete button

If the tool on your desk cannot offer those four guarantees, it is not a productivity tool. It is a surveillance tool with better marketing. Teams that need to reconstruct timesheets from git history after a missed week get all four by default — because git was already there, and the developer is the one approving the draft.
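Those four guarantees can be sketched as a data model. Everything here (class name, retention period, field names) is an illustrative assumption, not a real product's schema:

```python
from datetime import datetime, timedelta

class DeveloperRecord:
    """Hypothetical per-developer record implementing the four guarantees."""
    SIGNALS = ["commits", "pr_reviews", "deploys", "meetings"]  # 1: full signal list, visible
    RETENTION = timedelta(days=365)                             # 4: clear retention policy

    def __init__(self, developer):
        self.developer = developer
        self.entries = []      # auto-generated draft entries
        self.access_log = []   # 2: audit trail of every read

    def read(self, viewer):
        self.access_log.append({"viewer": viewer, "at": datetime.now()})
        return list(self.entries)

    def edit(self, index, new_entry):
        self.entries[index] = new_entry  # 3: right to edit before approval

    def delete_all(self):
        self.entries.clear()             # 4: the delete button
```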

How do you convince leadership to replace surveillance tools?

Leadership is not the enemy in this conversation. They have a real question they are trying to answer — is the team delivering, where are the bottlenecks — and the surveillance tool is a bad answer to a fair question. Your job is to bring a better answer.

Three numbers usually win the argument:

  1. Attrition cost. Replacing a mid-level engineer costs 100-200% of annual salary once you add recruiting, onboarding, and the months before full productivity. If surveillance moves even 10% of your team to “actively looking”, the tool has already cost more than any licence fee.
  2. Measurable delivery. Bring a draft dashboard of DORA metrics for the last quarter. Show lead time, deployment frequency, and change failure rate as trendlines. Leaders stop asking about keystroke counts the moment they can see deploys per week.
  3. Audit-readiness. A framework built on existing tool data is documentable for clients, auditors, and regulators. A screen-capture system is a conversation nobody wants to have in a security review.

The offer is straightforward: same question answered, better data, lower attrition, less legal exposure. That tends to carry the room.
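The attrition arithmetic in point 1 is easy to put on a slide. All inputs below are illustrative assumptions, not benchmarks:

```python
# Back-of-envelope cost of surveillance-driven attrition, using the
# 100-200% replacement-cost range above (150% taken as the midpoint).
def attrition_cost(team_size, avg_salary, replacement_rate=1.5, share_leaving=0.10):
    """Expected replacement cost if a share of the team leaves over the tool."""
    leavers = team_size * share_leaving
    return leavers * avg_salary * replacement_rate

# Hypothetical 20-person team on an average salary of 90,000:
cost = attrition_cost(team_size=20, avg_salary=90_000)
```

Two leavers at 150% of a 90,000 salary already dwarfs a typical monitoring-tool licence, which is usually the whole argument.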

Key Takeaway

Surveillance measures presence, not output, and actively drives engineers out. DORA and SPACE metrics sourced from tools you already own give a truer view of productivity — and respect the team.

Frequently Asked Questions

Can you measure individual developer productivity without surveillance?

You can track individual contributions — PRs merged, reviews completed, incidents resolved — from git and CI data, but use them as inputs to one-to-one conversations, not as individual scorecards. Team-level metrics carry most of the signal about delivery effectiveness. Individual scoring consistently triggers Goodhart’s Law and drives gaming behaviour.

Do DORA metrics work for small engineering teams?

Yes. DORA scales down to teams of three to five developers without modification. For very small teams, concentrate on deployment frequency and lead time for changes — they are the most actionable signals at small scale. Change failure rate and time to restore become useful once you are deploying weekly or more.

How do I convince leadership to drop surveillance tools?

Present three numbers: the attrition cost of keeping the tool, a sample DORA dashboard that answers the same question better, and the compliance exposure of continued screen and keystroke capture. Leadership tends to care about delivery and risk. Surveillance tooling fails both tests once a credible alternative is on the table.

Is activity-based time tracking compliant with UK GDPR?

Yes, when designed correctly. Activity-based tracking for billing and capacity planning typically relies on the legitimate interest lawful basis, supported by a Data Protection Impact Assessment, clear notice to workers, data minimisation, and a right of access and correction. Screen-capture and keystroke monitoring face a much harder justification test and often fail it.

What is the SPACE framework and how does it fit with DORA?

The SPACE framework covers Satisfaction, Performance, Activity, Communication, and Efficiency — five dimensions of engineering effectiveness developed by Dr. Nicole Forsgren and colleagues. DORA tells you what the delivery system is producing; SPACE tells you why. Using them together resists single-metric gaming and gives leaders a balanced view.

Can you track time without tracking individuals?

Yes. Aggregate metrics — team cycle time, deployment frequency, average review turnaround — answer most delivery questions without naming anyone. Where individual data is needed (for billing, for example), an activity-based timesheet the developer reviews before submission keeps the person in control of their own record.

How often should productivity metrics be reviewed?

Review DORA trendlines monthly and run an experience survey quarterly. Anything shorter invites noise-chasing; anything longer misses regressions. Pair the review with retrospectives so the numbers feed a conversation, not a judgement.

Ready to measure productivity the right way?

Track engineering output from git, PRs, reviews, and meetings — without screenshots, keystrokes, or surveillance. Developers stay in control of their own data from the first commit.

See How It Works
