Beyond productivity metrics
Most team metrics measure activity: tasks created, tasks completed, velocity, throughput, story points burned. These tell you how busy people are, not how effectively the organization executes.
There's a fundamental gap in how organizations measure operational health. Velocity tells you how fast work moves. Throughput tells you how much work completes. But neither tells you how much management overhead was required to achieve those numbers.
A team with high velocity but constant manager intervention is fragile. Remove the manager, and velocity collapses. A team with moderate velocity but zero manager intervention is antifragile — it can sustain and improve itself.
Autonomy Score measures this: what percentage of tasks complete without manager intervention?
How it's calculated
The score tracks the full lifecycle of each task and identifies intervention points.
What counts as an intervention?
| Intervention Type | Description | Weight |
|---|---|---|
| Manual reassignment | Manager had to move a task to a different person | High |
| Manual follow-up | Manager directly messaged an employee for a status update | Medium |
| Blocker resolution | Manager personally resolved a dependency or escalation | High |
| Proof rejection | Manager reviewed and rejected submitted proof | Medium |
| Scope change | Manager redefined acceptance criteria after assignment | Low |
| Deadline extension | Manager extended a deadline due to execution issues | Low |
Tasks that flow from assignment to verified completion without any of these touchpoints count as "autonomous." The score is the percentage of autonomous completions over a rolling 30-day period.
The formula
```
Autonomy Score = (Tasks completed without intervention / Total tasks completed) × 100
```
Importantly, the score only measures completed work — tasks still in progress are excluded. A task that has avoided intervention so far isn't counted as autonomous until it reaches verified completion, which prevents inflating the score with a backlog of easy, untouched assignments.
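In code, the unweighted calculation is a simple ratio over completed tasks. This is a minimal sketch — the `Task` shape and intervention labels here are illustrative assumptions, not Mnage's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    completed: bool
    interventions: list = field(default_factory=list)  # e.g. ["manual_follow_up"]

def autonomy_score(tasks):
    """Unweighted score: % of completed tasks with zero interventions."""
    completed = [t for t in tasks if t.completed]  # in-progress tasks are excluded
    if not completed:
        return 0.0
    autonomous = sum(1 for t in completed if not t.interventions)
    return round(autonomous / len(completed) * 100, 1)

tasks = [
    Task(completed=True),                                      # autonomous
    Task(completed=True, interventions=["manual_follow_up"]),  # intervened
    Task(completed=False),                                     # ignored: in progress
    Task(completed=True),                                      # autonomous
]
print(autonomy_score(tasks))  # 2 of 3 completed tasks → 66.7
```

In practice the input would be restricted to tasks completed within the rolling 30-day window before taking the ratio.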
Weighted vs. unweighted
The raw score treats all interventions equally. But a quick Slack clarification and a full task reassignment aren't the same level of failure. The weighted score applies the intervention weights above:
- A task with one low-weight intervention (e.g., a scope change) might score 0.8, rather than the unweighted all-or-nothing 0 or 1
- A task requiring full manager takeover scores 0
The weighted score provides a more nuanced picture and is the default in Mnage dashboards.
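One way to implement the weighted variant is to map each intervention type to a penalty and floor each completed task's credit at zero. The penalty values below are illustrative assumptions chosen to match the High/Medium/Low tiers above, not Mnage's published weights:

```python
# Illustrative penalties per intervention type: a high-weight intervention
# zeroes a task's credit, medium and low ones only dent it.
PENALTIES = {
    "manual_reassignment": 1.0,   # high
    "blocker_resolution": 1.0,    # high
    "manual_follow_up": 0.5,      # medium
    "proof_rejection": 0.5,       # medium
    "scope_change": 0.2,          # low
    "deadline_extension": 0.2,    # low
}

def task_credit(interventions):
    """Credit between 0 and 1 for a single completed task."""
    penalty = sum(PENALTIES[i] for i in interventions)
    return max(0.0, 1.0 - penalty)

def weighted_autonomy_score(completed_tasks):
    """Average per-task credit across completed work, as a percentage."""
    if not completed_tasks:
        return 0.0
    total = sum(task_credit(iv) for iv in completed_tasks)
    return round(total / len(completed_tasks) * 100, 1)

# A clean task (1.0), one with a scope change (0.8), one fully taken over (0.0).
print(weighted_autonomy_score([[], ["scope_change"], ["manual_reassignment"]]))  # 60.0
```

Averaging fractional credit is what lets the weighted score distinguish a team with many minor clarifications from one with constant full takeovers, even when both have the same raw intervention count.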
The typical journey
Based on data from 50+ beta teams, here's the progression pattern:
Week 1-2: 30-40%
The AI is learning. It's observing communication patterns, response times, work styles, and blocker patterns. Follow-ups are still somewhat generic because the personalization model hasn't converged yet. Most tasks need some manager input.
This is the calibration phase. The AI is building profiles:
- When does each person respond to messages?
- What tone yields the highest engagement?
- Which tasks typically need clarification?
- Where do blockers usually occur?
Week 3-4: 55-65%
Follow-ups are personalized. The AI knows when to ping someone, how to phrase it, and when to escalate. Proof validation is catching quality issues at submission time instead of in review meetings. Managers start noticing fewer fires.
Key indicators at this stage:
- Follow-up response rate exceeds 80%
- Blockers are detected an average of 3 days earlier than before
- Manager follow-up messages drop by 60%
Week 5-6: 70-80%
The system is humming. Blockers are detected and resolved before managers even know about them. Proof validation has trained the team to submit complete work on the first attempt (because the AI will ask for more evidence if they don't). Daily briefings replace status meetings.
Behavioral changes you'll observe:
- Employees proactively include proof when marking tasks complete
- Cross-team dependencies are resolved without escalation
- Managers check the dashboard once per day instead of Slack-stalking threads
Week 7+: 80%+
This is the target state. Managers review verified outcomes once a day. They spend their time on strategy, not coordination. The AI handles the execution layer. New goals go from "created" to "decomposed into tasks with criteria" to "assigned" to "completed and verified" with minimal human coordination.
Why Autonomy Score matters more than velocity
It predicts goal completion
In our data, Autonomy Score is the strongest predictor of quarterly goal completion — stronger than velocity, throughput, or team size.
| Autonomy Score Range | Goal Completion Rate | Avg. Overdue Tasks |
|---|---|---|
| Below 40% | 31% | 12 per sprint |
| 40-60% | 52% | 7 per sprint |
| 60-80% | 74% | 3 per sprint |
| Above 80% | 91% | <1 per sprint |
Teams above 75% autonomy complete 2.4x more goals per quarter than teams below 50%.
It saves manager time
Every 10-point increase in Autonomy Score correlates with ~3 hours/week of saved manager time. At 80%+, managers report spending less than 2 hours per week on coordination — down from the 15-hour average.
It improves employee satisfaction
People prefer autonomy over micromanagement. Research from the University of Birmingham found that employees with higher autonomy report 20% greater job satisfaction and 15% higher performance ratings.
Autonomy Score quantifies this at the team level. When the score is high, employees are working independently, receiving relevant (not nagging) check-ins, and closing tasks against clear criteria. The work feels purposeful rather than bureaucratic.
What drives Autonomy Score up?
Based on our data, five factors have the highest impact:
- Clear acceptance criteria (+12 points average): Tasks with specific, measurable criteria have dramatically higher autonomous completion rates. Vague tasks almost always require manager intervention.
- Consistent follow-up patterns (+8 points average): Regular, personalized AI follow-ups keep work on track without manager involvement. The AI builds rhythm.
- Fast blocker resolution (+7 points average): When blockers are detected and resolved within hours instead of days, downstream tasks don't need manual rescue.
- Proof validation calibration (+6 points average): As the AI learns what "good enough" looks like for your team, fewer proof submissions get flagged for manual review.
- Integration quality (+5 points average): When task status syncs seamlessly between Mnage and your PM tool, managers don't need to cross-reference systems.
What causes Autonomy Score to plateau?
If the score stops improving, it usually points to one of these structural issues:
- Unclear ownership: Tasks without a clear, single owner require intervention to determine who's responsible
- Missing integrations: If the AI can't access a team's actual work outputs (design files, code repos, analytics), it can't validate proof automatically
- Bottleneck roles: Some team members are dependencies for many tasks. When they're overloaded, everything downstream requires escalation
- Cultural resistance: If employees view AI follow-ups as surveillance rather than support, they respond less, which triggers more manager involvement
The meta-insight
The most powerful thing about Autonomy Score is what it reveals about your organization. It turns an abstract question — "are we getting better at execution?" — into a concrete, trackable number. And that's the first step to actually improving it.
Organizations that track Autonomy Score consistently report that it becomes a leading indicator for strategic health. When the score is rising, the organization is building execution muscle. When it plateaus or drops, there's a structural issue that needs attention — and the drill-down data tells you exactly where.
Key takeaways
- Autonomy Score measures the % of tasks that complete without manager intervention
- It's calculated from 6 types of interventions, weighted by severity
- Teams progress from ~35% to 80%+ in 6 weeks as the AI learns and optimizes
- It's the strongest predictor of goal completion — stronger than velocity or throughput
- Every 10-point increase saves ~3 hours/week of manager time
- Five factors drive improvement: clear criteria, consistent follow-ups, fast blocker resolution, proof calibration, and integration quality