# developer-kit: task-quality-kpi

Objective task quality evaluation framework using quantitative KPIs. KPIs are calculated automatically by a hook whenever a task file is modified, and the results are saved to `TASK-XXX--kpi.json`. Use when: reading KPI data for task evaluation, understanding quality metrics, or deciding whether to iterate or approve based on data.

Install:

```bash
# Clone the full repository
git clone https://github.com/giuseppe-trisciuoglio/developer-kit

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/giuseppe-trisciuoglio/developer-kit "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/developer-kit-specs/skills/task-quality-kpi" ~/.claude/skills/giuseppe-trisciuoglio-developer-kit-task-quality-kpi && rm -rf "$T"
```

Source: `plugins/developer-kit-specs/skills/task-quality-kpi/SKILL.md`

# Task Quality KPI Framework
## Overview
The Task Quality KPI Framework provides objective, quantitative metrics for evaluating task implementation quality.
**Key Architecture:** KPIs are auto-generated by a hook; you read the results rather than running scripts.
```
┌───────────────────────────────────────┐
│ HOOK (auto-executes)                  │
│   Trigger: PostToolUse on TASK-*.md   │
│   Script:  task-kpi-analyzer.py       │
│   Output:  TASK-XXX--kpi.json         │
├───────────────────────────────────────┤
│ SKILL / AGENT (reads output)          │
│   Input:   TASK-XXX--kpi.json         │
│   Action:  Make evaluation decisions  │
└───────────────────────────────────────┘
```
## Why This Architecture?
| Problem | Solution |
|---|---|
| Skills can't execute scripts | Hook auto-runs on file save |
| Subjective review_status | Quantitative 0-10 scores |
| "Looks good to me" | Evidence-based evaluation |
| Binary pass/fail | Graduated quality levels |
## KPI File Location
After any task file modification, find KPI data at:
```
docs/specs/[ID]/tasks/TASK-XXX--kpi.json
```
## KPI Categories
```
┌───────────────────────────────────────┐
│ OVERALL SCORE (0-10)                  │
├───────────────────────────────────────┤
│ Spec Compliance (30%)                 │
│ ├── Acceptance Criteria Met (0-10)    │
│ ├── Requirements Coverage (0-10)      │
│ └── No Scope Creep (0-10)             │
├───────────────────────────────────────┤
│ Code Quality (25%)                    │
│ ├── Static Analysis (0-10)            │
│ ├── Complexity (0-10)                 │
│ └── Patterns Alignment (0-10)         │
├───────────────────────────────────────┤
│ Test Coverage (25%)                   │
│ ├── Unit Tests Present (0-10)         │
│ ├── Test/Code Ratio (0-10)            │
│ └── Coverage Percentage (0-10)        │
├───────────────────────────────────────┤
│ Contract Fulfillment (20%)            │
│ ├── Provides Verified (0-10)          │
│ └── Expects Satisfied (0-10)          │
└───────────────────────────────────────┘
```
### Category Weights
| Category | Weight | Why |
|---|---|---|
| Spec Compliance | 30% | Most important - did we build what was asked? |
| Code Quality | 25% | Technical excellence |
| Test Coverage | 25% | Verification and confidence |
| Contract Fulfillment | 20% | Integration with other tasks |
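As a rough illustration of how these weights combine, here is a minimal Python sketch of the weighted sum. The real calculation is performed by the hook (`task-kpi-analyzer.py`); the category scores used here are illustrative values chosen to match the 8.2 sample file shown later in this document.

```python
# Illustrative recomputation of the overall score from the table's weights.
CATEGORY_WEIGHTS = {
    "Spec Compliance": 0.30,
    "Code Quality": 0.25,
    "Test Coverage": 0.25,
    "Contract Fulfillment": 0.20,
}

def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted sum of 0-10 category scores, itself on a 0-10 scale."""
    return round(
        sum(CATEGORY_WEIGHTS[name] * score for name, score in category_scores.items()),
        1,
    )

# Illustrative category scores (assumed, not from a real run)
print(overall_score({
    "Spec Compliance": 8.5,       # weighted contribution: 2.55
    "Code Quality": 7.8,
    "Test Coverage": 8.0,
    "Contract Fulfillment": 8.6,
}))  # -> 8.2
```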
## When to Use
- Reading KPI data for task quality evaluation
- Understanding quality metrics and scoring breakdown
- Deciding whether to iterate or approve based on quantitative data
- Integrating KPI checks into automated loops (`agents_loop.py`)
- Generating evidence-based evaluation reports
## Instructions

### 1. Reading KPI Data (Primary Use)
DO NOT run scripts - read the auto-generated file:
```
Read the KPI file:
docs/specs/001-feature/tasks/TASK-001--kpi.json
```
### 2. Understanding the Data
The KPI file contains:
{ "task_id": "TASK-001", "evaluated_at": "2026-01-15T10:30:00Z", "overall_score": 8.2, "passed_threshold": true, "threshold": 7.5, "kpi_scores": [ { "category": "Spec Compliance", "weight": 30, "score": 8.5, "weighted_score": 2.55, "metrics": { "acceptance_criteria_met": 9.0, "requirements_coverage": 8.0, "no_scope_creep": 8.5 }, "evidence": [ "Acceptance criteria: 9/10 checked", "Requirements coverage: 8/10" ] } ], "recommendations": [ "Code Quality: Moderate improvements possible" ], "summary": "Score: 8.2/10 - PASSED" }
### 3. Making Decisions

Use `overall_score` and `passed_threshold`:

```
IF passed_threshold == true:
  → Task meets quality standards
  → Approve and proceed

IF passed_threshold == false:
  → Task needs improvement
  → Check recommendations for specific targets
  → Create fix specification
```
## Integration with Workflow

### In Task Review (evaluator-agent)
```markdown
## Review Process
1. Read KPI file: TASK-XXX--kpi.json
2. Extract overall_score and kpi_scores
3. Read task file to validate
4. Generate evaluation report
5. Decision based on passed_threshold
```
### In agents_loop
```python
import json

# Check whether the hook has produced a KPI file for this task
kpi_path = spec_path / "tasks" / f"{task_id}--kpi.json"
if kpi_path.exists():
    kpi_data = json.loads(kpi_path.read_text())
    if kpi_data["passed_threshold"]:
        # Quality threshold met
        advance_state("update_done")
    else:
        # Needs more work: turn recommendations into a fix task
        fix_targets = kpi_data["recommendations"]
        create_fix_task(fix_targets)
        advance_state("fix")
else:
    # KPI not generated yet - task may not be implemented
    log_warning("No KPI data found")
```
### Multi-Iteration Loop

Instead of capping at a fixed number of retries, iterate until the quality threshold is met:
```
Iteration 1: Score 6.2 → FAILED → Fix: Improve test coverage
Iteration 2: Score 7.1 → FAILED → Fix: Refactor complex functions
Iteration 3: Score 7.8 → PASSED → Proceed
```
Each iteration updates the KPI file automatically on task save.
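For orchestration outside `agents_loop.py`, the loop can be sketched as follows. This is an assumed shape, not the orchestrator's actual code: `apply_fixes` is a hypothetical callable standing in for whatever re-runs implementation, and the iteration cap is only a safety valve, not a quality gate.

```python
import json
from collections.abc import Callable
from pathlib import Path

def iterate_until_passed(
    kpi_path: Path,
    apply_fixes: Callable[[list[str]], None],  # hypothetical fix runner
    max_iterations: int = 10,                  # safety cap against endless loops
) -> bool:
    for iteration in range(1, max_iterations + 1):
        kpi = json.loads(kpi_path.read_text())
        print(f"Iteration {iteration}: score {kpi['overall_score']}/10")
        if kpi["passed_threshold"]:
            return True
        # Target the hook's recommendations, not a vague "make it better".
        apply_fixes(kpi["recommendations"])
        # Saving the task file re-triggers the hook, refreshing kpi_path.
    return False
```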
## Threshold Guidelines
| Score | Quality Level | Action |
|---|---|---|
| 9.0-10.0 | Exceptional | Approve, document best practices |
| 8.0-8.9 | Good | Approve with minor notes |
| 7.0-7.9 | Acceptable | Approve (if threshold 7.5) |
| 6.0-6.9 | Below Standard | Request specific improvements |
| < 6.0 | Poor | Significant rework required |
### Recommended Thresholds
| Project Type | Threshold | Rationale |
|---|---|---|
| Production MVP | 8.0 | High quality required |
| Internal Tool | 7.0 | Good enough |
| Prototype | 6.0 | Functional over perfect |
| Critical System | 8.5 | No compromises |
## Metric Details

### Spec Compliance Metrics

#### Acceptance Criteria Met
- Calculates: `(checked_criteria / total_criteria) * 10` (see the sketch below)
- Source: Task file checkbox count
- Example: 9/10 checked = 9.0
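A sketch of this formula applied to GitHub-style checkboxes; the hook's actual parsing may differ.

```python
import re

# Illustrative only: count "- [x]" vs "- [ ]" checkboxes in a task file.
def acceptance_criteria_score(task_markdown: str) -> float:
    checked = len(re.findall(r"^\s*- \[x\]", task_markdown,
                             re.MULTILINE | re.IGNORECASE))
    unchecked = len(re.findall(r"^\s*- \[ \]", task_markdown, re.MULTILINE))
    total = checked + unchecked
    return round(checked / total * 10, 1) if total else 0.0

# 9 of 10 criteria checked -> 9.0, as in the example above.
```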
#### Requirements Coverage
- Calculates: Count of REQ-IDs this task covers
- Source: `traceability-matrix.md`
- Example: 4 requirements covered = 8.0
#### No Scope Creep
- Calculates: `(implemented_files / expected_files) * 10`
- Source: Task "Files to Create" vs actual files
- Penalizes: Missing files or unexpected additions
### Code Quality Metrics

#### Static Analysis
- Java: Maven Checkstyle
- TypeScript: ESLint
- Python: ruff
- Score: 10 if passes, 5 if issues found
#### Complexity
- Calculates: Functions >50 lines
- Score: `10 - (long_functions_ratio * 5)` (see the sketch below)
- Penalizes: Large, complex functions
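A sketch of this formula for Python sources; the hook may use different, language-specific tooling.

```python
import ast

# Illustrative: the share of functions longer than 50 lines drives the penalty.
def complexity_score(source: str) -> float:
    funcs = [node for node in ast.walk(ast.parse(source))
             if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]
    if not funcs:
        return 10.0
    long_ratio = sum(1 for f in funcs
                     if f.end_lineno - f.lineno + 1 > 50) / len(funcs)
    return round(10 - long_ratio * 5, 1)
```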
#### Patterns Alignment
- Checks: Knowledge Graph patterns
- Source: `knowledge-graph.json`
- Validates: Implementation follows project patterns
### Test Coverage Metrics

#### Unit Tests Present
- Calculates: `min(10, test_files * 5)`
- 2 test files = maximum score
- Penalizes: Missing tests
#### Test/Code Ratio
- Calculates: `(test_count / code_count) * 10` (both file-count formulas are sketched below)
- 1:1 ratio = 10/10
- Ideal: At least 1 test file per code file
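The two file-count formulas above, sketched together. Capping the ratio score at 10 is an assumption; the documented formula does not say what happens past a 1:1 ratio.

```python
# Illustrative scoring helpers for the two formulas above.
def unit_tests_present_score(test_files: int) -> float:
    return min(10.0, test_files * 5.0)        # 2 test files = maximum score

def test_code_ratio_score(test_count: int, code_count: int) -> float:
    if code_count == 0:
        return 0.0
    # Cap at 10 (assumption) so a >1:1 ratio cannot exceed the scale.
    return min(10.0, test_count / code_count * 10)
```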
#### Coverage Percentage
- Source: Coverage reports (JaCoCo, lcov, etc.)
- Calculates: `coverage_percent / 10`
- 80% coverage = 8.0
### Contract Fulfillment Metrics

#### Provides Verified
- Checks: Files exist and export expected symbols (a path-existence sketch follows after the next metric)
- Source: Task frontmatter `provides`
- Validates: Contract satisfied
#### Expects Satisfied
- Checks: Dependencies provide required files/symbols
- Source: Task frontmatter `expects`
- Validates: Prerequisites met
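A minimal sketch of the path-existence half of these contract checks; symbol-level export verification, which the metrics also cover, is omitted, and the helper name is illustrative.

```python
from pathlib import Path

# Illustrative: every path declared in `provides` (or required by `expects`)
# must exist relative to the repository root.
def contract_paths_satisfied(paths: list[str], repo_root: Path) -> bool:
    missing = [p for p in paths if not (repo_root / p).exists()]
    if missing:
        print(f"Contract not satisfied; missing: {missing}")
    return not missing
```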
## When KPI File is Missing

If `TASK-XXX--kpi.json` doesn't exist:
- Task was never modified - Hook runs on file save
- Hook failed - Check Claude Code logs
- Task is new - Save the file first to trigger hook
DO NOT try to calculate KPIs manually. The hook runs automatically when:
- Task file is saved (Write tool)
- Task file is edited (Edit tool)
## Best Practices

### 1. Always Check KPI File Exists
Before evaluating:
```
Check if KPI file exists:
docs/specs/[ID]/tasks/TASK-XXX--kpi.json

If missing:
- Task may not be implemented yet
- Ask user to save the task file first
```
### 2. Trust the Metrics
The KPIs are objective. Only override with documented evidence:
- Critical security issue not in metrics
- Logic error not caught by static analysis
- Exceptional quality not measured
### 3. Iterate on Low KPIs
Target specific categories:
❌ "Fix code quality issues" ✅ "Improve Code Quality KPI from 5.2 to 7.0: - Complexity: Refactor processData() (5→8) - Patterns: Add error handling (6→8)"
### 4. Track KPI Trends
Monitor quality over time:
```
Sprint 1: Average KPI 6.8
Sprint 2: Average KPI 7.3 (+0.5)
Sprint 3: Average KPI 7.9 (+0.6)
```
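A small illustrative helper for computing such averages from the KPI files themselves, using the file layout documented above.

```python
import json
from pathlib import Path

# Average overall_score across every KPI file in a spec directory.
def average_kpi(spec_dir: Path) -> float:
    scores = [json.loads(p.read_text())["overall_score"]
              for p in spec_dir.glob("tasks/TASK-*--kpi.json")]
    return round(sum(scores) / len(scores), 1) if scores else 0.0
```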
## Troubleshooting

### KPI File Not Generated
Check:
- Hook enabled in `hooks.json`
- Task file name matches the pattern `TASK-*.md`
- File was actually saved (not just viewed)
### KPI Scores Seem Wrong
Validate:
- Check the `evidence` field for data sources
- Verify files exist at the expected paths
- Some metrics require build tools (Maven, npm) to be available
### Low Scores Despite Good Code
Possible causes:
- Missing test files
- No coverage report generated
- Acceptance criteria not checked
- Lint rules too strict
Fix the root cause, not just the score.
## Examples

### Example 1: Reading KPI Data
```
Read the KPI file to evaluate task quality:
docs/specs/001-feature/tasks/TASK-042--kpi.json

Based on the data:
- Overall score: 6.8/10 (below threshold)
- Lowest KPI: Test Coverage (5.0/10)
- Recommendation: Add unit tests

Decision: REQUEST FIXES - target Test Coverage improvement
```
### Example 2: Iteration Decision
```
Iteration 1 KPI: Score 6.2 → FAILED
- Spec Compliance: 7.0 ✓
- Code Quality: 5.5 ✗
- Test Coverage: 6.0 ✗

Fix targets:
1. Refactor complex functions (Code Quality)
2. Add test coverage (Test Coverage)

Iteration 2 KPI: Score 7.8 → PASSED ✓
```
### Example 3: agents_loop Integration
```python
import json

# In agents_loop, after the implementation step
kpi_file = spec_dir / "tasks" / f"{task_id}--kpi.json"
if kpi_file.exists():
    kpi = json.loads(kpi_file.read_text())
    if kpi["passed_threshold"]:
        print(f"✅ Task passed quality check: {kpi['overall_score']}/10")
        advance_state("update_done")
    else:
        print(f"❌ Task failed quality check: {kpi['overall_score']}/10")
        print("Recommendations:")
        for rec in kpi["recommendations"]:
            print(f"  - {rec}")
        advance_state("fix")
```
## References

- `evaluator-agent.md` - Agent that uses KPI data for evaluation
- `hooks.json` - Hook configuration for auto-generation
- `task-kpi-analyzer.py` - Hook script (do not execute directly)
- `agents_loop.py` - Orchestrator that reads KPI for decisions