# developer-kit: task-quality-kpi

Objective task quality evaluation framework using quantitative KPIs. KPIs are calculated automatically by a hook whenever a task file is modified, and the results are saved to `TASK-XXX--kpi.json`. Use when: reading KPI data for task evaluation, understanding quality metrics, or deciding whether to iterate or approve based on data.

Install:

```bash
# Clone the full repository
git clone https://github.com/giuseppe-trisciuoglio/developer-kit

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/giuseppe-trisciuoglio/developer-kit "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/developer-kit-specs/skills/task-quality-kpi" ~/.claude/skills/giuseppe-trisciuoglio-developer-kit-task-quality-kpi && rm -rf "$T"
```

Source: `plugins/developer-kit-specs/skills/task-quality-kpi/SKILL.md`

# Task Quality KPI Framework
## Overview
The Task Quality KPI Framework provides objective, quantitative metrics for evaluating task implementation quality.
**Key Architecture:** KPIs are auto-generated by a hook; you read the results rather than running scripts.
```
┌───────────────────────────────────────┐
│ HOOK (auto-executes)                  │
│   Trigger: PostToolUse on TASK-*.md   │
│   Script:  task-kpi-analyzer.py       │
│   Output:  TASK-XXX--kpi.json         │
├───────────────────────────────────────┤
│ SKILL / AGENT (reads output)          │
│   Input:   TASK-XXX--kpi.json         │
│   Action:  Make evaluation decisions  │
└───────────────────────────────────────┘
```
## Why This Architecture?
| Problem | Solution |
|---|---|
| Skills can't execute scripts | Hook auto-runs on file save |
| Subjective review_status | Quantitative 0-10 scores |
| "Looks good to me" | Evidence-based evaluation |
| Binary pass/fail | Graduated quality levels |
## KPI File Location
After any task file modification, find KPI data at:
```
docs/specs/[ID]/tasks/TASK-XXX--kpi.json
```
## KPI Categories
```
┌───────────────────────────────────────┐
│ OVERALL SCORE (0-10)                  │
├───────────────────────────────────────┤
│ Spec Compliance (30%)                 │
│ ├── Acceptance Criteria Met (0-10)    │
│ ├── Requirements Coverage (0-10)      │
│ └── No Scope Creep (0-10)             │
├───────────────────────────────────────┤
│ Code Quality (25%)                    │
│ ├── Static Analysis (0-10)            │
│ ├── Complexity (0-10)                 │
│ └── Patterns Alignment (0-10)         │
├───────────────────────────────────────┤
│ Test Coverage (25%)                   │
│ ├── Unit Tests Present (0-10)         │
│ ├── Test/Code Ratio (0-10)            │
│ └── Coverage Percentage (0-10)        │
├───────────────────────────────────────┤
│ Contract Fulfillment (20%)            │
│ ├── Provides Verified (0-10)          │
│ └── Expects Satisfied (0-10)          │
└───────────────────────────────────────┘
```
### Category Weights
| Category | Weight | Why |
|---|---|---|
| Spec Compliance | 30% | Most important - did we build what was asked? |
| Code Quality | 25% | Technical excellence |
| Test Coverage | 25% | Verification and confidence |
| Contract Fulfillment | 20% | Integration with other tasks |
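As a rough illustration of how these weights combine, here is a minimal Python sketch of the weighted sum. The real calculation is performed by the hook (`task-kpi-analyzer.py`); the category scores used here are illustrative values chosen to match the 8.2 sample file shown later in this document.

```python
# Illustrative recomputation of the overall score from the table's weights.
CATEGORY_WEIGHTS = {
    "Spec Compliance": 0.30,
    "Code Quality": 0.25,
    "Test Coverage": 0.25,
    "Contract Fulfillment": 0.20,
}

def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted sum of 0-10 category scores, itself on a 0-10 scale."""
    return round(
        sum(CATEGORY_WEIGHTS[name] * score for name, score in category_scores.items()),
        1,
    )

# Illustrative category scores (assumed, not from a real run)
print(overall_score({
    "Spec Compliance": 8.5,       # weighted contribution: 2.55
    "Code Quality": 7.8,
    "Test Coverage": 8.0,
    "Contract Fulfillment": 8.6,
}))  # -> 8.2
```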
## When to Use
- Reading KPI data for task quality evaluation
- Understanding quality metrics and scoring breakdown
- Deciding whether to iterate or approve based on quantitative data
- Integrating KPI checks into automated loops (`agents_loop.py`)
- Generating evidence-based evaluation reports
## Instructions

### 1. Reading KPI Data (Primary Use)
DO NOT run scripts - read the auto-generated file:
```
Read the KPI file:
docs/specs/001-feature/tasks/TASK-001--kpi.json
```
### 2. Understanding the Data
The KPI file contains:
{ "task_id": "TASK-001", "evaluated_at": "2026-01-15T10:30:00Z", "overall_score": 8.2, "passed_threshold": true, "threshold": 7.5, "kpi_scores": [ { "category": "Spec Compliance", "weight": 30, "score": 8.5, "weighted_score": 2.55, "metrics": { "acceptance_criteria_met": 9.0, "requirements_coverage": 8.0, "no_scope_creep": 8.5 }, "evidence": [ "Acceptance criteria: 9/10 checked", "Requirements coverage: 8/10" ] } ], "recommendations": [ "Code Quality: Moderate improvements possible" ], "summary": "Score: 8.2/10 - PASSED" }
### 3. Making Decisions

Use `overall_score` and `passed_threshold`:

```
IF passed_threshold == true:
  → Task meets quality standards
  → Approve and proceed

IF passed_threshold == false:
  → Task needs improvement
  → Check recommendations for specific targets
  → Create fix specification
```
## Integration with Workflow

### In Task Review (evaluator-agent)
```markdown
## Review Process
1. Read KPI file: TASK-XXX--kpi.json
2. Extract overall_score and kpi_scores
3. Read task file to validate
4. Generate evaluation report
5. Decision based on passed_threshold
```
### In agents_loop
```python
import json

# Check whether the hook has produced a KPI file for this task
kpi_path = spec_path / "tasks" / f"{task_id}--kpi.json"
if kpi_path.exists():
    kpi_data = json.loads(kpi_path.read_text())
    if kpi_data["passed_threshold"]:
        # Quality threshold met
        advance_state("update_done")
    else:
        # Needs more work: turn recommendations into a fix task
        fix_targets = kpi_data["recommendations"]
        create_fix_task(fix_targets)
        advance_state("fix")
else:
    # KPI not generated yet - task may not be implemented
    log_warning("No KPI data found")
```
### Multi-Iteration Loop

Instead of capping at a fixed number of retries, iterate until the quality threshold is met:
```
Iteration 1: Score 6.2 → FAILED → Fix: Improve test coverage
Iteration 2: Score 7.1 → FAILED → Fix: Refactor complex functions
Iteration 3: Score 7.8 → PASSED → Proceed
```
Each iteration updates the KPI file automatically on task save.
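For orchestration outside `agents_loop.py`, the loop can be sketched as follows. This is an assumed shape, not the orchestrator's actual code: `apply_fixes` is a hypothetical callable standing in for whatever re-runs implementation, and the iteration cap is only a safety valve, not a quality gate.

```python
import json
from collections.abc import Callable
from pathlib import Path

def iterate_until_passed(
    kpi_path: Path,
    apply_fixes: Callable[[list[str]], None],  # hypothetical fix runner
    max_iterations: int = 10,                  # safety cap against endless loops
) -> bool:
    for iteration in range(1, max_iterations + 1):
        kpi = json.loads(kpi_path.read_text())
        print(f"Iteration {iteration}: score {kpi['overall_score']}/10")
        if kpi["passed_threshold"]:
            return True
        # Target the hook's recommendations, not a vague "make it better".
        apply_fixes(kpi["recommendations"])
        # Saving the task file re-triggers the hook, refreshing kpi_path.
    return False
```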
## Threshold Guidelines
| Score | Quality Level | Action |
|---|---|---|
| 9.0-10.0 | Exceptional | Approve, document best practices |
| 8.0-8.9 | Good | Approve with minor notes |
| 7.0-7.9 | Acceptable | Approve (if threshold 7.5) |
| 6.0-6.9 | Below Standard | Request specific improvements |
| < 6.0 | Poor | Significant rework required |
### Recommended Thresholds
| Project Type | Threshold | Rationale |
|---|---|---|
| Production MVP | 8.0 | High quality required |
| Internal Tool | 7.0 | Good enough |
| Prototype | 6.0 | Functional over perfect |
| Critical System | 8.5 | No compromises |
## Metric Details

### Spec Compliance Metrics

#### Acceptance Criteria Met
- Calculates: `(checked_criteria / total_criteria) * 10` (see the sketch below)
- Source: Task file checkbox count
- Example: 9/10 checked = 9.0
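A sketch of this formula applied to GitHub-style checkboxes; the hook's actual parsing may differ.

```python
import re

# Illustrative only: count "- [x]" vs "- [ ]" checkboxes in a task file.
def acceptance_criteria_score(task_markdown: str) -> float:
    checked = len(re.findall(r"^\s*- \[x\]", task_markdown,
                             re.MULTILINE | re.IGNORECASE))
    unchecked = len(re.findall(r"^\s*- \[ \]", task_markdown, re.MULTILINE))
    total = checked + unchecked
    return round(checked / total * 10, 1) if total else 0.0

# 9 of 10 criteria checked -> 9.0, as in the example above.
```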
#### Requirements Coverage
- Calculates: Count of REQ-IDs this task covers
- Source: `traceability-matrix.md`
- Example: 4 requirements covered = 8.0
#### No Scope Creep
- Calculates: `(implemented_files / expected_files) * 10`
- Source: Task "Files to Create" vs actual files
- Penalizes: Missing files or unexpected additions
### Code Quality Metrics

#### Static Analysis
- Java: Maven Checkstyle
- TypeScript: ESLint
- Python: ruff
- Score: 10 if passes, 5 if issues found
#### Complexity
- Calculates: Functions >50 lines
- Score: `10 - (long_functions_ratio * 5)` (see the sketch below)
- Penalizes: Large, complex functions
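A sketch of this formula for Python sources; the hook may use different, language-specific tooling.

```python
import ast

# Illustrative: the share of functions longer than 50 lines drives the penalty.
def complexity_score(source: str) -> float:
    funcs = [node for node in ast.walk(ast.parse(source))
             if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]
    if not funcs:
        return 10.0
    long_ratio = sum(1 for f in funcs
                     if f.end_lineno - f.lineno + 1 > 50) / len(funcs)
    return round(10 - long_ratio * 5, 1)
```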
#### Patterns Alignment
- Checks: Knowledge Graph patterns
- Source: `knowledge-graph.json`
- Validates: Implementation follows project patterns
### Test Coverage Metrics

#### Unit Tests Present
- Calculates: `min(10, test_files * 5)`
- 2 test files = maximum score
- Penalizes: Missing tests
#### Test/Code Ratio
- Calculates: `(test_count / code_count) * 10` (both file-count formulas are sketched below)
- 1:1 ratio = 10/10
- Ideal: At least 1 test file per code file
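The two file-count formulas above, sketched together. Capping the ratio score at 10 is an assumption; the documented formula does not say what happens past a 1:1 ratio.

```python
# Illustrative scoring helpers for the two formulas above.
def unit_tests_present_score(test_files: int) -> float:
    return min(10.0, test_files * 5.0)        # 2 test files = maximum score

def test_code_ratio_score(test_count: int, code_count: int) -> float:
    if code_count == 0:
        return 0.0
    # Cap at 10 (assumption) so a >1:1 ratio cannot exceed the scale.
    return min(10.0, test_count / code_count * 10)
```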
#### Coverage Percentage
- Source: Coverage reports (JaCoCo, lcov, etc.)
- Calculates: `coverage_percent / 10`
- 80% coverage = 8.0
### Contract Fulfillment Metrics

#### Provides Verified
- Checks: Files exist and export expected symbols (a path-existence sketch follows after the next metric)
- Source: Task frontmatter `provides`
- Validates: Contract satisfied
#### Expects Satisfied
- Checks: Dependencies provide required files/symbols
- Source: Task frontmatter `expects`
- Validates: Prerequisites met
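A minimal sketch of the path-existence half of these contract checks; symbol-level export verification, which the metrics also cover, is omitted, and the helper name is illustrative.

```python
from pathlib import Path

# Illustrative: every path declared in `provides` (or required by `expects`)
# must exist relative to the repository root.
def contract_paths_satisfied(paths: list[str], repo_root: Path) -> bool:
    missing = [p for p in paths if not (repo_root / p).exists()]
    if missing:
        print(f"Contract not satisfied; missing: {missing}")
    return not missing
```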
## When KPI File is Missing

If `TASK-XXX--kpi.json` doesn't exist:
- Task was never modified - Hook runs on file save
- Hook failed - Check Claude Code logs
- Task is new - Save the file first to trigger hook
DO NOT try to calculate KPIs manually. The hook runs automatically when:
- Task file is saved (Write tool)
- Task file is edited (Edit tool)
## Best Practices

### 1. Always Check KPI File Exists
Before evaluating:
```
Check if KPI file exists:
docs/specs/[ID]/tasks/TASK-XXX--kpi.json

If missing:
- Task may not be implemented yet
- Ask user to save the task file first
```
### 2. Trust the Metrics
The KPIs are objective. Only override with documented evidence:
- Critical security issue not in metrics
- Logic error not caught by static analysis
- Exceptional quality not measured
### 3. Iterate on Low KPIs
Target specific categories:
❌ "Fix code quality issues" ✅ "Improve Code Quality KPI from 5.2 to 7.0: - Complexity: Refactor processData() (5→8) - Patterns: Add error handling (6→8)"
### 4. Track KPI Trends
Monitor quality over time:
```
Sprint 1: Average KPI 6.8
Sprint 2: Average KPI 7.3 (+0.5)
Sprint 3: Average KPI 7.9 (+0.6)
```
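A small illustrative helper for computing such averages from the KPI files themselves, using the file layout documented above.

```python
import json
from pathlib import Path

# Average overall_score across every KPI file in a spec directory.
def average_kpi(spec_dir: Path) -> float:
    scores = [json.loads(p.read_text())["overall_score"]
              for p in spec_dir.glob("tasks/TASK-*--kpi.json")]
    return round(sum(scores) / len(scores), 1) if scores else 0.0
```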
## Troubleshooting

### KPI File Not Generated
Check:
- Hook enabled in `hooks.json`
- Task file name matches the pattern `TASK-*.md`
- File was actually saved (not just viewed)
### KPI Scores Seem Wrong
Validate:
- Check the `evidence` field for data sources
- Verify files exist at the expected paths
- Some metrics require build tools (Maven, npm) to be available
### Low Scores Despite Good Code
Possible causes:
- Missing test files
- No coverage report generated
- Acceptance criteria not checked
- Lint rules too strict
Fix the root cause, not just the score.
## Examples

### Example 1: Reading KPI Data
```
Read the KPI file to evaluate task quality:
docs/specs/001-feature/tasks/TASK-042--kpi.json

Based on the data:
- Overall score: 6.8/10 (below threshold)
- Lowest KPI: Test Coverage (5.0/10)
- Recommendation: Add unit tests

Decision: REQUEST FIXES - target Test Coverage improvement
```
### Example 2: Iteration Decision
```
Iteration 1 KPI: Score 6.2 → FAILED
- Spec Compliance: 7.0 ✓
- Code Quality: 5.5 ✗
- Test Coverage: 6.0 ✗

Fix targets:
1. Refactor complex functions (Code Quality)
2. Add test coverage (Test Coverage)

Iteration 2 KPI: Score 7.8 → PASSED ✓
```
### Example 3: agents_loop Integration
```python
import json

# In agents_loop, after the implementation step
kpi_file = spec_dir / "tasks" / f"{task_id}--kpi.json"
if kpi_file.exists():
    kpi = json.loads(kpi_file.read_text())
    if kpi["passed_threshold"]:
        print(f"✅ Task passed quality check: {kpi['overall_score']}/10")
        advance_state("update_done")
    else:
        print(f"❌ Task failed quality check: {kpi['overall_score']}/10")
        print("Recommendations:")
        for rec in kpi["recommendations"]:
            print(f"  - {rec}")
        advance_state("fix")
```
## References

- `evaluator-agent.md` - Agent that uses KPI data for evaluation
- `hooks.json` - Hook configuration for auto-generation
- `task-kpi-analyzer.py` - Hook script (do not execute directly)
- `agents_loop.py` - Orchestrator that reads KPI for decisions