# leos_claude_starter: eval-sprint

Adversarial evaluation of sprint spec before implementation.

Clone the repository:

```bash
git clone https://github.com/leogodin217/leos_claude_starter
```

Or copy just this skill into your local skills directory:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/leogodin217/leos_claude_starter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/eval-sprint" ~/.claude/skills/leogodin217-leos-claude-starter-eval-sprint && rm -rf "$T"
```

**`.claude/skills/eval-sprint/SKILL.md`**

# Evaluate Sprint Spec

Adversarial evaluation of sprint spec before implementation. Run in a new session to ensure fresh eyes.
## Purpose
Find problems in the spec that would cause implementation to fail or produce poor results. The evaluator is deliberately adversarial — looking for ways the spec could be misinterpreted, is incomplete, or violates principles.
## Context Loading

Load ONLY:

- `CLAUDE.md` — Principles (especially #7 and #8)
- `docs/sprints/current/spec.md` — The spec to evaluate

DO NOT load:

- Architecture docs (spec should be self-contained)
- Capability docs (evaluating the spec as written)
- Previous conversation context (that's why this runs in a new session)
- Source files via `Read` — do NOT read `.py` files to understand the codebase. Use LSP tools instead (see below).
## Code Verification — LSP Only

When verifying spec claims against the codebase (class names, function signatures, caller counts, line numbers), use only LSP tools. Do NOT read entire source files — this wastes context on code irrelevant to the evaluation.

| Verification need | LSP tool | Example |
|---|---|---|
| "Does this class/function exist?" | go-to-definition | Spec names a class — verify the actual class name |
| "Who calls this function?" | find references | Spec says "only two callers" — verify the caller count |
| "What's the signature?" | hover / signature help | Spec says param at line 203 — verify |
| "What type is this?" | hover / type info | Spec references a union type — verify it exists |
| "Does this symbol exist in the module?" | document symbols | Spec says a symbol is exported from a module — verify |

**NEVER** use `Read` on source files for this skill. If you catch yourself about to read a `.py` file, use an LSP tool instead.
## Evaluation Checklist
### Structural Checks
- Every phase has: Delivers, Demo, Contracts sections
- Every contract has: full signature with types
- Every contract has: docstring with Args, Returns, Raises (see the example contract below)
- Success criteria are checkboxes (testable)
- Scope lists what's NOT included
- File structure section shows where code goes
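
For reference, here is a contract sketch that would pass every structural check above. The function, types, and behavior are hypothetical, invented purely for illustration:

```python
def merge_records(
    primary: list[dict],
    secondary: list[dict],
    key: str,
) -> list[dict]:
    """Merge two record lists on a shared key, preferring primary values.

    Args:
        primary: Records whose values win on key conflicts.
        secondary: Records added only where their key is absent from primary.
        key: Field name that must be present in every record of both lists.

    Returns:
        A new list with one record per distinct key value; input lists
        are not mutated.

    Raises:
        KeyError: If any record in either list lacks `key`.
        ValueError: If either input list contains duplicate key values.
    """
    ...
```

Note that nothing is left implicit: every parameter is typed, every failure mode names a trigger, and there are no default values.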
### Principle Checks
- No default parameters in any signature (Principle #7)
- No `= None`, `= []`, `= {}` in signatures (Principle #7); see the sketch after this list
- No "Future:", "TODO:", "placeholder", "stub" language (Principle #8)
- No loops/iterations described that "will do X later" (Principle #8)
- No hardcoded domain values in contracts (Principle #2)
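
A minimal sketch of what the principle violations above look like in a signature. All names are hypothetical:

```python
# Violates Principle #7: the default smuggles in a decision the spec never made.
def load_rows(path: str, limit: int = 100) -> list[dict]: ...

# Violates Principle #8: placeholder language defers work the phase claims to deliver.
def sync_remote(url: str) -> None:
    """Sync local state to the remote.  TODO: real implementation later."""

# Compliant: every parameter is explicit; the caller must decide the limit.
def load_rows_explicit(path: str, limit: int) -> list[dict]: ...
```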
### Consistency Checks
- All types referenced in contracts exist (in spec or codebase)
- All functions called in demos are defined (in spec or codebase)
- Phase N only uses contracts from phases 1..N (dependency order)
- Stated scope matches contracts (nothing extra, nothing missing)
- Contract names follow existing codebase conventions
### Ambiguity Checks
- No weasel words: "appropriate", "as needed", "etc.", "various", "properly"
- No vague verbs: "handle", "process", "manage" without specifics
- Error conditions are specific (not just "raises Exception"); see the example below
- Return types are concrete (not "suitable value")
- None/empty input behavior is specified
- Edge cases appear in Raises section
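
To illustrate the error-condition check, compare a vague contract with a specific one. The parser and its behavior are hypothetical:

```python
# Fails the check: which exception, triggered by what input?
def parse_header(line: str) -> dict:
    """Parse a header line.

    Raises:
        Exception: On bad input.
    """

# Passes the check: each failure mode names a concrete type and a trigger.
def parse_header_specific(line: str) -> dict:
    """Parse a 'Name: value' header line into {'name': str, 'value': str}.

    Raises:
        ValueError: If `line` contains no ':' separator.
        ValueError: If the name part is empty after stripping whitespace.
    """
```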
### Testability Checks
- Each contract has at least one obvious test case
- Demo requirements are specific enough to automate
- Success criteria are measurable (not "works correctly")
- Can write a test that would FAIL if the contract is wrong (see the sketch below)
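
As a sketch of that last check, assuming the hypothetical `merge_records` contract from the structural example above, these tests would reject an implementation that ignores the documented conflict and duplicate-key behavior:

```python
import pytest

# merge_records is the hypothetical contract sketched under Structural Checks.

def test_merge_prefers_primary_on_key_conflict():
    primary = [{"id": 1, "name": "new"}]
    secondary = [{"id": 1, "name": "old"}]
    assert merge_records(primary, secondary, key="id") == [{"id": 1, "name": "new"}]

def test_duplicate_keys_raise_value_error():
    with pytest.raises(ValueError):
        merge_records([{"id": 1}, {"id": 1}], [], key="id")
```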
### Architecture Checks
- New code doesn't break existing interfaces
- Module placement matches existing structure
- Naming follows existing conventions
- No circular dependencies introduced (see the example below)
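
A circular dependency can be as small as two imports. The modules below are hypothetical file excerpts, not a runnable script; a spec that places new code so that both imports exist has introduced a cycle:

```python
# billing/invoices.py
from billing.customers import lookup_customer   # invoices -> customers

# billing/customers.py
from billing.invoices import unpaid_invoices    # customers -> invoices: cycle
```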
## Evaluation Process
### Step 1: Read Spec Cold
Read the entire spec without referring to other docs. Note:
- What's confusing?
- What questions do you have?
- What seems underspecified?
### Step 2: Run Checklist
Go through each check systematically. For every failure:
- Note the location (section, line if possible)
- Describe the problem specifically
- Suggest a fix
### Step 3: Adversarial Questions
For each contract, ask:
- "What's the worst reasonable misinterpretation?"
- "What input would break this?"
- "What if I implemented this lazily/wrong — would tests catch it?"
### Step 4: Dependency Trace
Trace through phases in order:
- Phase 1: What does it need? (should be nothing or existing code)
- Phase 2: What does it need from Phase 1? Is that actually delivered?
- Continue for all phases
### Step 5: Demo Feasibility
For each demo:
- Can I actually write this with the contracts provided?
- Does it prove what it claims to prove?
- What could pass the demo but still be wrong? (see the example below)
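
For example, a demo that only shows output can pass against a broken implementation. `filter_active` here is hypothetical:

```python
users = [{"name": "a", "active": True}, {"name": "b", "active": False}]

# Demo as specced ("show that filtering returns a result"): proves almost nothing.
print(f"{len(filter_active(users))} active users")  # passes even if the filter is wrong

# Stronger demo: pins expected output to known input, so a wrong filter fails.
assert [u["name"] for u in filter_active(users)] == ["a"]
```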
## Output Format
```markdown
# Spec Evaluation: [Sprint Name]

**Verdict: PASS / NEEDS WORK / FAIL**

**Summary:** [One sentence assessment]

---

## Blocking Issues

Issues that MUST be fixed before implementation.

### 1. [Short Title]
**Location:** [Section/line reference]
**Category:** [Structural/Principle/Consistency/Ambiguity/Testability/Architecture]
**Problem:** [Specific description]
**Impact:** [What goes wrong if not fixed]
**Suggested Fix:** [How to fix it]

---

## Warnings

Issues that SHOULD be addressed but aren't blocking.

### 1. [Short Title]
**Location:** [Section/line reference]
**Problem:** [Description]
**Suggestion:** [How to improve]

---

## Notes

Observations that aren't issues but worth considering.

- [Note 1]
- [Note 2]

---

## Checklist Results

| Category | Pass | Fail | Issues |
|----------|------|------|--------|
| Structural | 5 | 1 | Missing file structure |
| Principles | 4 | 0 | — |
| Consistency | 3 | 1 | Undefined type |
| Ambiguity | 3 | 2 | Weasel words |
| Testability | 3 | 0 | — |
| Architecture | 4 | 0 | — |

---

## Verdict Explanation

**PASS:** No blocking issues. Warnings are minor. Spec is ready for implementation.

**NEEDS WORK:** No blocking issues but warnings are significant. Review before proceeding.

**FAIL:** Blocking issues found. Must fix and re-evaluate before implementation.
```
## Verdicts
### PASS
- Zero blocking issues
- Warnings are cosmetic or minor
- Checklist mostly green
- Confident implementation will succeed
### NEEDS WORK
- Zero blocking issues
- But significant warnings that could cause problems
- User should review warnings before proceeding
- Implementation might need course correction
### FAIL
- One or more blocking issues
- Spec is incomplete, contradictory, or violates principles
- Must fix and run eval-sprint again
- Do NOT proceed to implement-sprint
## Common Blocking Issues
| Issue | Why It Blocks |
|---|---|
| Missing contract for stated scope | Implementer won't know what to build |
| Undefined type referenced | Code won't compile |
| Default parameter in signature | Principle #7 violation |
| Phase dependency violation | Phase N can't be built |
| Ambiguous return type | Implementer will guess wrong |
## Common Warnings
| Issue | Why It's a Warning |
|---|---|
| Vague error message | Implementation will work but be unhelpful |
| Missing edge case | Might cause bug but not blocking |
| Inconsistent naming | Annoying but not fatal |
| Demo doesn't fully exercise contract | Partial verification |
## After Evaluation
- If PASS: Proceed to `/implement-sprint`
- If NEEDS WORK: Review warnings, decide to proceed or fix
- If FAIL: Fix blocking issues, run `/eval-sprint` again
## Tips for Evaluators
- **Be skeptical** — Assume the spec has problems until proven otherwise
- **Read literally** — Don't fill in gaps with what you think they meant
- **Think like a lazy implementer** — What's the minimal interpretation?
- **Think like a malicious implementer** — What technically satisfies the spec but is wrong?
- **Check the edges** — Empty lists, None values, zero counts, boundary conditions