Awesome-omni-skill edd
Eval-Driven Development (EDD) Framework v2.87.0 - Define-before-implement pattern with structured evals. Provides workflow: Define specifications → Implement features → Verify against evals. Components: TEMPLATE.md for eval definitions, edd.sh CLI script, /edd skill invocation. Check types: CC- (Capability), BC- (Behavior), NFC- (Non-Functional). Integrates with orchestrator workflow for quality-first development. Keywords: evals, define, implement, verify, capability checks, behavior checks, non-functional checks, template, quality assurance, test-driven, specification. Use when: defining new features with structured evals, implementing with verification requirements, creating quality specifications, TDD-style workflow with evals.
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tools/edd" ~/.claude/skills/diegosouzapw-awesome-omni-skill-edd && rm -rf "$T"
skills/tools/edd/SKILL.md
EDD (Eval-Driven Development) Framework v2.64
Eval-Driven Development is a quality-first development pattern that enforces a define-before-implement workflow with structured evaluations.
v2.88 Key Changes (MODEL-AGNOSTIC)
- Model-agnostic: Uses the model configured in ~/.claude/settings.json or CLI/env vars
- No flags required: Works with the configured default model
- Flexible: Works with GLM-5, Claude, Minimax, or any configured model
- Settings-driven: Model selection via ANTHROPIC_DEFAULT_*_MODEL env vars
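To see which model variables are currently set, something like the following could be used. It is a minimal sketch that assumes only the `ANTHROPIC_DEFAULT_*_MODEL` naming pattern mentioned above; the function name `configured_models` is hypothetical, and `~/.claude/settings.json` is not read because its schema is not described here.

```python
import fnmatch
import os


def configured_models(environ=None):
    """Return {variable: model} for every set ANTHROPIC_DEFAULT_*_MODEL variable.

    `environ` defaults to the real process environment; passing a dict
    makes the function easy to try out without touching the shell.
    """
    env = os.environ if environ is None else environ
    return {
        name: value
        for name, value in sorted(env.items())
        if fnmatch.fnmatchcase(name, "ANTHROPIC_DEFAULT_*_MODEL") and value
    }


# Print whatever is configured in the current shell (may print nothing).
for name, model in configured_models().items():
    print(f"{name} = {model}")
```

An empty result only means that no matching environment variable is set; the model may still be configured in the settings file or on the command line.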
What is EDD?
EDD provides a systematic approach to software development with three phases:
- DEFINE - Create structured eval specifications using TEMPLATE.md
- IMPLEMENT - Build features according to eval definitions
- VERIFY - Validate implementation against eval criteria
Check Types
| Prefix | Type | Purpose |
|---|---|---|
| CC- | Capability Checks | Feature capabilities and functionality |
| BC- | Behavior Checks | Expected behaviors and responses |
| NFC- | Non-Functional Checks | Performance, security, maintainability |
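The prefix convention in the table can be expressed as a small lookup. This is an illustrative sketch only; `CHECK_TYPES` and `check_type` are hypothetical names and are not part of edd.sh.

```python
# Prefix -> check type, taken from the table above.
CHECK_TYPES = {
    "CC-": "Capability Checks",
    "BC-": "Behavior Checks",
    "NFC-": "Non-Functional Checks",
}


def check_type(check_id):
    """Map a check ID such as 'NFC-2' to its check type by prefix."""
    for prefix, label in CHECK_TYPES.items():
        if check_id.startswith(prefix):
            return label
    raise ValueError(f"unknown check prefix in {check_id!r}")


print(check_type("BC-1"))
```

Raising on an unknown prefix keeps mistyped IDs from being silently dropped from an eval.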
Usage
    # Invoke EDD workflow
    /edd "Define memory-search feature"

    # CLI script (if available)
    ralph edd define memory-search
    ralph edd check memory-search
Components
- TEMPLATE.md: Template for creating eval definitions
- edd.sh: CLI script for eval management
- /edd skill: Skill invocation from Claude Code
- ~/.claude/evals/: Directory for eval definitions
Template Structure
Each eval definition includes:
- Capability Checks (CC-) - What the feature can do
- Behavior Checks (BC-) - How the feature behaves
- Non-Functional Checks (NFC-) - Performance, security, etc.
- Implementation Notes - Technical guidance
- Verification Evidence - Test results
Example: memory-search.md
    # Memory Search Eval

    **Status**: DRAFT
    **Created**: 2026-01-30

    ## Capability Checks
    - [ ] CC-1: Search across semantic memory
    - [ ] CC-2: Support filtering by type

    ## Behavior Checks
    - [ ] BC-1: Returns ranked results
    - [ ] BC-2: Handles empty queries gracefully

    ## Non-Functional Checks
    - [ ] NFC-1: Search completes in <2s
    - [ ] NFC-2: Memory usage <100MB

    ## Implementation Notes
    - Use parallel search for performance
    - Cache frequent queries

    ## Verification Evidence
    - Test results attached
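An eval file in this shape can be summarized mechanically. The sketch below assumes that a verified check is marked `[x]` in the usual Markdown task-list style; the example above only shows unchecked `[ ]` items, so that convention is an assumption. `parse_checks` and `summarize` are hypothetical helpers, not functions provided by edd.sh.

```python
import re
from collections import Counter

# Matches lines such as "- [ ] CC-1: Search across semantic memory".
CHECK_LINE = re.compile(r"^\s*-\s*\[( |x|X)\]\s*((?:CC|BC|NFC)-\d+):\s*(.+?)\s*$")


def parse_checks(markdown):
    """Extract every CC-/BC-/NFC- checklist item from an eval document."""
    checks = []
    for line in markdown.splitlines():
        match = CHECK_LINE.match(line)
        if match:
            mark, check_id, text = match.groups()
            # Assumption: "[x]" means the check has been verified.
            checks.append({"id": check_id, "text": text, "done": mark.lower() == "x"})
    return checks


def summarize(checks):
    """Return {prefix: (verified, total)} for each check type present."""
    total = Counter(c["id"].split("-")[0] for c in checks)
    done = Counter(c["id"].split("-")[0] for c in checks if c["done"])
    return {prefix: (done[prefix], total[prefix]) for prefix in total}
```

Calling `summarize(parse_checks(text))` on the contents of an eval file gives a per-type count of verified versus defined checks, which is one way to decide whether the VERIFY phase is complete.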
Integration with Orchestrator
EDD integrates with the orchestrator workflow to ensure quality-first development:
- Clarify phase - Define evals
- Plan phase - Review eval requirements
- Implement phase - Build to eval specs
- Validate phase - Verify against evals
Swarm Mode Integration (v2.81.1)
EDD framework now supports swarm mode for parallel evaluation across multiple check types.
Auto-Spawn Configuration
When invoked via /edd, the framework automatically spawns a specialized evaluation team:
    Task:
      subagent_type: "general-purpose"
      model: "sonnet"
      team_name: "edd-evaluation-team"
      name: "edd-coordinator"
      mode: "delegate"
      run_in_background: true
      prompt: |
        Execute Eval-Driven Development workflow for: $ARGUMENTS

        EDD Pattern:
        1. DEFINE - Create structured eval specifications
        2. DISTRIBUTE - Assign check types to specialists
        3. VERIFY - Validate against eval criteria
        4. CONSOLIDATE - Merge findings from all evaluators
Team Composition
| Role | Purpose | Specialization |
|---|---|---|
| Coordinator | EDD workflow orchestration | Manages eval lifecycle, consolidates findings |
| Teammate 1 | Capability Checks specialist | CC- prefix: feature capabilities and functionality |
| Teammate 2 | Behavior Checks specialist | BC- prefix: expected behaviors and responses |
| Teammate 3 | Non-Functional Checks specialist | NFC- prefix: performance, security, maintainability |
Swarm Mode Workflow
    User invokes: /edd "Define memory-search feature"

    1. Team "edd-evaluation-team" created
    2. Coordinator (edd-coordinator) receives task
    3. 3 Teammates spawned with check-type specializations
    4. Eval definition distributed:
       - Teammate 1 → Capability Checks (CC-)
       - Teammate 2 → Behavior Checks (BC-)
       - Teammate 3 → Non-Functional Checks (NFC-)
    5. Teammates work in parallel (background execution)
    6. Coordinator monitors progress and gathers results
    7. Findings consolidated into single eval specification
    8. Final eval document returned
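The distribute-then-consolidate shape of this workflow can be illustrated with ordinary threads. This is only an analogy: the real swarm uses Claude Code teammates, not Python threads, and `draft_checks` is a hypothetical stand-in that emits placeholder text instead of real check definitions.

```python
from concurrent.futures import ThreadPoolExecutor

# Teammate -> check prefix, following the distribution step above.
SPECIALISTS = {
    "teammate-1": "CC-",
    "teammate-2": "BC-",
    "teammate-3": "NFC-",
}


def draft_checks(owner, prefix, feature):
    """Hypothetical stand-in for one teammate's work on its check type."""
    return [f"{prefix}{n}: TODO for {feature} (drafted by {owner})" for n in (1, 2)]


def run_swarm(feature):
    """Run all specialists in parallel, then consolidate results by prefix."""
    with ThreadPoolExecutor(max_workers=len(SPECIALISTS)) as pool:
        futures = {
            prefix: pool.submit(draft_checks, owner, prefix, feature)
            for owner, prefix in SPECIALISTS.items()
        }
        # The coordinator role: wait for every specialist and merge the findings.
        return {prefix: future.result() for prefix, future in futures.items()}


for prefix, drafted in run_swarm("memory-search").items():
    print(prefix, drafted)
```

Keying the consolidated result by prefix keeps each specialist's findings separate, which mirrors the single eval document with one section per check type.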
Parallel Evaluation Pattern
Each teammate focuses on their check type:
    # Teammate 1: Capability Checks
    CC-1: Feature can perform X
    CC-2: Feature supports Y configuration
    CC-3: Feature integrates with Z system

    # Teammate 2: Behavior Checks
    BC-1: Feature handles error case A gracefully
    BC-2: Feature returns expected response for B
    BC-3: Feature maintains state across C

    # Teammate 3: Non-Functional Checks
    NFC-1: Response time < 100ms
    NFC-2: Memory usage < 50MB
    NFC-3: Security vulnerability scan passes
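A timing check such as NFC-1 can be turned into an executable assertion. The following is a rough sketch, not how the framework itself verifies non-functional checks; `within_budget` is a hypothetical helper, and a single measurement like this is noisy, so a real check would likely repeat the call and look at a percentile.

```python
import time


def within_budget(func, budget_ms, *args, **kwargs):
    """Call func once and report (passed, elapsed_ms) against a time budget."""
    start = time.perf_counter()
    func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms < budget_ms, elapsed_ms


# Example: check a trivial stand-in operation against the 100 ms budget of NFC-1.
passed, elapsed = within_budget(sorted, 100, range(1000))
print(f"NFC-1 passed={passed} elapsed={elapsed:.3f} ms")
```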
Communication Between Teammates
Teammates use the built-in mailbox system:
    # Teammate sends finding to coordinator
    SendMessage:
      type: "message"
      recipient: "edd-coordinator"
      content: "CC-3 defined: Feature integrates with auth system via OAuth2"
Task List Coordination
All teammates share a unified task list:
    # Location: ~/.claude/tasks/edd-evaluation-team/tasks.json
    # Example tasks:
    [
      {"id": "1", "subject": "Define Capability Checks", "owner": "teammate-1"},
      {"id": "2", "subject": "Define Behavior Checks", "owner": "teammate-2"},
      {"id": "3", "subject": "Define Non-Functional Checks", "owner": "teammate-3"},
      {"id": "4", "subject": "Consolidate eval specification", "owner": "edd-coordinator"}
    ]
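The task list can be inspected with a few lines of code. This sketch assumes the file contains a plain JSON array with exactly the `id`, `subject`, and `owner` fields shown in the example; the real file may carry more fields. `tasks_by_owner` is a hypothetical helper.

```python
import json
from collections import defaultdict

# The example task list from above, inlined so the sketch runs on its own.
EXAMPLE_TASKS = """
[
  {"id": "1", "subject": "Define Capability Checks", "owner": "teammate-1"},
  {"id": "2", "subject": "Define Behavior Checks", "owner": "teammate-2"},
  {"id": "3", "subject": "Define Non-Functional Checks", "owner": "teammate-3"},
  {"id": "4", "subject": "Consolidate eval specification", "owner": "edd-coordinator"}
]
"""


def tasks_by_owner(raw_json):
    """Group task subjects by owner, preserving the order in the file."""
    grouped = defaultdict(list)
    for task in json.loads(raw_json):
        grouped[task["owner"]].append(task["subject"])
    return dict(grouped)


for owner, subjects in tasks_by_owner(EXAMPLE_TASKS).items():
    print(f"{owner}: {', '.join(subjects)}")
```

To run it against a live team, read the text of the tasks.json file at the location given above and pass it in place of `EXAMPLE_TASKS`.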
Manual Override
To disable swarm mode:
/edd "Define feature X" --no-swarm
Output Location
    # Evals saved to ~/.claude/evals/
    ls ~/.claude/evals/

    # View last eval
    cat ~/.claude/evals/latest.md
Testing
Test suite:
tests/test_v264_edd_framework.bats (33 tests)
Run tests:
bats tests/test_v264_edd_framework.bats
Swarm Mode Tests
Additional tests for swarm mode integration:
    # Test swarm team creation
    tests/edd/test-swarm-team-creation.sh

    # Test parallel evaluation
    tests/edd/test-parallel-evaluation.sh
Status
Current: Framework defined with swarm mode integration (v2.81.1)
Note: TEMPLATE.md and evals directory structure ready for use
Version: v2.64 | Status: DRAFT | Tests: 33 passing