# claude-elixir-phoenix · plugin-dev-workflow

Guide plugin development workflow — editing skills, agents, hooks, or eval framework in this repo. Use when modifying files in plugins/elixir-phoenix/, lab/eval/, or lab/autoresearch/. Ensures changes pass eval, lint, and tests before committing.

## Install

source · Clone the upstream repo:

```shell
git clone https://github.com/oliver-kriska/claude-elixir-phoenix
```

Claude Code · Install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/oliver-kriska/claude-elixir-phoenix "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/plugin-dev-workflow" ~/.claude/skills/oliver-kriska-claude-elixir-phoenix-plugin-dev-workflow && rm -rf "$T"
```

Manifest: `.claude/skills/plugin-dev-workflow/SKILL.md`
# Plugin Development Workflow
This repo is the Elixir/Phoenix Claude Code plugin. When editing plugin files, follow this workflow to ensure quality.
## Before You Start
Run `make help` to see all available commands:

```shell
make eval      # Quick: lint + score changed skills/agents
make eval-all  # Full: all 40 skills + 20 agents
make eval-fix  # Auto-fix + show failures
make test      # 52 pytest tests for eval framework
make ci        # Full CI pipeline
```
## Scoring Individual Files (CLI)
IMPORTANT: Always use `-m` module syntax; never run `scorer.py` directly.

```shell
# Score ONE skill (use -m, NOT a direct file path)
python3 -m lab.eval.scorer plugins/elixir-phoenix/skills/verify/SKILL.md

# Score ONE skill with pretty output
python3 -m lab.eval.scorer plugins/elixir-phoenix/skills/verify/SKILL.md --pretty

# Score all skills
python3 -m lab.eval.scorer --all

# Score ONE agent
python3 -m lab.eval.agent_scorer plugins/elixir-phoenix/agents/verification-runner.md

# Score all agents
python3 -m lab.eval.agent_scorer --all
```
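A common reason for the `-m` rule is that modules inside a package depend on package context (relative imports, `sys.path` setup) that only module syntax provides; running the file directly breaks them. A self-contained illustration — the `/tmp/demo_pkg` package below is made up for the demo, not part of this repo:

```shell
# Build a throwaway package whose main module uses a relative import
mkdir -p /tmp/demo_pkg/pkg && cd /tmp/demo_pkg
printf 'VALUE = 42\n' > pkg/helpers.py
printf 'from .helpers import VALUE\nprint(VALUE)\n' > pkg/main.py
touch pkg/__init__.py

python3 -m pkg.main                  # module syntax: prints 42
python3 pkg/main.py 2>/dev/null \
  || echo "direct run fails (no package context)"
```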
## When Editing Skills (`plugins/elixir-phoenix/skills/*/SKILL.md`)
- Read CLAUDE.md conventions (size limits, frontmatter requirements)
- Make your changes
- Run `make eval` — it auto-detects changed skills and scores them
- If FAIL: check the dimension that failed, fix it
- Run `make lint` to verify markdown formatting
- Commit
Skill requirements (eval checks all of these):
- Frontmatter: name, description, effort. Description must start with action verb + include "Use when..."
- Iron Laws section with 1+ numbered items
- Under 185 lines (command skills) or 150 lines (reference skills)
- No section exceeds 45 lines
- All `phx:` references point to existing skills
- All `references/*.md` paths exist
- No dangerous code patterns outside Iron Laws sections
- Code examples present (1+ fenced code blocks)
- "Use when..." in description (for trigger accuracy)
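Taken together, a frontmatter block that passes these checks might look like this sketch (the `verify` skill exists in this repo, but the wording below is illustrative, not its actual frontmatter):

```yaml
---
name: verify
description: Verify that Elixir changes compile and pass tests. Use when finishing a code edit in an Elixir/Phoenix project.
effort: medium
---
```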
## When Editing Agents (`plugins/elixir-phoenix/agents/*.md`)
- Make your changes
- Run `make eval-agents` to score all agents

Agent requirements:

- `permissionMode: bypassPermissions` (always — background agents need it)
- `disallowedTools: Write, Edit, NotebookEdit` for review/analysis agents
- model matches effort: haiku=low, sonnet=medium, opus=high
- Under 300 lines (specialist) or 535 lines (orchestrator)
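An agent header meeting the requirements above could look like this sketch (the agent name and description are invented; the three fields mirror the list above):

```yaml
---
name: diff-reviewer
description: Review Elixir diffs for style and safety issues. Use when a code review is requested.
model: haiku                                # low effort pairs with haiku
permissionMode: bypassPermissions           # always required for background agents
disallowedTools: Write, Edit, NotebookEdit  # review/analysis agents are read-only
---
```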
## When Editing Eval Framework (`lab/eval/*.py`)
- Make your changes
- Run `make test` — 52 pytest tests must pass
- Run `make eval-all` — verify no skills/agents regressed
- If adding new matchers: add tests in `lab/eval/tests/test_matchers.py`
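A new-matcher test in that file would follow the usual pytest shape; the matcher below is a hypothetical stand-in defined inline, not the repo's actual matcher API:

```python
import re

def has_use_when(description: str) -> bool:
    """Hypothetical matcher: does a skill description contain the 'Use when' trigger phrase?"""
    return re.search(r"\bUse when\b", description) is not None

def test_has_use_when_accepts_trigger_phrase():
    assert has_use_when("Verify changes. Use when finishing an edit.")

def test_has_use_when_rejects_missing_phrase():
    assert not has_use_when("Verify changes compile and pass tests.")
```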
## When Editing Hooks (`plugins/elixir-phoenix/hooks/scripts/*.sh`)
- Make your changes
- Run `make lint` (markdown in hook comments)
- Test the hook manually (hooks run on Edit/Write/Bash events)
- Check CLAUDE.md hook documentation is still accurate
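Testing manually usually means feeding the script a synthetic event on stdin, since Claude Code hooks receive the triggering event as JSON that way and signal a block with exit code 2. A self-contained sketch using a stand-in script (the real hooks live in `plugins/elixir-phoenix/hooks/scripts/`):

```shell
# Stand-in hook mimicking the stdin-JSON convention; not one of the repo's actual hooks
cat > /tmp/demo-hook.sh <<'EOF'
#!/bin/sh
input=$(cat)                        # hooks read the event JSON from stdin
case "$input" in
  *"rm -rf"*) echo "blocked dangerous command" >&2; exit 2 ;;  # exit 2 blocks the tool call
  *) exit 0 ;;                                                 # exit 0 lets it through
esac
EOF
chmod +x /tmp/demo-hook.sh

echo '{"tool_name":"Bash","tool_input":{"command":"mix test"}}' | /tmp/demo-hook.sh
echo "allow exit: $?"               # prints: allow exit: 0
echo '{"tool_name":"Bash","tool_input":{"command":"rm -rf _build"}}' | /tmp/demo-hook.sh 2>/dev/null
echo "block exit: $?"               # prints: block exit: 2
```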
## Autoresearch (Self-Improvement Loop)
If `make eval-fix` shows failures, it suggests an autoresearch command:

```shell
# Copy-paste the suggested command from eval-fix output
claude -p 'Run autoresearch. Score all skills...' --allowedTools 'Edit,Read,Write,Bash,Glob,Grep'
```
This runs the autoresearch loop: find weakest skill → fix ONE issue → re-score → keep/revert.
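The keep/revert cycle can be sketched in a few lines; `score` and `fix` below are stand-ins (line count, append a line), not the repo's real scorer or fixer:

```shell
score() { awk 'END { print NR }' "$1"; }    # stand-in scorer: line count
fix()   { printf 'extra line\n' >> "$1"; }  # stand-in "fix ONE issue"

improve_once() {
  before=$(score "$1")
  cp "$1" "$1.bak"                          # snapshot so we can revert
  fix "$1"
  after=$(score "$1")
  if [ "$after" -gt "$before" ]; then
    rm "$1.bak"; echo "kept (score $before -> $after)"
  else
    mv "$1.bak" "$1"; echo "reverted (score stayed at $before)"
  fi
}

printf 'stub\n' > /tmp/demo-skill.md
improve_once /tmp/demo-skill.md             # prints: kept (score 1 -> 2)
```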
## Pre-Commit Checklist
Before committing any plugin changes:
- `make lint` passes
- `make eval` passes (changed files)
- `make test` passes (if eval framework changed)
- CHANGELOG.md updated (if user-visible change)
- Version bumped in plugin.json (if releasing)
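If you want the first two checks enforced automatically, a local git hook is one option; this is entirely optional and assumes you run it from the repo root where the Makefile lives:

```shell
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Run the cheap checks before every commit; abort the commit on failure
make lint && make eval || { echo "pre-commit checks failed" >&2; exit 1; }
EOF
chmod +x .git/hooks/pre-commit
```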
## References
- CLAUDE.md — full conventions, size limits, checklist
- `lab/eval/` — scoring framework (24 matchers, 8 dimensions)
- `lab/autoresearch/` — self-improvement loop
- `lab/findings/interesting.jsonl` — log interesting discoveries here