# claude-elixir-phoenix · plugin-dev-workflow

Guide plugin development workflow — editing skills, agents, hooks, or eval framework in this repo. Use when modifying files in plugins/elixir-phoenix/, lab/eval/, or lab/autoresearch/. Ensures changes pass eval, lint, and tests before committing.

## Install

source · Clone the upstream repo:

```shell
git clone https://github.com/oliver-kriska/claude-elixir-phoenix
```

Claude Code · Install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/oliver-kriska/claude-elixir-phoenix "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/plugin-dev-workflow" ~/.claude/skills/oliver-kriska-claude-elixir-phoenix-plugin-dev-workflow && rm -rf "$T"
```

Manifest: `.claude/skills/plugin-dev-workflow/SKILL.md`
# Plugin Development Workflow
This repo is the Elixir/Phoenix Claude Code plugin. When editing plugin files, follow this workflow to ensure quality.
## Before You Start
Run `make help` to see all available commands:

```shell
make eval      # Quick: lint + score changed skills/agents
make eval-all  # Full: all 40 skills + 20 agents
make eval-fix  # Auto-fix + show failures
make test      # 52 pytest tests for eval framework
make ci        # Full CI pipeline
```
## Scoring Individual Files (CLI)
IMPORTANT: Always use `-m` module syntax; never run `scorer.py` directly.

```shell
# Score ONE skill (use -m, NOT a direct file path)
python3 -m lab.eval.scorer plugins/elixir-phoenix/skills/verify/SKILL.md

# Score ONE skill with pretty output
python3 -m lab.eval.scorer plugins/elixir-phoenix/skills/verify/SKILL.md --pretty

# Score all skills
python3 -m lab.eval.scorer --all

# Score ONE agent
python3 -m lab.eval.agent_scorer plugins/elixir-phoenix/agents/verification-runner.md

# Score all agents
python3 -m lab.eval.agent_scorer --all
```
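A common reason for the `-m` rule is that modules inside a package depend on package context (relative imports, `sys.path` setup) that only module syntax provides; running the file directly breaks them. A self-contained illustration — the `/tmp/demo_pkg` package below is made up for the demo, not part of this repo:

```shell
# Build a throwaway package whose main module uses a relative import
mkdir -p /tmp/demo_pkg/pkg && cd /tmp/demo_pkg
printf 'VALUE = 42\n' > pkg/helpers.py
printf 'from .helpers import VALUE\nprint(VALUE)\n' > pkg/main.py
touch pkg/__init__.py

python3 -m pkg.main                  # module syntax: prints 42
python3 pkg/main.py 2>/dev/null \
  || echo "direct run fails (no package context)"
```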
## When Editing Skills (`plugins/elixir-phoenix/skills/*/SKILL.md`)
- Read CLAUDE.md conventions (size limits, frontmatter requirements)
- Make your changes
- Run `make eval` — it auto-detects changed skills and scores them
- If FAIL: check the dimension that failed, fix it
- Run `make lint` to verify markdown formatting
- Commit
Skill requirements (eval checks all of these):
- Frontmatter: name, description, effort. Description must start with action verb + include "Use when..."
- Iron Laws section with 1+ numbered items
- Under 185 lines (command skills) or 150 lines (reference skills)
- No section exceeds 45 lines
- All `phx:` references point to existing skills
- All `references/*.md` paths exist
- No dangerous code patterns outside Iron Laws sections
- Code examples present (1+ fenced code blocks)
- "Use when..." in description (for trigger accuracy)
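Taken together, a frontmatter block that passes these checks might look like this sketch (the `verify` skill exists in this repo, but the wording below is illustrative, not its actual frontmatter):

```yaml
---
name: verify
description: Verify that Elixir changes compile and pass tests. Use when finishing a code edit in an Elixir/Phoenix project.
effort: medium
---
```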
## When Editing Agents (`plugins/elixir-phoenix/agents/*.md`)
- Make your changes
- Run `make eval-agents` to score all agents

Agent requirements:

- `permissionMode: bypassPermissions` (always — background agents need it)
- `disallowedTools: Write, Edit, NotebookEdit` for review/analysis agents
- model matches effort: haiku=low, sonnet=medium, opus=high
- Under 300 lines (specialist) or 535 lines (orchestrator)
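An agent header meeting the requirements above could look like this sketch (the agent name and description are invented; the three fields mirror the list above):

```yaml
---
name: diff-reviewer
description: Review Elixir diffs for style and safety issues. Use when a code review is requested.
model: haiku                                # low effort pairs with haiku
permissionMode: bypassPermissions           # always required for background agents
disallowedTools: Write, Edit, NotebookEdit  # review/analysis agents are read-only
---
```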
## When Editing Eval Framework (`lab/eval/*.py`)
- Make your changes
- Run `make test` — 52 pytest tests must pass
- Run `make eval-all` — verify no skills/agents regressed
- If adding new matchers: add tests in `lab/eval/tests/test_matchers.py`
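A new-matcher test in that file would follow the usual pytest shape; the matcher below is a hypothetical stand-in defined inline, not the repo's actual matcher API:

```python
import re

def has_use_when(description: str) -> bool:
    """Hypothetical matcher: does a skill description contain the 'Use when' trigger phrase?"""
    return re.search(r"\bUse when\b", description) is not None

def test_has_use_when_accepts_trigger_phrase():
    assert has_use_when("Verify changes. Use when finishing an edit.")

def test_has_use_when_rejects_missing_phrase():
    assert not has_use_when("Verify changes compile and pass tests.")
```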
## When Editing Hooks (`plugins/elixir-phoenix/hooks/scripts/*.sh`)
- Make your changes
- Run `make lint` (markdown in hook comments)
- Test the hook manually (hooks run on Edit/Write/Bash events)
- Check CLAUDE.md hook documentation is still accurate
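Testing manually usually means feeding the script a synthetic event on stdin, since Claude Code hooks receive the triggering event as JSON that way and signal a block with exit code 2. A self-contained sketch using a stand-in script (the real hooks live in `plugins/elixir-phoenix/hooks/scripts/`):

```shell
# Stand-in hook mimicking the stdin-JSON convention; not one of the repo's actual hooks
cat > /tmp/demo-hook.sh <<'EOF'
#!/bin/sh
input=$(cat)                        # hooks read the event JSON from stdin
case "$input" in
  *"rm -rf"*) echo "blocked dangerous command" >&2; exit 2 ;;  # exit 2 blocks the tool call
  *) exit 0 ;;                                                 # exit 0 lets it through
esac
EOF
chmod +x /tmp/demo-hook.sh

echo '{"tool_name":"Bash","tool_input":{"command":"mix test"}}' | /tmp/demo-hook.sh
echo "allow exit: $?"               # prints: allow exit: 0
echo '{"tool_name":"Bash","tool_input":{"command":"rm -rf _build"}}' | /tmp/demo-hook.sh 2>/dev/null
echo "block exit: $?"               # prints: block exit: 2
```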
## Autoresearch (Self-Improvement Loop)
If `make eval-fix` shows failures, it suggests an autoresearch command:

```shell
# Copy-paste the suggested command from eval-fix output
claude -p 'Run autoresearch. Score all skills...' --allowedTools 'Edit,Read,Write,Bash,Glob,Grep'
```
This runs the autoresearch loop: find weakest skill → fix ONE issue → re-score → keep/revert.
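The keep/revert cycle can be sketched in a few lines; `score` and `fix` below are stand-ins (line count, append a line), not the repo's real scorer or fixer:

```shell
score() { awk 'END { print NR }' "$1"; }    # stand-in scorer: line count
fix()   { printf 'extra line\n' >> "$1"; }  # stand-in "fix ONE issue"

improve_once() {
  before=$(score "$1")
  cp "$1" "$1.bak"                          # snapshot so we can revert
  fix "$1"
  after=$(score "$1")
  if [ "$after" -gt "$before" ]; then
    rm "$1.bak"; echo "kept (score $before -> $after)"
  else
    mv "$1.bak" "$1"; echo "reverted (score stayed at $before)"
  fi
}

printf 'stub\n' > /tmp/demo-skill.md
improve_once /tmp/demo-skill.md             # prints: kept (score 1 -> 2)
```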
## Pre-Commit Checklist
Before committing any plugin changes:
- `make lint` passes
- `make eval` passes (changed files)
- `make test` passes (if eval framework changed)
- CHANGELOG.md updated (if user-visible change)
- Version bumped in plugin.json (if releasing)
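If you want the first two checks enforced automatically, a local git hook is one option; this is entirely optional and assumes you run it from the repo root where the Makefile lives:

```shell
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Run the cheap checks before every commit; abort the commit on failure
make lint && make eval || { echo "pre-commit checks failed" >&2; exit 1; }
EOF
chmod +x .git/hooks/pre-commit
```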
## References
- CLAUDE.md — full conventions, size limits, checklist
- `lab/eval/` — scoring framework (24 matchers, 8 dimensions)
- `lab/autoresearch/` — self-improvement loop
- `lab/findings/interesting.jsonl` — log interesting discoveries here