Mycelium reflexion
Use for self-correcting implementation. Implements the reflexion loop: implement, validate, self-critique, retry (max 3 iterations).
git clone https://github.com/haabe/mycelium
T=$(mktemp -d) && git clone --depth=1 https://github.com/haabe/mycelium "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/reflexion" ~/.claude/skills/haabe-mycelium-reflexion && rm -rf "$T"
.claude/skills/reflexion/SKILL.md
Reflexion Skill
Self-correcting implementation loop from the n-trax pattern.
Workflow
Iteration Loop (max 3)
Step 1: Implement
- Create the deliverable according to the specification/acceptance criteria.
- Software: write code. Content: write/produce content. AI tool: write prompts/configs. Service: document workflow.
- Follow engineering-principles.md (principles apply to all product types).
- Apply patterns from patterns.md.
- Check corrections.md for relevant past mistakes.
Step 2: Validate
- Software: Run tests, linter, type checker, security scan, accessibility checks (if UI).
- Security validation (OWASP): Check input validation, output encoding, parameterized queries, no hardcoded secrets, authentication/authorization patterns, dependency vulnerabilities. Reference OWASP Top 10:2025 categories for each check.
- Content: Review against learning objectives/editorial standards, check accessibility (captions, alt text), fact-check claims.
- AI tool: Run eval test cases, red-team testing, bias assessment.
- Service: Walk through the service blueprint end-to-end, verify documentation completeness.
- All: Verify acceptance criteria.
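The per-product-type validation lists above can be sketched as a lookup table. This is a hypothetical illustration, not part of the skill itself; the names `VALIDATION_CHECKS` and `checks_for` are invented for the example.

```python
# Hypothetical mapping of product_type to the Step 2 validation checks.
# Every product type ends with acceptance-criteria verification ("All" rule).
VALIDATION_CHECKS = {
    "software": ["tests", "linter", "type checker", "security scan", "accessibility"],
    "content": ["editorial review", "accessibility", "fact check"],
    "ai_tool": ["eval test cases", "red-team testing", "bias assessment"],
    "service": ["blueprint walkthrough", "documentation completeness"],
}

def checks_for(product_type: str) -> list:
    """Return the validation checks to run for one product type."""
    return VALIDATION_CHECKS.get(product_type, []) + ["acceptance criteria"]
```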
Step 3: Self-Critique
Review the implementation against the following (select items relevant to product_type):
- Engineering principles: DRY, KISS, YAGNI, SoC (apply to all product types)
- Security: Input validation, output encoding, no secrets, parameterized queries (software, ai_tool)
- Accessibility: Semantic HTML, keyboard nav, contrast, screen reader (software); captions, transcripts, alt text (content)
- Edge cases: What happens with unexpected input? Empty? Adversarial? (software, ai_tool)
- Error handling / user recovery: Are errors handled gracefully? Can users recover? (software, service)
- Quality: Factual accuracy, style consistency, source attribution (content); eval scores, safety scores (ai_tool)
- Naming / clarity: Do names reveal intent? Would a new reader understand this? (all)
- Completeness: Is anything missing that the user would expect? (all)
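Selecting the critique items relevant to a product_type can be expressed as a filter over a tag table. A minimal sketch, with invented names (`CRITIQUE_ITEMS`, `critique_checklist`); the tags follow the parenthetical annotations in the list above.

```python
# Hypothetical table: each Step 3 critique item tagged with the product
# types it applies to, per the annotations in the checklist above.
ALL = {"software", "content", "ai_tool", "service"}
CRITIQUE_ITEMS = {
    "engineering principles": ALL,
    "security": {"software", "ai_tool"},
    "accessibility": {"software", "content"},
    "edge cases": {"software", "ai_tool"},
    "error handling / user recovery": {"software", "service"},
    "quality": {"content", "ai_tool"},
    "naming / clarity": ALL,
    "completeness": ALL,
}

def critique_checklist(product_type: str) -> list:
    """Return only the critique items relevant to this product type."""
    return [item for item, types in CRITIQUE_ITEMS.items() if product_type in types]
```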
Step 4: Decide
- If all validations pass AND self-critique finds no issues: DONE
- If issues found AND iteration < 3: FIX and return to Step 1
- If iteration = 3 AND issues remain: ESCALATE with documented issues
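The Step 4 decision rules can be sketched as a small function. This is an illustrative rendering of the three branches above, assuming iterations are 1-indexed; the names `IterationResult` and `decide` are hypothetical.

```python
from dataclasses import dataclass, field

MAX_ITERATIONS = 3

@dataclass
class IterationResult:
    validations_pass: bool                     # Step 2 outcome
    critique_issues: list = field(default_factory=list)  # Step 3 findings

def decide(result: IterationResult, iteration: int) -> str:
    """Apply the Step 4 rules for one iteration (1-indexed)."""
    if result.validations_pass and not result.critique_issues:
        return "DONE"
    if iteration < MAX_ITERATIONS:
        return "FIX"       # return to Step 1
    return "ESCALATE"      # max iterations reached with issues remaining
```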
Escalation Protocol
When max iterations reached without full resolution:
- Document what was attempted in each iteration.
- Document remaining issues with severity assessment.
- Recommend: fix now (blocking) vs. fix later (non-blocking) vs. accept risk.
- Update corrections.md with learnings.
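The escalation record described above can be captured in a simple structure. A sketch only; the `EscalationReport` and `render` names are invented for illustration, and the severity labels are assumptions.

```python
from dataclasses import dataclass

@dataclass
class EscalationReport:
    attempts: list            # what was attempted in each iteration
    remaining_issues: list    # (issue, severity) pairs
    recommendation: str       # "fix now" | "fix later" | "accept risk"

def render(report: EscalationReport) -> str:
    """Format the escalation record for the delivery journal."""
    lines = ["ESCALATION: max iterations reached"]
    for i, attempt in enumerate(report.attempts, 1):
        lines.append(f"iteration {i}: {attempt}")
    for issue, severity in report.remaining_issues:
        lines.append(f"[{severity}] {issue}")
    lines.append(f"recommendation: {report.recommendation}")
    return "\n".join(lines)
```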
Verification Modes
The validate step in the reflexion loop should use the appropriate verification mode:
Rules-Based (deterministic)
- Linters, formatters, schema validators, type checkers
- Pass/fail is unambiguous — no judgment needed
- Always run first — fastest and cheapest
- Examples: eslint, mypy, yamllint (YAML schema validation against canvas-guidance.yml)
Computational (deterministic)
- Test runners, build systems, security scanners
- Requires executing code — slower than rules-based
- Results are objective but may need interpretation (e.g., flaky tests)
- Examples: pytest, npm test, cargo clippy, OWASP dependency check
Inferential (probabilistic)
- LLM-as-judge, peer review, heuristic evaluation
- Used when rules-based and computational verification are insufficient
- Results require confidence scoring — never treat as definitive
- Examples: /devils-advocate, /usability-check (design review), auto-dogfood evaluation
- The auto-dogfood system is an inferential verification loop
Order: Always attempt rules-based → computational → inferential. Only escalate to the next mode when the previous mode cannot verify the property in question.
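The ordering rule above can be sketched as a short escalation function. Assumptions: each check callable returns True/False, or None when that mode cannot verify the property; the names `verify` and the parameter names are hypothetical.

```python
def verify(rules_check=None, computational_check=None, inferential_check=None):
    """Try modes in order: rules-based, then computational, then inferential.

    Escalate to the next mode only when the previous one returns None
    (i.e., it cannot verify the property in question).
    """
    for mode, check in (("rules-based", rules_check),
                        ("computational", computational_check),
                        ("inferential", inferential_check)):
        if check is None:
            continue
        verdict = check()
        if verdict is not None:
            return mode, verdict
    return None, None  # no mode could verify the property
```

Note that inferential verdicts should still carry a confidence score downstream; this sketch only models the escalation order, not the scoring.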
Source: Trivedy (Anatomy of an Agent Harness, LangChain blog). Three-mode taxonomy adapted from Böckeler (Harness Engineering, martinfowler.com — computational vs inferential distinction). Note: harnesses continue to matter even as models improve — they engineer systems around model intelligence, not just patch deficiencies.
Rules
- Each iteration must show measurable improvement over the previous.
- If the same issue recurs across iterations, investigate root cause rather than patching symptoms.
- Never skip the self-critique step, even if tests pass.
- Log the reflexion loop outcome in delivery-journal.md.
Theory Citations
- Reflexion pattern (Shinn et al.)
- Clean Code (Martin)
- OWASP secure coding
- WCAG 2.1 AA