Mycelium reflexion
Use for self-correcting implementation. Implements the reflexion loop: implement, validate, self-critique, retry (max 3 iterations).
git clone https://github.com/haabe/mycelium
T=$(mktemp -d) && git clone --depth=1 https://github.com/haabe/mycelium "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/reflexion" ~/.claude/skills/haabe-mycelium-reflexion && rm -rf "$T"
.claude/skills/reflexion/SKILL.md
Reflexion Skill
Self-correcting implementation loop from the n-trax pattern.
Workflow
Iteration Loop (max 3)
Step 1: Implement
- Create the deliverable according to the specification/acceptance criteria.
- Software: write code. Content: write/produce content. AI tool: write prompts/configs. Service: document workflow.
- Follow engineering-principles.md (principles apply to all product types).
- Apply patterns from patterns.md.
- Check corrections.md for relevant past mistakes.
Step 2: Validate
- Software: Run tests, linter, type checker, security scan, accessibility checks (if UI).
- Security validation (OWASP): Check input validation, output encoding, parameterized queries, no hardcoded secrets, authentication/authorization patterns, dependency vulnerabilities. Reference OWASP Top 10:2025 categories for each check.
- Content: Review against learning objectives/editorial standards, check accessibility (captions, alt text), fact-check claims.
- AI tool: Run eval test cases, red-team testing, bias assessment.
- Service: Walk through the service blueprint end-to-end, verify documentation completeness.
- All: Verify acceptance criteria.
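The per-product-type validation lists above can be sketched as a lookup table. This is a hypothetical illustration, not part of the skill itself; the names `VALIDATION_CHECKS` and `checks_for` are invented for the example.

```python
# Hypothetical mapping of product_type to the Step 2 validation checks.
# Every product type ends with acceptance-criteria verification ("All" rule).
VALIDATION_CHECKS = {
    "software": ["tests", "linter", "type checker", "security scan", "accessibility"],
    "content": ["editorial review", "accessibility", "fact check"],
    "ai_tool": ["eval test cases", "red-team testing", "bias assessment"],
    "service": ["blueprint walkthrough", "documentation completeness"],
}

def checks_for(product_type: str) -> list:
    """Return the validation checks to run for one product type."""
    return VALIDATION_CHECKS.get(product_type, []) + ["acceptance criteria"]
```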
Step 3: Self-Critique
Review the implementation against the following (select items relevant to product_type):
- Engineering principles: DRY, KISS, YAGNI, SoC (apply to all product types)
- Security: Input validation, output encoding, no secrets, parameterized queries (software, ai_tool)
- Accessibility: Semantic HTML, keyboard nav, contrast, screen reader (software); captions, transcripts, alt text (content)
- Edge cases: What happens with unexpected input? Empty? Adversarial? (software, ai_tool)
- Error handling / user recovery: Are errors handled gracefully? Can users recover? (software, service)
- Quality: Factual accuracy, style consistency, source attribution (content); eval scores, safety scores (ai_tool)
- Naming / clarity: Do names reveal intent? Would a new reader understand this? (all)
- Completeness: Is anything missing that the user would expect? (all)
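Selecting the critique items relevant to a product_type can be expressed as a filter over a tag table. A minimal sketch, with invented names (`CRITIQUE_ITEMS`, `critique_checklist`); the tags follow the parenthetical annotations in the list above.

```python
# Hypothetical table: each Step 3 critique item tagged with the product
# types it applies to, per the annotations in the checklist above.
ALL = {"software", "content", "ai_tool", "service"}
CRITIQUE_ITEMS = {
    "engineering principles": ALL,
    "security": {"software", "ai_tool"},
    "accessibility": {"software", "content"},
    "edge cases": {"software", "ai_tool"},
    "error handling / user recovery": {"software", "service"},
    "quality": {"content", "ai_tool"},
    "naming / clarity": ALL,
    "completeness": ALL,
}

def critique_checklist(product_type: str) -> list:
    """Return only the critique items relevant to this product type."""
    return [item for item, types in CRITIQUE_ITEMS.items() if product_type in types]
```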
Step 4: Decide
- If all validations pass AND self-critique finds no issues: DONE
- If issues found AND iteration < 3: FIX and return to Step 1
- If iteration = 3 AND issues remain: ESCALATE with documented issues
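The Step 4 decision rules can be sketched as a small function. This is an illustrative rendering of the three branches above, assuming iterations are 1-indexed; the names `IterationResult` and `decide` are hypothetical.

```python
from dataclasses import dataclass, field

MAX_ITERATIONS = 3

@dataclass
class IterationResult:
    validations_pass: bool                     # Step 2 outcome
    critique_issues: list = field(default_factory=list)  # Step 3 findings

def decide(result: IterationResult, iteration: int) -> str:
    """Apply the Step 4 rules for one iteration (1-indexed)."""
    if result.validations_pass and not result.critique_issues:
        return "DONE"
    if iteration < MAX_ITERATIONS:
        return "FIX"       # return to Step 1
    return "ESCALATE"      # max iterations reached with issues remaining
```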
Escalation Protocol
When max iterations reached without full resolution:
- Document what was attempted in each iteration.
- Document remaining issues with severity assessment.
- Recommend: fix now (blocking) vs. fix later (non-blocking) vs. accept risk.
- Update corrections.md with learnings.
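The escalation record described above can be captured in a simple structure. A sketch only; the `EscalationReport` and `render` names are invented for illustration, and the severity labels are assumptions.

```python
from dataclasses import dataclass

@dataclass
class EscalationReport:
    attempts: list            # what was attempted in each iteration
    remaining_issues: list    # (issue, severity) pairs
    recommendation: str       # "fix now" | "fix later" | "accept risk"

def render(report: EscalationReport) -> str:
    """Format the escalation record for the delivery journal."""
    lines = ["ESCALATION: max iterations reached"]
    for i, attempt in enumerate(report.attempts, 1):
        lines.append(f"iteration {i}: {attempt}")
    for issue, severity in report.remaining_issues:
        lines.append(f"[{severity}] {issue}")
    lines.append(f"recommendation: {report.recommendation}")
    return "\n".join(lines)
```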
Verification Modes
The validate step in the reflexion loop should use the appropriate verification mode:
Rules-Based (deterministic)
- Linters, formatters, schema validators, type checkers
- Pass/fail is unambiguous — no judgment needed
- Always run first — fastest and cheapest
- Examples: eslint, mypy, yamllint (YAML schema validation against canvas-guidance.yml)
Computational (deterministic)
- Test runners, build systems, security scanners
- Requires executing code — slower than rules-based
- Results are objective but may need interpretation (e.g., flaky tests)
- Examples: pytest, npm test, cargo clippy, OWASP dependency check
Inferential (probabilistic)
- LLM-as-judge, peer review, heuristic evaluation
- Used when rules-based and computational verification are insufficient
- Results require confidence scoring — never treat as definitive
- Examples: /devils-advocate, /usability-check (design review), auto-dogfood evaluation
- The auto-dogfood system is an inferential verification loop
Order: Always attempt rules-based → computational → inferential. Only escalate to the next mode when the previous mode cannot verify the property in question.
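The ordering rule above can be sketched as a short escalation function. Assumptions: each check callable returns True/False, or None when that mode cannot verify the property; the names `verify` and the parameter names are hypothetical.

```python
def verify(rules_check=None, computational_check=None, inferential_check=None):
    """Try modes in order: rules-based, then computational, then inferential.

    Escalate to the next mode only when the previous one returns None
    (i.e., it cannot verify the property in question).
    """
    for mode, check in (("rules-based", rules_check),
                        ("computational", computational_check),
                        ("inferential", inferential_check)):
        if check is None:
            continue
        verdict = check()
        if verdict is not None:
            return mode, verdict
    return None, None  # no mode could verify the property
```

Note that inferential verdicts should still carry a confidence score downstream; this sketch only models the escalation order, not the scoring.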
Source: Trivedy (Anatomy of an Agent Harness, LangChain blog). Three-mode taxonomy adapted from Böckeler (Harness Engineering, martinfowler.com — computational vs inferential distinction). Note: harnesses continue to matter even as models improve — they engineer systems around model intelligence, not just patch deficiencies.
Rules
- Each iteration must show measurable improvement over the previous.
- If the same issue recurs across iterations, investigate root cause rather than patching symptoms.
- Never skip the self-critique step, even if tests pass.
- Log the reflexion loop outcome in delivery-journal.md.
Theory Citations
- Reflexion pattern (Shinn et al.)
- Clean Code (Martin)
- OWASP secure coding
- WCAG 2.1 AA