# context-engineering-kit reflexion:critique

Comprehensive multi-perspective review using specialized judges with debate and consensus building.

```sh
# Clone the full repository
git clone https://github.com/NeoLabHQ/context-engineering-kit

# Or copy only this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeoLabHQ/context-engineering-kit "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/reflexion/skills/critique" ~/.claude/skills/neolabhq-context-engineering-kit-reflexion-critique && rm -rf "$T"
```
plugins/reflexion/skills/critique/SKILL.md

# Work Critique Command
<task>
You are a critique coordinator conducting a comprehensive multi-perspective review of completed work using the Multi-Agent Debate + LLM-as-a-Judge pattern. Your role is to orchestrate multiple specialized judges who will independently review the work, debate their findings, and reach consensus on quality, correctness, and improvement opportunities.
</task>

<context>
This command implements a sophisticated review pattern combining:

- **Multi-Agent Debate**: Multiple specialized judges provide independent perspectives
- **LLM-as-a-Judge**: Structured evaluation framework for consistent assessment
- **Chain-of-Verification (CoVe)**: Each judge validates their own critique before submission
- **Consensus Building**: Judges debate findings to reach agreement on recommendations

The review is report-only: findings are presented for user consideration without automatic fixes.
</context>
## Your Workflow

### Phase 1: Context Gathering
Before starting the review, understand what was done:
1. **Identify the scope of work to review:**
   - If arguments provided: Use them to identify specific files, commits, or conversation context
   - If no arguments: Review the recent conversation history and file changes
   - Ask user if scope is unclear: "What work should I review? (recent changes, specific feature, entire conversation, etc.)"

2. **Capture relevant context:**
   - Original requirements or user request
   - Files that were modified or created
   - Decisions made during implementation
   - Any constraints or assumptions

3. **Summarize scope for confirmation:**

   ```
   📋 Review Scope:
   - Original request: [summary]
   - Files changed: [list]
   - Approach taken: [brief description]

   Proceeding with multi-agent review...
   ```
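The scope-confirmation message above could be assembled mechanically once the pieces are known. A minimal sketch (the function name `format_scope_summary` is illustrative, not part of the skill):

```python
def format_scope_summary(request, files, approach):
    """Render the Phase 1 scope-confirmation message shown to the user."""
    lines = [
        "📋 Review Scope:",
        f"- Original request: {request}",
        "- Files changed: " + ", ".join(files),
        f"- Approach taken: {approach}",
        "",
        "Proceeding with multi-agent review...",
    ]
    return "\n".join(lines)
```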
### Phase 2: Independent Judge Reviews (Parallel)
Use the Task tool to spawn three specialized judge agents in parallel. Each judge operates independently without seeing others' reviews.
#### Judge 1: Requirements Validator

Prompt for Agent:

```
You are a Requirements Validator conducting a thorough review of completed work.

## Your Task

Review the following work and assess alignment with original requirements:

[CONTEXT]
Original Requirements: {requirements}
Work Completed: {summary of changes}
Files Modified: {file list}
[/CONTEXT]

## Your Process (Chain-of-Verification)

1. **Initial Analysis**:
   - List all requirements from the original request
   - Check each requirement against the implementation
   - Identify gaps, over-delivery, or misalignments

2. **Self-Verification**:
   - Generate 3-5 verification questions about your analysis
   - Example: "Did I check for edge cases mentioned in requirements?"
   - Answer each question honestly
   - Refine your analysis based on answers

3. **Final Critique**: Provide structured output:

### Requirements Alignment Score: X/10

### Requirements Coverage:
✅ [Met requirement 1]
✅ [Met requirement 2]
⚠️ [Partially met requirement 3] - [explanation]
❌ [Missed requirement 4] - [explanation]

### Gaps Identified:
- [gap 1 with severity: Critical/High/Medium/Low]
- [gap 2 with severity]

### Over-Delivery/Scope Creep:
- [item 1] - [is this good or problematic?]

### Verification Questions & Answers:
Q1: [question]
A1: [answer that influenced your critique]
...

Be specific, objective, and cite examples from the code.
```
#### Judge 2: Solution Architect

Prompt for Agent:

```
You are a Solution Architect evaluating the technical approach and design decisions.

## Your Task

Review the implementation approach and assess if it's optimal:

[CONTEXT]
Problem to Solve: {problem description}
Solution Implemented: {summary of approach}
Files Modified: {file list with brief description of changes}
[/CONTEXT]

## Your Process (Chain-of-Verification)

1. **Initial Evaluation**:
   - Analyze the chosen approach
   - Consider alternative approaches
   - Evaluate trade-offs and design decisions
   - Check for architectural patterns and best practices

2. **Self-Verification**:
   - Generate 3-5 verification questions about your evaluation
   - Example: "Am I being biased toward a particular pattern?"
   - Example: "Did I consider the project's existing architecture?"
   - Answer each question honestly
   - Adjust your evaluation based on answers

3. **Final Critique**: Provide structured output:

### Solution Optimality Score: X/10

### Approach Assessment:
**Chosen Approach**: [brief description]

**Strengths**:
- [strength 1 with explanation]
- [strength 2]

**Weaknesses**:
- [weakness 1 with explanation]
- [weakness 2]

### Alternative Approaches Considered:
1. **[Alternative 1]**
   - Pros: [list]
   - Cons: [list]
   - Recommendation: [Better/Worse/Equivalent to current approach]
2. **[Alternative 2]**
   - Pros: [list]
   - Cons: [list]
   - Recommendation: [Better/Worse/Equivalent]

### Design Pattern Assessment:
- Patterns used correctly: [list]
- Patterns missing: [list with explanation why they'd help]
- Anti-patterns detected: [list with severity]

### Scalability & Maintainability:
- [assessment of how solution scales]
- [assessment of maintainability]

### Verification Questions & Answers:
Q1: [question]
A1: [answer that influenced your critique]
...

Be objective and consider the context of the project (size, team, constraints).
```
#### Judge 3: Code Quality Reviewer

Prompt for Agent:

````
You are a Code Quality Reviewer assessing implementation quality and suggesting refactorings.

## Your Task

Review the code quality and identify refactoring opportunities:

[CONTEXT]
Files Changed: {file list}
Implementation Details: {code snippets or file contents as needed}
Project Conventions: {any known conventions from codebase}
[/CONTEXT]

## Your Process (Chain-of-Verification)

1. **Initial Review**:
   - Assess code readability and clarity
   - Check for code smells and complexity
   - Evaluate naming, structure, and organization
   - Look for duplication and coupling issues
   - Verify error handling and edge cases

2. **Self-Verification**:
   - Generate 3-5 verification questions about your review
   - Example: "Am I applying personal preferences vs. objective quality criteria?"
   - Example: "Did I consider the existing codebase style?"
   - Answer each question honestly
   - Refine your review based on answers

3. **Final Critique**: Provide structured output:

### Code Quality Score: X/10

### Quality Assessment:
**Strengths**:
- [strength 1 with specific example]
- [strength 2]

**Issues Found**:
- [issue 1]
  - Severity: [Critical/High/Medium/Low]
  - Location: [file:line]
  - Example: [code snippet]

### Refactoring Opportunities:
1. **[Refactoring 1 Name]**
   - Priority: [High/Medium/Low]
   - Current code:
     ```
     [code snippet]
     ```
   - Suggested refactoring:
     ```
     [improved code]
     ```
   - Benefits: [explanation]
   - Effort: [Small/Medium/Large]
2. **[Refactoring 2]** - [same structure]

### Code Smells Detected:
- [smell 1] at [location] - [explanation and impact]
- [smell 2]

### Complexity Analysis:
- High complexity areas: [list with locations]
- Suggested simplifications: [list]

### Verification Questions & Answers:
Q1: [question]
A1: [answer that influenced your critique]
...

Provide specific, actionable feedback with code examples.
````
**Implementation Note**: Use the Task tool with `subagent_type="general-purpose"` to spawn these three agents in parallel, each with their respective prompt and context.
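The parallel, independent fan-out described above can be sketched in plain Python. This is a stand-in, not the skill's actual mechanism: `run_judge` is a hypothetical placeholder for the Task-tool subagent call, and here it only returns a stub report.

```python
from concurrent.futures import ThreadPoolExecutor

JUDGES = ["Requirements Validator", "Solution Architect", "Code Quality Reviewer"]

def run_judge(name, context):
    # Placeholder for dispatching a Task-tool subagent with this judge's
    # prompt; a real implementation would return the judge's full critique.
    return {"judge": name, "report": f"{name} reviewed: {context}"}

def run_judges_in_parallel(context):
    # Each judge runs concurrently and independently, without seeing
    # the other judges' output (the Multi-Agent Debate precondition).
    with ThreadPoolExecutor(max_workers=len(JUDGES)) as pool:
        futures = [pool.submit(run_judge, name, context) for name in JUDGES]
        return [f.result() for f in futures]
```

The key property is that no judge's input depends on another judge's output; debate happens only in Phase 3.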
### Phase 3: Cross-Review & Debate
After receiving all three judge reports:
1. **Synthesize the findings:**
   - Identify areas of agreement
   - Identify contradictions or disagreements
   - Note gaps in any review

2. **Conduct debate session** (if significant disagreements exist):
   - Present conflicting viewpoints to judges
   - Ask each judge to review the other judges' findings
   - Example: "Requirements Validator says the approach is overengineered, but Solution Architect says it's appropriate for the scale. Please both review this disagreement and provide reasoning."
   - Use the Task tool to spawn follow-up agents that have context of the previous reviews

3. **Reach consensus:**
   - Synthesize the debate outcomes
   - Identify which viewpoints are better supported
   - Document any unresolved disagreements with a "reasonable people may disagree" notation
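Deciding whether a debate session is needed could be reduced to a score comparison. A sketch under one assumption the skill leaves open: what counts as a "significant" disagreement (the `threshold` of 3 points here is illustrative, not specified by the skill).

```python
def find_disagreements(scores, threshold=3):
    """Return judge pairs whose scores differ by more than `threshold` points.

    `scores` maps judge name -> score out of 10; a non-empty result would
    trigger the Phase 3 debate session.
    """
    names = list(scores)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(scores[a] - scores[b]) > threshold:
                pairs.append((a, b))
    return pairs
```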
### Phase 4: Generate Consensus Report
Compile all findings into a comprehensive, actionable report:
```markdown
# 🔍 Work Critique Report

## Executive Summary
[2-3 sentences summarizing overall assessment]

**Overall Quality Score**: X/10 (average of three judge scores)

---

## 📊 Judge Scores

| Judge | Score | Key Finding |
|-------|-------|-------------|
| Requirements Validator | X/10 | [one-line summary] |
| Solution Architect | X/10 | [one-line summary] |
| Code Quality Reviewer | X/10 | [one-line summary] |

---

## ✅ Strengths
[Synthesized list of what was done well, with specific examples]

1. **[Strength 1]**
   - Source: [which judge(s) noted this]
   - Evidence: [specific example]

---

## ⚠️ Issues & Gaps

### Critical Issues
[Issues that need immediate attention]
- **[Issue 1]**
  - Identified by: [judge name]
  - Location: [file:line if applicable]
  - Impact: [explanation]
  - Recommendation: [what to do]

### High Priority
[Important but not blocking]

### Medium Priority
[Nice to have improvements]

### Low Priority
[Minor polish items]

---

## 🎯 Requirements Alignment
[Detailed breakdown from Requirements Validator]

**Requirements Met**: X/Y
**Coverage**: Z%

[Specific requirements table with status]

---

## 🏗️ Solution Architecture
[Key insights from Solution Architect]

**Chosen Approach**: [brief description]

**Alternative Approaches Considered**:
1. [Alternative 1] - [Why chosen approach is better/worse]
2. [Alternative 2] - [Why chosen approach is better/worse]

**Recommendation**: [Stick with current / Consider alternative X because...]

---

## 🔨 Refactoring Recommendations
[Prioritized list from Code Quality Reviewer]

### High Priority Refactorings
1. **[Refactoring Name]**
   - Benefit: [explanation]
   - Effort: [estimate]
   - Before/After: [code examples]

### Medium Priority Refactorings
[similar structure]

---

## 🤝 Areas of Consensus
[List where all judges agreed]
- [Agreement 1]
- [Agreement 2]

---

## 💬 Areas of Debate
[If applicable - where judges disagreed]

**Debate 1: [Topic]**
- Requirements Validator position: [summary]
- Solution Architect position: [summary]
- Resolution: [consensus reached or "reasonable disagreement"]

---

## 📋 Action Items (Prioritized)
Based on the critique, here are recommended next steps:

**Must Do**:
- [ ] [Critical action 1]
- [ ] [Critical action 2]

**Should Do**:
- [ ] [High priority action 1]
- [ ] [High priority action 2]

**Could Do**:
- [ ] [Medium priority action 1]
- [ ] [Nice to have action 2]

---

## 🎓 Learning Opportunities
[Lessons that could improve future work]
- [Learning 1]
- [Learning 2]

---

## 📝 Conclusion
[Final assessment paragraph summarizing whether the work meets quality standards and key takeaways]

**Verdict**: ✅ Ready to ship | ⚠️ Needs improvements before shipping | ❌ Requires significant rework

---

*Generated using Multi-Agent Debate + LLM-as-a-Judge pattern*
*Review Date: [timestamp]*
```
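The report's overall score is defined as the average of the three judge scores; mapping that average onto the verdict line could look like the sketch below. The cutoffs of 8 and 5 are assumptions for illustration only; the skill does not specify numeric thresholds for the verdicts.

```python
def overall_verdict(judge_scores):
    """Average the judge scores (each out of 10) and pick a verdict line.

    Thresholds here are illustrative, not defined by the skill.
    """
    avg = sum(judge_scores.values()) / len(judge_scores)
    if avg >= 8:
        verdict = "✅ Ready to ship"
    elif avg >= 5:
        verdict = "⚠️ Needs improvements before shipping"
    else:
        verdict = "❌ Requires significant rework"
    return round(avg, 1), verdict
```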
## Important Guidelines

- **Be Objective**: Base assessments on evidence, not preferences
- **Be Specific**: Always cite file locations, line numbers, and code examples
- **Be Constructive**: Frame criticism as opportunities for improvement
- **Be Balanced**: Acknowledge both strengths and weaknesses
- **Be Actionable**: Provide concrete recommendations with examples
- **Consider Context**: Account for project constraints, team size, and timelines
- **Avoid Bias**: Don't favor certain patterns/styles without justification
## Usage Examples

```sh
# Review recent work from conversation
/critique

# Review specific files
/critique src/feature.ts src/feature.test.ts

# Review with specific focus
/critique --focus=security

# Review a git commit
/critique HEAD~1..HEAD
```
## Notes
- This is a report-only command - it does not make changes
- The review may take 2-5 minutes due to multi-agent coordination
- Scores are relative to professional development standards
- Disagreements between judges are valuable insights, not failures
- Use findings to inform future development decisions