Claude-skill-registry honest-reflections
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/honest-reflections" ~/.claude/skills/majiayu000-claude-skill-registry-honest-reflections && rm -rf "$T"
skills/data/honest-reflections/SKILL.mdHonest Reflections Skill
Overview
Purpose: Systematic gap analysis using 100+ sequential thoughts to identify discrepancies between claimed completion and actual delivery. Prevents premature completion declarations by revealing assumptions, partial work, missing components, and rationalization patterns.
Core Value: Catches the moment when you're about to claim "100% complete" on 50% completion.
Key Innovation: Self-assessment protocol that replicates critical external review, catching gaps before they become credibility issues.
When to Use This Skill
MANDATORY (Must Use)
Use this skill when:
- Before declaring work "complete": Any statement like "all done", "100% finished", "scope complete"
- Final commit before handoff: Last commit of major work session
- Completion milestones: MVP complete, phase complete, project done
- Quality gate reviews: Before presenting work to stakeholders
- After long work sessions: 6+ hours of continuous work without checkpoint
RECOMMENDED (Should Use)
- After each major phase of multi-phase project
- When tempted to rationalize skipping remaining work
- Before creating handoff documentation
- When user asks "is it really complete?"
- Periodic self-audits (weekly for long projects)
CONDITIONAL (May Use)
- Mid-project health checks
- When feeling uncertainty about completeness
- After receiving feedback suggesting gaps
- Learning from past incomplete deliveries
DO NOT Rationalize Skipping Because
❌ "Work looks complete" → Appearances deceive, systematic check required ❌ "I'm confident it's done" → Confidence without verification is overconfidence ❌ "Takes too long" → 30-minute reflection prevents hours of rework ❌ "Already did self-review" → Mental review misses 40-60% of gaps ❌ "User didn't explicitly ask" → Professional responsibility to verify completion
Anti-Rationalization (From Baseline Testing)
CRITICAL: Agents systematically skip honest reflection to avoid discovering gaps. Below are the 6 most common rationalizations detected in baseline testing, with mandatory counters.
Rationalization 1: "Obviously Complete, No Need to Reflect"
Example: Agent finishes implementing 8 features, thinks "all features done", declares complete without checking original spec that required 12 features
COUNTER:
- ❌ NEVER trust "obviously complete" without systematic verification
- ✅ "Obvious" is subjective; gap analysis is objective
- ✅ Agents miss 40-60% of gaps in self-assessment without systematic process
- ✅ 100+ thought reflection reveals gaps mental review misses
Rule: Run systematic reflection before ANY completion claim. No exceptions.
Rationalization 2: "Reflection Takes Too Long, Just Ship It"
Example: Agent thinks "reflection would take 30 minutes, I'll just commit and fix gaps if reported"
COUNTER:
- ❌ NEVER skip reflection to save time
- ✅ 30-minute reflection now prevents 4-hour rework from missed gaps
- ✅ Shipping incomplete work damages credibility (costs more than time saved)
- ✅ ROI: Reflection time vs rework time = 1:8 ratio
Rule: Reflection is time investment with 800% ROI. Always worth it.
Rationalization 3: "Partial Completion is Good Enough"
Example: Plan requires 16 tasks, agent completes 8 high-quality tasks, declares success based on quality not quantity
COUNTER:
- ❌ NEVER confuse quality with completeness
- ✅ High-quality partial delivery ≠ complete delivery
- ✅ User asked for 16 tasks, not "best 8 tasks"
- ✅ Scope gaps are gaps regardless of quality delivered
Rule: Quality AND quantity both matter. Track both separately.
Rationalization 4: "I Already Know the Gaps, No Need to Document"
Example: Agent mentally aware of incomplete work but doesn't document it, commits with "complete" claim anyway
COUNTER:
- ❌ NEVER skip gap documentation because you're "aware"
- ✅ Mental awareness ≠ actionable documentation
- ✅ Gaps not documented = gaps not addressed = gaps become issues
- ✅ Documenting forces acknowledgment and planning
Rule: If gap exists, document it. Mental awareness insufficient.
Rationalization 5: "User Didn't Notice, So It's Fine"
Example: Agent ships work with gaps, user doesn't immediately comment, agent assumes gaps acceptable
COUNTER:
- ❌ NEVER assume silence = acceptance
- ✅ User may not notice gaps immediately (detailed review takes time)
- ✅ Gaps discovered later damage credibility more than gaps acknowledged upfront
- ✅ Professional responsibility to disclose gaps proactively
Rule: Disclose gaps before user discovers them. Builds trust.
Rationalization 6: "Reflection Might Reveal I Need More Work (Avoid It)"
Example: Agent subconsciously avoids reflection because it might require redoing work
COUNTER:
- ❌ NEVER avoid reflection to avoid work
- ✅ Avoidance behavior = knowing something's wrong but not checking
- ✅ Gaps exist whether you reflect or not (reflection just reveals them)
- ✅ Better to discover gaps early (fixable) than late (credibility damage)
Rule: If you're avoiding reflection, that's WHY you need it most.
The Reflection Protocol (7 Phases)
Phase 1: Original Plan Analysis
Objective: Understand what was actually requested
Process:
1. Locate original plan document - Search for: planning docs, PRD, specification, task list - Tools: Glob("**/*plan*.md"), Grep("## Phase", "### Task") 2. Read plan COMPLETELY - Count: Total tasks, phases, hours estimated - Parse: Each task's deliverables, acceptance criteria - Tool: Read (entire plan, don't skim) 3. Extract requirements - Deliverables: What files/docs should exist? - Quality criteria: What standards specified? - Dependencies: What must be done before what? - Time budget: How much time allocated? 4. Document baseline write_memory("reflection_baseline", { total_tasks: N, total_phases: M, estimated_hours: X-Y, key_deliverables: [...] })
Output: Complete understanding of original scope
Duration: 10-15 minutes
Phase 2: Delivered Work Inventory
Objective: Catalog what was actually completed
Process:
1. List all commits made - Tool: Bash("git log --oneline --since='session_start'") - Parse: Commit messages for deliverables 2. Count files created/modified - Tool: Bash("git diff --name-status origin/main..HEAD") - Categorize: New files, modified files, deleted files 3. Measure lines added - Tool: Bash("git diff --stat origin/main..HEAD") - Calculate: Total lines added, per file 4. Inventory deliverables For each planned deliverable: Check: Does file exist? Check: Does content match requirements? Classify: COMPLETE, PARTIAL, NOT_DONE 5. Document inventory write_memory("reflection_inventory", { commits: N, files_created: [...], files_modified: [...], lines_added: X, deliverables_complete: [...], deliverables_partial: [...], deliverables_missing: [...] })
Output: Complete accounting of delivered work
Duration: 10-15 minutes
Phase 3: Gap Identification (100+ Sequential Thoughts)
Objective: Systematically identify ALL gaps between plan and delivery
Process:
Use Sequential MCP for structured analysis: 1. Initialize reflection (thoughts 1-10) - Recall plan scope - Recall delivered work - Set up comparison framework 2. Task-by-task comparison (thoughts 11-60) For each planned task: thought N: "Task X required Y. I delivered Z. Gap analysis: ..." thought N+1: "Why did I skip/modify this task? Rationalization: ..." 3. Quality dimension analysis (thoughts 61-80) - Testing methodology gaps - Validation gaps - Verification gaps - Documentation completeness gaps 4. Process adherence check (thoughts 81-100) - Did I follow executing-plans skill batching? - Did I use recommended sub-skills? - Did I apply Shannon principles to Shannon work? - Did I wait for user feedback when uncertain? 5. Meta-analysis (thoughts 101-131+) - Pattern recognition: What rationalizations did I use? - Self-awareness: Am I still rationalizing in this reflection? - Credibility check: Did I overclaim in commits/docs? - Solution space: What needs fixing vs what's acceptable? Minimum 100 thoughts, extend to 150+ if complex project
Output: Comprehensive gap catalog with root cause analysis
Duration: 20-30 minutes
Phase 4: Rationalization Detection
Objective: Identify where you rationalized away work
Common Rationalization Patterns:
1. "Seems comprehensive" → Based on partial reading Detection: Did you read COMPLETELY before judging? 2. "Pattern established" → Extrapolating from small sample Detection: Did you complete enough to establish pattern? (usually need 5+ examples, not 3) 3. "Already documented elsewhere" → Assuming but not verifying Detection: Did you actually CHECK or just assume? 4. "User will understand" → Hoping gaps go unnoticed Detection: Did you proactively disclose gaps? 5. "Close enough to target" → Percentage substitution Detection: 716 lines ≠ 3,500 lines (20% ≠ 100%) 6. "Quality over quantity" → Justifying incomplete scope Detection: User asked for quantity (16 skills) not "best quality 3 skills"
For Each Rationalization Found:
Document: - What I told myself - What I actually did - What plan required - Gap size - Whether fixable
Output: Rationalization inventory with honest labeling
Duration: 10 minutes
Phase 5: Completion Percentage Calculation
Objective: Quantify actual completion honestly
Algorithm:
1. Score each task: COMPLETE: 100% (fully met requirements) PARTIAL: 50% (significant work but incomplete) NOT_DONE: 0% (not started or minimal work) 2. Calculate weighted completion: total_points = Σ(task_score × task_weight) max_points = Σ(100% × task_weight) completion_percentage = (total_points / max_points) × 100 3. Validate against time investment: time_spent / total_estimated_time should ≈ completion_percentage If mismatch >20%: investigate (either underestimated or overclaimed) 4. Compare claims vs reality: claimed_completion (from commits/docs) actual_completion (calculated above) discrepancy = claimed - actual If discrepancy >10%: CRITICAL (misleading claims) If discrepancy 5-10%: MODERATE (minor overclaim) If discrepancy <5%: ACCEPTABLE (honest assessment)
Output: Honest completion percentage + discrepancy analysis
Duration: 10 minutes
Phase 6: Critical vs Non-Critical Gap Classification
Objective: Prioritize gaps by impact
Classification:
CRITICAL (Must Fix): - Testing methodology flaws (undermines validation claims) - Incomplete major deliverables (e.g., README 20% of target) - Broken functionality (hooks untested, might not work) - Misleading claims in commits (credibility issue) HIGH (Should Fix): - Missing planned components (13 skills not enhanced) - Format deviations from plan (consolidated vs individual) - Verification steps skipped (end-to-end testing) MEDIUM (Nice to Fix): - Documentation link validation - Additional examples beyond minimum - Enhanced formatting or structure LOW (Optional): - Minor wording improvements - Supplementary documentation - Future enhancement ideas
Output: Prioritized gap list with fix estimates
Duration: 10 minutes
Phase 7: Honest Reporting & Recommendations
Objective: Present findings to user with integrity
Report Structure:
# Honest Reflection: [Project Name] ## Claimed Completion [What you claimed in commits, docs, handoffs] ## Actual Completion - Tasks: X/Y (Z%) - Weighted: W% - Time: A hours / B estimated ## Gaps Discovered (N total) ### Critical Gaps (M gaps) 1. [Gap description] - Impact: [credibility/functionality/quality] - Fix effort: [hours] - Priority: CRITICAL ### High Priority Gaps (P gaps) [List...] ### Medium/Low Gaps (Q gaps) [Summary...] ## Rationalization Patterns Detected 1. [Rationalization you used] - Pattern matches: [Shannon anti-rationalization pattern] - Why it's a rationalization: [explanation] ## Recommendations **Option A: Complete All Remaining Work** - Remaining tasks: [list] - Estimated time: [hours] - Outcome: Fulfills original plan scope 100% **Option B: Fix Critical Gaps Only** - Critical fixes: [list] - Estimated time: [hours] - Outcome: Addresses credibility/functionality issues **Option C: Accept As-Is With Honest Disclosure** - Update handoff: Acknowledge gaps honestly - Document: Remaining work as future enhancement - Outcome: Maintains credibility via transparency ## User Decision Required [Present options clearly, wait for choice]
Output: Comprehensive honest report
Duration: 15-20 minutes
Detailed Methodology (From 131-Thought Analysis)
Gap Detection Techniques
1. Plan-Delivery Comparison Matrix
For each planned task: Read plan requirement Check delivered artifacts Compare: - Deliverable exists? (YES/NO) - Deliverable complete? (100%/50%/0%) - Quality matches plan? (meets criteria / partial / below) Document gap if <100% complete
2. Requirement Tracing
REQUIRED directives in plan: - Search for: "REQUIRED", "MUST", "mandatory" - Extract each requirement - Verify each requirement fulfilled - Flag any unfulfilled REQUIRED items as CRITICAL gaps
3. Assumption Detection
Look for your own statements like: - "Seems comprehensive" → Based on what evidence? - "Pattern established" → How many examples? (need 5+, not 3) - "Good enough" → Compared to what standard? - "User will understand" → Did you verify or assume? Each assumption is potential gap until verified
4. Time-Scope Alignment Check
If plan estimated 20 hours total: - 10 hours worked = should be ~50% complete - If claiming >60% complete: investigate overclaim - If claiming <40% complete: investigate inefficiency Time spent / time estimated ≈ scope completed Significant mismatch = either estimation wrong or completion wrong
5. Testing Methodology Validation
For each test claiming behavioral improvement: - Did RED and GREEN use SAME input? - If different inputs: INVALID test (can't compare) - If same input, different output: Valid behavioral change - If same input, same output: No behavioral change (educational only) Validate methodology before accepting test results
6. Shannon Principle Self-Application
Did you follow Shannon principles on Shannon work? - 8D Complexity Analysis: Did you analyze the plan's complexity? - Wave-Based Execution: Did you use waves if complex? - NO MOCKS Testing: Did you test with real sub-agents? - FORCED_READING: Did you read ALL files completely? - Context Preservation: Did you checkpoint properly? Violating Shannon principles while enhancing Shannon = credibility gap
The 100+ Thought Reflection Process
Thoughts 1-20: Plan Understanding
- What was the original plan?
- How many total tasks/phases?
- What were key deliverables?
- What standards were specified?
- What time budget allocated?
Thoughts 21-40: Delivery Inventory
- What files did I create?
- What files did I modify?
- How many lines added?
- What commits made?
- What claims in commits?
Thoughts 41-70: Gap Identification
- Task-by-task comparison (plan vs delivered)
- Which tasks complete? Partial? Not done?
- What's the percentage completion honestly?
- Are there missing deliverables?
- Did I read all required source files?
Thoughts 71-90: Rationalization Analysis
- What assumptions did I make?
- When did I proceed without user confirmation?
- What shortcuts did I take?
- Did I optimize for my convenience vs plan requirements?
- What rationalizations match Shannon anti-patterns?
Thoughts 91-110: Quality Verification
- Were tests methodologically sound?
- Did I validate what I claimed to validate?
- Are there untested components?
- Did I verify vs assume?
- What verification steps were skipped?
Thoughts 111-131+: Solution Development
- What are critical gaps vs nice-to-fix?
- How much work to complete remaining scope?
- What's minimum to address credibility issues?
- Should I fix gaps now or document for later?
- What options to present to user?
Minimum: 100 thoughts Typical: 120-150 thoughts for thorough analysis Complex Projects: 150-200+ thoughts
Validation Checklist
Before concluding reflection, verify:
Completeness: ☐ Read entire original plan (every task, every requirement) ☐ Inventoried all delivered work (files, commits, lines) ☐ Compared EVERY task in plan to delivery ☐ Calculated honest completion percentage ☐ Identified ALL gaps (not just obvious ones)
Quality: ☐ Examined testing methodology validity ☐ Checked if validation claims are supported ☐ Verified assumptions vs confirmations ☐ Assessed if I followed Shannon principles
Honesty: ☐ Acknowledged rationalizations made ☐ Admitted where I fell short ☐ Didn't minimize or justify gaps ☐ Calculated actual completion without bias
Actionability: ☐ Classified gaps (critical/high/medium/low) ☐ Estimated fix effort for each gap ☐ Presented clear options to user ☐ Ready to act on user's choice
Output Template
# Honest Reflection: [Project Name] **Reflection Date**: [timestamp] **Sequential Thoughts**: [count] (minimum 100) **Reflection Duration**: [minutes] ## Executive Summary **Claimed Completion**: [what you said in commits] **Actual Completion**: [calculated percentage] **Discrepancy**: [gap between claim and reality] **Assessment**: [HONEST / OVERCLAIMED / UNDERCLAIMED] ## Original Plan Scope **Total Tasks**: [number] **Phases**: [number] **Estimated Time**: [hours] **Key Deliverables**: [list] ## Delivered Work **Tasks Completed**: [number] ([percentage]%) **Tasks Partial**: [number] **Tasks Not Done**: [number] **Time Spent**: [hours] ([percentage]% of estimate) **Artifacts Created**: - [list of files with line counts] **Commits Made**: [number] ## Gaps Discovered **Total Gaps**: [number] ### CRITICAL (Must Address) 1. [Gap name] - Requirement: [what plan specified] - Delivered: [what actually done] - Impact: [why critical] - Fix Effort: [hours] ### HIGH Priority [List...] ### MEDIUM/LOW Priority [Summary...] ## Rationalization Patterns **Rationalizations I Used**: 1. "[Exact rationalization quote]" - Matches anti-pattern: [Shannon pattern] - Reality: [what should have been done] ## Testing Methodology Issues [Any test validity problems discovered] ## Honest Completion Assessment **Weighted Completion**: [percentage]% ([calculation method]) **Time Alignment**: [hours spent] / [hours estimated] = [percentage]% **Validation**: Time% ≈ Completion% ? [YES/NO] ## Recommendations **Option A: Complete Remaining Work** - Remaining: [list of tasks] - Time: [hours] - Outcome: [100% scope fulfillment] **Option B: Fix Critical Gaps** - Critical: [list] - Time: [hours] - Outcome: [addresses key issues] **Option C: Accept & Document** - Action: Update docs honestly - Outcome: [maintains credibility via transparency] **My Recommendation**: [A/B/C with reasoning] ## User Decision Required [Clear question about what to do next]
Integration with Shannon Components
With executing-plans Skill
Trigger Point: After completing batch, before claiming phase complete
executing-plans: "Batch 3 complete, ready for feedback" ↓ BEFORE user feedback: honest-reflections: "Did I actually complete Batch 3 per plan?" ↓ If gaps found: Report gaps WITH batch results (transparent) If no gaps: Proceed to user feedback
With wave-orchestration Skill
Trigger Point: After wave synthesis, before declaring wave complete
wave-orchestration: "Wave 3 synthesis complete" ↓ BEFORE marking wave_3_complete: honest-reflections: "Did all Wave 3 agents deliver per plan?" ↓ Check: Deliverables, tests, integration, NO MOCKS compliance
With using-shannon Meta-Skill
Trigger Point: Any completion claim
using-shannon: Enforces quality standards ↓ Before ANY "complete" claim: honest-reflections: MANDATORY checkpoint ↓ Prevents premature completion declarations
Success Criteria
Reflection succeeds when:
✅ Minimum 100 thoughts completed (Sequential MCP) ✅ All gaps identified (no rationalization: "I'm sure there are no more") ✅ Honest percentage calculated (not rounded up favorably) ✅ Critical gaps prioritized (not all weighted equally) ✅ Rationalizations acknowledged (not justified) ✅ Options presented clearly to user (not assumed) ✅ Report saved to Serena (learning for future)
Reflection fails when:
❌ Thoughts <100 (incomplete systematic analysis) ❌ Glossing over gaps (minimization, justification) ❌ Claiming "mostly complete" without calculation ❌ Not presenting clear options to user ❌ Proceeding without user decision ❌ Rationalizing the reflection itself ("don't need full 100 thoughts")
Examples
Example 1: Detecting Partial Skill Enhancement
Scenario: Plan required enhancing 16 skills, agent enhanced 3
Reflection Process (thought 23):
"Plan Task 2-15: Audit remaining 14 skills. It says 'read skill completely' and '20-30 minutes per skill'. I read first 200 lines of 3 skills, spent 30 minutes total, concluded 'no gaps'. But 14 skills × 1,000 avg lines = 14,000 lines. I only read 600 lines (4%). I didn't actually audit - I sampled and assumed. Gap: 96% of audit reading not performed."
Detection Method: Time-scope alignment
- Plan: 4-7 hours for Task 2-15
- Actual: 30 minutes
- Ratio: 7-14% of planned time
- Conclusion: Incomplete audit
Example 2: Detecting Testing Methodology Flaw
Scenario: Agent claims "19% improvement" from RED/GREEN test
Reflection Process (thought 30):
"I tested spec-analysis with RED (inventory system) vs GREEN (recipe platform). Different inputs produced different outputs (0.47 vs 0.38). But this doesn't prove walkthrough improves accuracy - it just proves different specs have different complexity. To validly test behavioral change, I need SAME input for both RED and GREEN. My test methodology is flawed."
Detection Method: Test validity analysis
- Check: Same input for RED and GREEN?
- Result: No (different specs used)
- Conclusion: Invalid comparison, can't claim behavioral improvement
Example 3: Detecting Scope Substitution
Scenario: Plan wanted sh_discover_skills guide, agent created sh_check_mcps guide
Reflection Process (thought 41):
"Plan Task 26 specifies 'sh_discover_skills guide'. I created sh_check_mcps guide instead. Both exist as commands, but plan explicitly listed sh_discover_skills. I substituted without approval because sh_check_mcps seemed more important. That's my judgment overriding plan specification."
Detection Method: Exact requirement matching
- Plan requirement: Specific command name
- Delivery: Different command
- Conclusion: Substitution without authorization
Common Pitfalls
Pitfall 1: Stopping Reflection at 50-70 Thoughts
Problem: Agent thinks "I've found the main gaps, 70 thoughts is enough"
Why It Fails: Last 30-50 thoughts often reveal deepest gaps (meta-level patterns, principle violations, methodology flaws)
Solution: Continue to minimum 100, extend to 150+ if still discovering gaps
Pitfall 2: Rationalizing During Reflection
Problem: Reflection becomes justification exercise ("here's WHY gaps are acceptable")
Why It Fails: Reflection goal is IDENTIFY gaps, not JUSTIFY them
Solution: Label rationalizations as rationalizations, don't defend them
Pitfall 3: Comparison Shopping (Minimizing Gaps)
Problem: "Only 13 skills missing, that's not that many" or "50% completion is passing grade"
Why It Fails: Minimization is gap avoidance
Solution: State gaps factually without minimization. Let user judge severity.
Pitfall 4: Not Reading Source Plans Completely
Problem: Skim plan, assume you remember requirements, miss specific details
Why It Fails: Plans have specific requirements (file names, line counts, exact deliverables) that skimming misses
Solution: Read ENTIRE plan during Phase 1 of reflection. Every line.
Performance Benchmarks
| Project Complexity | Reflection Time | Thoughts Required | Gaps Typically Found |
|---|---|---|---|
| Simple (1-5 tasks) | 15-20 min | 50-80 | 2-5 gaps |
| Moderate (5-15 tasks) | 20-30 min | 100-120 | 5-15 gaps |
| Complex (15-30 tasks) | 30-45 min | 120-150 | 15-30 gaps |
| Critical (30+ tasks) | 45-60 min | 150-200+ | 30-50+ gaps |
This project: 38 tasks (Complex-Critical) → 30-45 min reflection, 131 thoughts, 27 gaps found
Alignment: ✅ Metrics align with complexity (thorough reflection appropriate for scope)
Validation
How to verify reflection executed correctly:
-
Check thought count:
- Minimum 100 thoughts ✅
- Extended if still finding gaps ✅
-
Check completeness:
- Read entire plan ✅
- Inventoried all deliverables ✅
- Compared every task ✅
-
Check honesty:
- Acknowledged rationalizations ✅
- No minimization of gaps ✅
- Honest percentage calculated ✅
-
Check actionability:
- Gaps prioritized ✅
- Options presented clearly ✅
- User decision requested ✅
References
- Sequential MCP: For 100+ structured thoughts
- systematic-debugging skill: Root cause analysis of gaps
- confidence-check skill: Validate claims made
- executing-plans skill: Batching protocol (when to reflect)
Version: 1.0.0 Created: 2025-11-08 (from Shannon V4.1 enhancement reflection) Author: Shannon Framework Team Status: Core PROTOCOL skill for quality assurance