# claude-skill-registry: experiment-analyzer

Analyze completed growth experiment results, validate hypotheses, generate insights, and suggest follow-up experiments. Use when experiments are completed, when the user asks about results or learnings, or when discussing what to do next based on experiment outcomes.

Clone the full registry:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

Or install just this skill into `~/.claude/skills`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/experiment-analyzer" ~/.claude/skills/majiayu000-claude-skill-registry-experiment-analyzer && rm -rf "$T"
```

---

`skills/data/experiment-analyzer/SKILL.md`

# Experiment Analyzer Skill
Analyze completed growth experiments, extract insights, and drive continuous learning.

## When to Activate

This skill should activate when:
- User marks experiment as "completed"
- User asks "what did we learn?"
- User mentions "results", "outcomes", or "analysis"
- User asks "what should we do next?"
- User wants to compare multiple experiments
- User asks about experiment success rates

## Analysis Framework

### 1. Result Classification

**Win (Positive + Significant)**
- Result is better than baseline
- Statistical significance ≥ 95%
- Change is meaningful (usually ≥5%)

**Loss (Negative + Significant)**
- Result is worse than baseline
- Statistical significance ≥ 95%
- Change is meaningful

**Inconclusive**
- Statistical significance < 95%
- Not enough data to make decision
- Sample size may be insufficient

**Neutral**
- Minimal change (< ±2%)
- No meaningful impact either way
- May indicate hypothesis was off
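
Combined with the explicit thresholds in Step 2 below, this taxonomy reduces to a small decision rule. A minimal sketch in Python, assuming change and significance arrive as percentages (the 2% cutoff follows Step 2; "meaningful" is described above as closer to ≥5%, so tune the threshold per metric):

```python
def classify_result(change_pct: float, significance_pct: float) -> str:
    """Classify a result as Win / Loss / Inconclusive / Neutral.

    change_pct:       relative change vs. baseline, e.g. 4.2 for +4.2%
    significance_pct: statistical significance, e.g. 96.0 for 96%
    """
    if abs(change_pct) < 2:        # minimal change either way
        return "Neutral"
    if significance_pct < 95:      # direction can't be trusted yet
        return "Inconclusive"
    return "Win" if change_pct > 0 else "Loss"
```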

### 2. Hypothesis Validation

Compare original hypothesis to results:

**Hypothesis Components:**
- Proposed change → Was it implemented as planned?
- Target audience → Did we reach the right users?
- Expected outcome → Did we hit the target?
- Rationale → Was our reasoning correct?

**Validation Questions:**
- Did we achieve the expected outcome? (Yes/No/Partially)
- Was the underlying assumption correct?
- What surprised us?
- What would we do differently?

### 3. ICE Score Retrospective

Compare predicted vs actual:

**Impact Score Validation:**
- Predicted Impact: [original score]
- Actual Impact: [calculate based on results]
- Delta: [difference]
- Learning: Was our impact prediction accurate?

**Confidence Score Validation:**
- Predicted Confidence: [original score]
- Outcome: [win/loss/inconclusive]
- Learning: Was our confidence justified?

**Ease Score Validation:**
- Predicted Ease: [original score]
- Actual Time: [if tracked]
- Learning: Was implementation as easy as expected?
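
A minimal sketch of the predicted-versus-actual comparison, assuming ICE components are stored as plain numbers on the experiment record (the 1.0 tolerance and the dict shapes are illustrative choices, not defined by this skill):

```python
def score_accuracy(predicted: float, actual: float, tolerance: float = 1.0) -> str:
    """Label one ICE component's prediction against the observed value."""
    delta = predicted - actual
    if abs(delta) <= tolerance:
        return "accurate"
    return "overestimated" if delta > 0 else "underestimated"

def ice_retrospective(predicted: dict, actual: dict) -> dict:
    """Compare each component, e.g. {'impact': 8, 'confidence': 6, 'ease': 7}."""
    return {
        name: {
            "predicted": predicted[name],
            "actual": actual[name],
            "delta": actual[name] - predicted[name],
            "assessment": score_accuracy(predicted[name], actual[name]),
        }
        for name in ("impact", "confidence", "ease")
    }
```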

### 4. Insight Generation

**Key Questions:**
- What worked? Specific elements that drove success
- What didn't work? Elements that failed or harmed metrics
- What was surprising? Unexpected findings
- What patterns emerge? Connections to other experiments
- What new questions arise? Areas to investigate further

**Secondary Metrics:**
- Review all secondary metrics tracked
- Look for unintended positive effects
- Watch for negative side effects
- Consider holistic impact

### 5. Follow-up Experiment Suggestions

Based on the outcome, suggest 2-3 follow-up experiments:

**For Wins:**
- Scale: Roll out to 100% of users
- Amplify: Make the winning element more prominent
- Extend: Apply pattern to related areas
- Optimize: Test variations to improve further

**For Losses:**
- Pivot: Try alternative approach to same problem
- Investigate: Run research to understand why
- Revert: Document and move on
- Learn: Apply learnings to future experiments

**For Inconclusive:**
- Re-run: Increase sample size or duration
- Simplify: Test smaller version to isolate variable
- Segment: Test with specific user segments
- Refine: Adjust hypothesis based on early signals

## Analysis Process

### Step 1: Load and Validate

1. Read the experiment JSON from the completed/archived folder
2. Verify that results data exists:
   - Primary metric
   - Baseline value
   - Result value
   - Statistical significance
   - Sample size
   - Duration
3. Check that the hypothesis is documented
4. Review ICE scores
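
A minimal load-and-validate sketch; the field names (`results.baseline`, `hypothesis`, and so on) are illustrative, since the skill does not pin down the JSON schema:

```python
import json
from pathlib import Path

# Illustrative field names; adjust to your experiment JSON schema.
REQUIRED_RESULT_FIELDS = (
    "primary_metric", "baseline", "result",
    "significance", "sample_size", "duration_days",
)

def load_experiment(path: Path) -> dict:
    """Load an experiment record and fail fast on missing results data."""
    experiment = json.loads(path.read_text())
    results = experiment.get("results", {})
    missing = [field for field in REQUIRED_RESULT_FIELDS if field not in results]
    if missing:
        raise ValueError(f"{path.name}: missing result fields: {missing}")
    if not experiment.get("hypothesis"):
        raise ValueError(f"{path.name}: hypothesis is not documented")
    return experiment
```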

### Step 2: Calculate Key Metrics

Change Percentage = ((Result - Baseline) / Baseline) × 100

Result Classification:

- IF change% > 2% AND significance >= 95% → Win
- IF change% < -2% AND significance >= 95% → Loss
- IF significance < 95% → Inconclusive
- IF abs(change%) < 2% → Neutral
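
The skill assumes a significance figure is already recorded with the results. If it has to be computed, a common choice for conversion-style metrics is a two-sided two-proportion z-test; a minimal sketch (function name and inputs are illustrative, and metrics that are means rather than proportions need a t-test instead):

```python
from math import erf, sqrt

def significance_pct(base_conv: int, base_n: int, var_conv: int, var_n: int) -> float:
    """Two-sided two-proportion z-test, returned as a percentage."""
    p1, p2 = base_conv / base_n, var_conv / var_n
    pooled = (base_conv + var_conv) / (base_n + var_n)
    se = sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / var_n))
    z = abs(p2 - p1) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # via the normal CDF
    return (1 - p_value) * 100
```

For example, 500/10,000 baseline conversions against 560/10,000 in the variant (a +12% relative lift) comes out just under the 95% bar, so it would classify as Inconclusive despite the sizable change.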

### Step 3: Generate Insights

1. Classify the result (Win/Loss/Inconclusive/Neutral)
2. Validate the hypothesis against the results
3. Review ICE score predictions
4. Extract key learnings
5. Identify surprising findings
6. Check secondary metrics
7. Look for patterns across related experiments

### Step 4: Create Follow-up Ideas

1. Based on the result type, brainstorm 2-3 follow-ups
2. For each follow-up:
   - Draft a hypothesis
   - Explain the rationale (referencing current learnings)
   - Suggest a category
   - Provide a preliminary ICE estimate
3. Prioritize follow-ups by potential impact

### Step 5: Generate Report

1. Create a markdown analysis report
2. Include:
   - Summary (result classification, key numbers)
   - Hypothesis validation
   - ICE score retrospective
   - Key insights (bulleted list)
   - Secondary metrics review
   - Recommendations
   - Follow-up experiment ideas
3. Save to experiments/archive/[id]_analysis.md
4. Update the experiment JSON with learnings
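
A minimal sketch of steps 3 and 4, assuming the archive layout above (the `learnings` entry shape is illustrative):

```python
import json
from pathlib import Path

def save_analysis(experiment: dict, report_md: str,
                  archive: Path = Path("experiments/archive")) -> None:
    """Write the analysis report and fold learnings back into the record."""
    exp_id = experiment["id"]
    (archive / f"{exp_id}_analysis.md").write_text(report_md)
    experiment.setdefault("learnings", []).append(
        {"source": "experiment-analyzer", "report": f"{exp_id}_analysis.md"}
    )
    (archive / f"{exp_id}.json").write_text(json.dumps(experiment, indent=2))
```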

## Analysis Output Template

```markdown
# Experiment Analysis: [Title]

**Date:** [Analysis date]
**Experiment ID:** [id]
**Status:** [Win/Loss/Inconclusive/Neutral] ✓/✗/?/○

## Summary

- **Primary Metric:** [metric name]
- **Baseline:** [baseline value]
- **Result:** [result value]
- **Change:** [+/-X%]
- **Statistical Significance:** [XX%]
- **Sample Size:** [count]
- **Duration:** [days]

## Hypothesis Validation

### Original Hypothesis
[Full hypothesis statement]

### Validation
- **Expected Outcome:** [what we expected]
- **Actual Outcome:** [what happened]
- **Hypothesis Validated:** [Yes/No/Partially]

**Analysis:** [Explanation of whether and why hypothesis was validated]

## ICE Score Retrospective

| Component  | Predicted | Actual/Assessment        | Accuracy |
|------------|-----------|--------------------------|----------|
| Impact     | [score]   | [calculate from results] | [good/overestimated/underestimated] |
| Confidence | [score]   | [based on outcome]       | [justified/overconfident/underconfident] |
| Ease       | [score]   | [based on actual effort] | [accurate/harder/easier] |

**Learnings for Future Scoring:**
- [What we learned about predicting impact]
- [What we learned about confidence]
- [What we learned about ease]

## Key Insights

1. **[Primary insight]** - [Explanation with data]
2. **[Secondary insight]** - [Explanation]
3. **[Surprising finding]** - [What we didn't expect]

## Secondary Metrics

| Metric     | Change  | Interpretation     |
|------------|---------|--------------------|
| [metric 1] | [+/-X%] | [Good/Bad/Neutral] |
| [metric 2] | [+/-X%] | [Good/Bad/Neutral] |

**Side Effects:**
- Positive: [Any unexpected positive impacts]
- Negative: [Any unexpected negative impacts]

## Recommendations

### Immediate Actions
- [ ] [Action item 1]
- [ ] [Action item 2]

### Strategic Implications
[Broader implications for product/growth strategy]

## Follow-up Experiment Ideas

### 1. [Experiment Title]
**Category:** [category]
**Hypothesis:** [Full hypothesis following template]
**Rationale:** [Why this follow-up based on current learnings]
**Preliminary ICE:**
- Impact: [score] - [reasoning]
- Confidence: [score] - [reasoning]
- Ease: [score] - [reasoning]
- **Total: [score]**

---

### 2. [Experiment Title]
[Repeat format]

---

### 3. [Experiment Title]
[Repeat format]

## Related Experiments
[List any related experiments and their outcomes for pattern recognition]

## Notes
[Any additional context, edge cases, or considerations]
```

## Cross-Experiment Analysis

When user asks to analyze multiple experiments:

**Metrics to Calculate:**
- Success Rate: % of wins out of completed experiments
- Category Performance: Which funnel stages have best win rate?
- ICE Score Accuracy: How well do high-ICE experiments perform?
- Average Impact: What's the typical metric improvement?
- Cycle Time: Average days from backlog → completed
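
These roll-ups are simple aggregations over the archived records. A minimal sketch, assuming each record carries `status`, `classification`, `change_pct`, and backlog/completed dates (all field names illustrative):

```python
from datetime import date
from statistics import mean

def portfolio_stats(experiments: list[dict]) -> dict:
    """Win rate, average impact, and cycle time across completed experiments."""
    done = [e for e in experiments if e["status"] == "completed"]
    if not done:
        return {"completed": 0}
    wins = [e for e in done if e["classification"] == "Win"]
    cycle_days = [
        (date.fromisoformat(e["completed_at"]) - date.fromisoformat(e["backlog_at"])).days
        for e in done
    ]
    return {
        "completed": len(done),
        "win_rate_pct": 100 * len(wins) / len(done),
        "avg_change_pct": mean(e["change_pct"] for e in done),
        "avg_cycle_days": mean(cycle_days),
    }
```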

**Pattern Recognition:**
- Which types of experiments succeed most?
- Which audience segments respond best?
- Which testing methods are most reliable?
- What confidence levels actually predict success?

**Portfolio View:**

```markdown
# Experiment Portfolio Analysis

## Overview
- Total Experiments: [count]
- Completed: [count]
- Win Rate: [X%]
- Average Change: [+X%]

## By Category

| Category    | Experiments | Win Rate | Avg Impact |
|-------------|-------------|----------|------------|
| Acquisition | [count]     | [X%]     | [+X%]      |
| Activation  | [count]     | [X%]     | [+X%]      |
| Retention   | [count]     | [X%]     | [+X%]      |
| Revenue     | [count]     | [X%]     | [+X%]      |
| Referral    | [count]     | [X%]     | [+X%]      |

## ICE Score Performance
- Experiments with ICE > 500: [X% win rate]
- Experiments with ICE 300-500: [X% win rate]
- Experiments with ICE < 300: [X% win rate]

**Learning:** [Are high ICE scores actually better predictors?]

## Top Performers
1. [Experiment] - [+X%] change
2. [Experiment] - [+X%] change
3. [Experiment] - [+X%] change

## Key Patterns
- [Pattern 1 discovered across experiments]
- [Pattern 2]
- [Pattern 3]

## Recommendations
[Strategic recommendations based on portfolio analysis]
```

## Integration Points

- Automatically trigger when `/experiment-update` sets an experiment's status to "completed"
- Work with the ICE scorer skill to validate predictions
- Inform hypothesis generator with learnings
- Feed into metrics calculator for portfolio analysis

## Continuous Improvement

After each analysis:
- Store learnings in a knowledge base
- Update ICE scoring calibration
- Refine hypothesis templates
- Build pattern library
- Improve follow-up suggestions