Claude-skill-registry-data metric-diagnosis
Use when investigating unexpected metric changes - systematically narrows root cause through 4D segmentation, intrinsic vs extrinsic factor analysis, hypothesis testing, and North Star impact assessment
git clone https://github.com/majiayu000/claude-skill-registry-data
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/metric-diagnosis" ~/.claude/skills/majiayu000-claude-skill-registry-data-metric-diagnosis && rm -rf "$T"
data/metric-diagnosis/SKILL.md
Metric Diagnosis Workflow
Purpose
Systematically investigate why a metric changed unexpectedly (dropped or spiked) by segmenting data, distinguishing internal vs. external factors, and testing hypotheses to identify root cause. Prevents rushing to wrong conclusions and ensures evidence-based responses.
When to Use This Workflow
Use this workflow when:
- Key metrics drop or spike unexpectedly
- Leadership asks "what happened to [metric]?"
- Post-mortem analysis needed after incident
- A/B test shows unexpected patterns
- Need to validate suspected root cause
- User behavior changed suddenly without obvious reason
Skills Sequence
This workflow applies the root-cause-diagnosis skill three times, then assesses North Star impact:
1. Root Cause Diagnosis (Phase 1): Segment across 4 dimensions (People, Geography, Technology, Time)
2. Root Cause Diagnosis (Phase 2): Distinguish intrinsic vs. extrinsic factors
3. Root Cause Diagnosis (Phase 3): Build a hypothesis table and test it against data
4. North Star Alignment: Assess impact on top-line metrics
OUTPUT: Narrowed scope, tested hypotheses, identified root cause, recommended actions
Required Inputs
Gather this information before starting:
Metric Change Information
- The metric that changed
  - Name and definition
  - Example: "Weekly Active Users (WAU)"
- Magnitude of change
  - Absolute and percentage
  - Example: "10,000 → 9,000 (10% drop)"
- Timeframe
  - When was it noticed?
  - Over what period did the change occur?
  - Example: "Noticed today; the change happened over the past week"
Data Access
- Segmented data availability
  - Can you segment by user type, geography, platform?
  - Are raw logs accessible?
- Comparison data
  - Historical baselines
  - Year-over-year seasonality
  - Competitor benchmarks (if available)
Recent Activity Log
- Recent releases (past 1-4 weeks)
  - Feature launches
  - Bug fixes
  - Infrastructure changes
- Running experiments
  - Active A/B tests
  - Configuration changes
- External context
  - Marketing campaigns
  - News events
  - Competitor activity
Workflow Steps
Phase 0: Data Quality Verification (5 minutes)
Before investigating, confirm the data is real:
- Check reporting systems
  - Is tracking still functional?
  - Any pipeline issues?
  - Is data freshness correct?
- Cross-reference sources
  - Do multiple data sources show the same pattern?
  - Are internal dashboards and external tools aligned?
- Validate sample sizes
  - Is there sufficient data for statistical significance?
  - Are the time periods comparable?
Red flags that indicate data quality issues:
- Metric went to exactly zero
- Change happened at exact midnight
- Only visible in one data source
- Coincides with known tracking deployment
If data quality issue found: Fix tracking first, then re-investigate if problem persists.
If data is valid: Proceed to Phase 1.
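A minimal sketch of what an automated version of these checks could look like, assuming the metric is available as daily counts from two independent sources (all names, values, and thresholds here are hypothetical):

```python
from datetime import date

# Hypothetical daily counts of the metric from two independent sources,
# e.g. the internal warehouse and a third-party analytics tool.
warehouse = {date(2024, 9, 1): 10_120, date(2024, 9, 2): 9_980, date(2024, 9, 3): 0}
analytics = {date(2024, 9, 1): 10_050, date(2024, 9, 2): 9_940, date(2024, 9, 3): 9_870}

def data_quality_flags(primary, secondary, max_divergence=0.05, max_staleness_days=1):
    """Return red flags suggesting a tracking issue rather than a real change."""
    flags = []
    # Exact zeros usually mean a broken pipeline, not zero users.
    for day, value in primary.items():
        if value == 0:
            flags.append(f"{day}: metric is exactly zero in primary source")
    # Large divergence between sources points at tracking, not behavior.
    for day in primary.keys() & secondary.keys():
        if secondary[day] and abs(primary[day] - secondary[day]) / secondary[day] > max_divergence:
            flags.append(f"{day}: sources diverge by more than {max_divergence:.0%}")
    # Stale data: the most recent day should be close to today.
    freshness_gap = (date.today() - max(primary)).days
    if freshness_gap > max_staleness_days:
        flags.append(f"primary source is {freshness_gap} days stale")
    return flags

print(data_quality_flags(warehouse, analytics))
```

If any flag fires, treat the change as a possible false alarm and verify the pipeline before continuing.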
Phase 1: Narrow the Scope (15-20 minutes)
Use the root-cause-diagnosis skill: 4 Dimension Segmentation
Ask clarifying questions across all four dimensions:
Dimension 1: People (User Segments)
Questions to ask:
- Does it affect all users equally?
- Specific demographics (age, gender, location)?
- New vs. returning users?
- Free vs. paid users?
- Power users vs. casual users?
- Specific cohorts or user types?
Request data segmentation by user attributes.
Dimension 2: Geography
Questions to ask:
- Specific countries or regions affected?
- Certain cities?
- Urban vs. rural?
- Climate zones?
- Time zones?
Request data segmentation by location.
Dimension 3: Technology (Platform)
Questions to ask:
- iOS vs. Android?
- Web vs. mobile app?
- Desktop vs. mobile web?
- Specific browser or OS versions?
- Device types?
- Network types (WiFi vs. cellular)?
Request data segmentation by platform/device.
Dimension 4: Time
Questions to ask:
- When exactly did it start?
- Gradual decline or sudden drop?
- Day of week patterns?
- Time of day patterns?
- Recurring or one-time?
- Seasonal effects?
Request time-series data with granular breakdown.
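As an illustration, a minimal pandas sketch of the segmentation pull these questions lead to, assuming an event-level table with one row per active user per day (column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical event-level data: one row per active user per day.
events = pd.DataFrame({
    "date":      pd.to_datetime(["2024-08-25", "2024-08-25", "2024-08-25", "2024-09-01", "2024-09-01"]),
    "user_id":   [1, 2, 3, 1, 4],
    "platform":  ["ios", "ios", "android", "ios", "android"],
    "country":   ["US", "US", "DE", "US", "DE"],
    "age_group": ["teen", "teen", "adult", "teen", "adult"],
})

# Active users per day within each People x Geography x Technology cell.
segmented = (
    events.groupby(["date", "platform", "country", "age_group"])["user_id"]
          .nunique()
          .rename("active_users")
          .reset_index()
)

# Compare the current period to the prior one per segment to see where the drop concentrates.
current = segmented[segmented["date"] >= "2024-09-01"]
baseline = segmented[segmented["date"] < "2024-09-01"]
comparison = (
    baseline.groupby(["platform", "country", "age_group"])["active_users"].mean().rename("baseline")
    .to_frame()
    .join(current.groupby(["platform", "country", "age_group"])["active_users"].mean().rename("current"))
)
comparison["pct_change"] = (comparison["current"] - comparison["baseline"]) / comparison["baseline"]
print(comparison.sort_values("pct_change"))
```

The segments with the most negative `pct_change` become the narrowed problem statement; segments near zero are the "unaffected" evidence, which is equally important.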
Output of Phase 1:
Narrow problem statement from:
- ❌ "Usage is down"
To:
- ✓ "iOS usage among teens in US dropped 10% starting Sept 1st"
Document:
## Narrowed Scope
**Original statement:** [Broad problem]
**Narrowed statement:** [Specific problem]
**Affected segments:**
- People: [Which user groups]
- Geography: [Which locations]
- Technology: [Which platforms]
- Time: [When and pattern]
**Unaffected segments:**
- [What stayed stable - equally important]
Phase 2: Generate Hypotheses (15-20 minutes)
Use the root-cause-diagnosis skill: Intrinsic vs. Extrinsic Factor Analysis
Brainstorm potential causes in both categories:
Intrinsic Factors (Internal Changes)
Engineering & Product:
- Recent code releases (check deploy logs)
- Bug introductions
- Feature launches by any team
- A/B experiments started/stopped
- Algorithm or ranking changes
- Performance degradations
Check with:
- Engineering team (deploys, errors, performance)
- Other product teams (their launches)
- Data team (tracking changes)
Operations & Business:
- Pricing changes
- Policy updates
- Support process changes
- Marketing campaign changes
Check with:
- Operations team
- Marketing team
- Customer success team
Extrinsic Factors (External Events)
Competition:
- Competitor product launches
- Competitor pricing changes
- Competitor marketing campaigns
Market:
- Economic shifts
- Industry trends
- Regulatory changes
- Platform policy changes (iOS, Android, etc.)
Calendar/Seasonal:
- Holidays
- School schedules
- Weather patterns
- Cultural events
News/PR:
- News coverage (positive or negative)
- Social media trends
- External events affecting user behavior
Check with:
- Sales team (competitive intelligence)
- Customer success (user feedback)
- Marketing (market trends)
- Industry news sources
Output of Phase 2:
List of 4-6 hypotheses (mix of intrinsic and extrinsic):
## Hypothesis List
**Intrinsic (Internal):**
1. [Hypothesis name]: [Description]
2. [Hypothesis name]: [Description]
3. [Hypothesis name]: [Description]
**Extrinsic (External):**
4. [Hypothesis name]: [Description]
5. [Hypothesis name]: [Description]
6. [Hypothesis name]: [Description]
Phase 3: Test Hypotheses (20-30 minutes)
Use the root-cause-diagnosis skill: Hypothesis Table Method
Step 3.1: Create Hypothesis Table
Columns: Data points observed (from Phase 1)
Rows: Potential causes (from Phase 2)
Fill each cell with predicted impact:
- ↓ = Would decrease this segment
- ↑ = Would increase this segment
- → = No effect on this segment
- ? = Uncertain
Step 3.2: Match Patterns
For each hypothesis:
- Do predictions match observations?
- If ALL predictions match → Possible cause
- If ANY predictions contradict → Rule out
Step 3.3: Request Differentiating Data
If multiple hypotheses fit:
- Identify data points that would differ between them
- Request additional segmentation
- Narrow to single most likely cause
Example Table:
In each cell, ✓ means the hypothesis's prediction matches the observed pattern, ✗ means it contradicts it, and ? means uncertain:

| Cause | iOS ↓ | Android → | Teens ↓ | Adults → | US ↓ | EU → |
|---|---|---|---|---|---|---|
| iOS bug in latest update | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ |
| School started (US only) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Competitor targeted teens | ✓ | ? | ✓ | ✓ | ✓ | ✗ |

Pattern match: "School started (US only)" is the only hypothesis with no contradictions; it fits all observations (US teens skew heavily toward iOS, so a teen-specific drop shows up mostly in iOS numbers) → Most likely cause
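The matching logic is simple enough to sketch in a few lines of Python; this assumes observations and predictions are encoded with the same ↓/→/? symbols used above (the segment names are illustrative, not a prescribed schema):

```python
# Observed pattern per segment, using the same symbols as the table above.
observations = {"iOS": "↓", "Android": "→", "Teens": "↓", "Adults": "→", "US": "↓", "EU": "→"}

# Predicted impact of each hypothesis on each segment ("?" = uncertain).
hypotheses = {
    "iOS bug in latest update":  {"iOS": "↓", "Android": "→", "Teens": "↓", "Adults": "↓", "US": "↓", "EU": "↓"},
    "School started (US only)":  {"iOS": "↓", "Android": "→", "Teens": "↓", "Adults": "→", "US": "↓", "EU": "→"},
    "Competitor targeted teens": {"iOS": "↓", "Android": "?", "Teens": "↓", "Adults": "→", "US": "↓", "EU": "↓"},
}

def evaluate(predictions, observed):
    """Rule out a hypothesis if ANY concrete prediction contradicts an observation."""
    contradictions = [
        seg for seg, pred in predictions.items()
        if pred != "?" and pred != observed[seg]
    ]
    return ("ruled out" if contradictions else "possible"), contradictions

for name, predictions in hypotheses.items():
    verdict, contradictions = evaluate(predictions, observations)
    print(f"{name}: {verdict} {contradictions}")
# Only "School started (US only)" survives; the others contradict the Adults/EU observations.
```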
Output of Phase 3:
## Hypothesis Testing Results
**Ruled out:**
- [Hypothesis]: [Why ruled out]
- [Hypothesis]: [Why ruled out]
**Possible causes (remaining):**
1. [Most likely hypothesis]
   - Evidence supporting: [List]
   - Evidence against: [List if any]
   - Confidence: [High/Medium/Low]
2. [Second most likely] (if applicable)
   - Evidence supporting: [List]
   - Confidence: [High/Medium/Low]
**Additional data needed to confirm:**
- [Data point that would differentiate remaining hypotheses]
Phase 4: Assess North Star Impact (10 minutes)
Use the north-star-alignment skill
Evaluate whether identified root cause impacts company-level metrics:
Questions:
- Does this affect North Star metrics?
- Is impact temporary or permanent?
- Does it require immediate action?
- What's the business impact (revenue, growth, retention)?
Severity classification:
Critical (Act immediately):
- Impacts revenue directly
- Affects North Star metrics substantially
- Permanent degradation if not fixed
- Competitive disadvantage
Important (Act soon):
- Impacts user experience
- Affects intermediate metrics
- May worsen over time
- Needs monitoring
Low Priority (Monitor):
- Expected variation (seasonality)
- Affects non-critical segments
- Self-resolving
- No action needed
Output of Phase 4:
## North Star Impact Assessment
**Affected North Star metrics:**
- [Metric]: [Impact description]
- [Metric]: [Impact description]
**Business impact:**
- Revenue: [Direct/Indirect/None]
- Growth: [Slowed/Accelerated/Neutral]
- Retention: [Harmed/Improved/Neutral]
**Severity:** [Critical/Important/Low Priority]
**Urgency:** [Immediate/This week/Monitor]
Phase 5: Recommend Actions (10 minutes)
Based on root cause and severity, recommend next steps:
Action categories:
- Fix (for bugs/issues)
  - What needs to be fixed?
  - Who owns the fix?
  - Timeline for fix?
- Mitigate (for external factors)
  - What can reduce harm?
  - Counter-measures available?
  - Timeline for mitigation?
- Monitor (for expected variations)
  - What tracking is needed?
  - Alert thresholds?
  - Review cadence?
- No Action (for acceptable changes)
  - Why is no action needed?
  - What validates this decision?
Output document:
# Metric Diagnosis Report: [Metric Name]

## Executive Summary
[One paragraph: What happened, why, what to do]

## Problem Statement
- **Initial observation:** [Broad statement]
- **Narrowed scope:** [Specific statement with segments]
- **Magnitude:** [Absolute and % change]
- **Timeframe:** [When occurred]

## Root Cause Analysis

### Narrowed Scope (Phase 1)
- **Affected:** [User segments, geography, platforms, time pattern]
- **Unaffected:** [What stayed stable]

### Hypotheses Tested (Phase 2-3)
| Hypothesis | Category | Evidence | Conclusion |
|------------|----------|----------|------------|
| [Name] | Intrinsic/Extrinsic | [Supporting/Contradicting] | Ruled Out/Possible/Likely |

### Identified Root Cause
**Most likely cause:** [Hypothesis name]
**Evidence supporting:**
1. [Data point matching prediction]
2. [Data point matching prediction]
3. [Stakeholder confirmation]
**Confidence level:** [High/Medium/Low]

### North Star Impact (Phase 4)
- **Affected metrics:** [List with impact]
- **Severity:** [Critical/Important/Low]
- **Urgency:** [Immediate/Soon/Monitor]

## Recommended Actions

### Primary action: **[Fix/Mitigate/Monitor/No Action]**
- What: [Specific action]
- Who: [Owner]
- When: [Timeline]
- Success criteria: [How to know it worked]

### Secondary actions:
- [Action 1]
- [Action 2]

### Monitoring plan:
- Metrics to track: [List]
- Review frequency: [Daily/Weekly]
- Alert thresholds: [When to escalate]

## Appendix
- Hypothesis table: [Detailed]
- Data sources: [Links]
- Stakeholders consulted: [Names/teams]
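For the monitoring plan, a minimal sketch of an alert-threshold check against an adjusted baseline (the metric values and the 10% threshold are purely illustrative):

```python
def should_escalate(current_value, baseline_value, drop_threshold=0.10):
    """Escalate when the metric falls more than `drop_threshold` below its baseline."""
    drop = (baseline_value - current_value) / baseline_value
    return drop > drop_threshold

# Example: WAU baseline of 10,000 vs. today's 8,800 -> 12% drop -> escalate.
print(should_escalate(current_value=8_800, baseline_value=10_000))  # True
```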
Common Diagnostic Patterns
Pattern 1: Seasonal Variation
- Symptoms: Recurring pattern (weekly, yearly)
- Investigation: Compare to the same period in prior years (see the sketch below)
- Action: Usually monitor, adjust baselines
- Example: School year start, holiday season
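A minimal sketch of that historical comparison, assuming weekly metric values keyed by ISO week (the numbers are made up):

```python
# Hypothetical weekly values for the same ISO weeks in two consecutive years.
this_year = {"2024-W35": 10_100, "2024-W36": 9_050}
last_year = {"2023-W35": 10_000, "2023-W36": 9_000}

# If the drop into week 36 mirrors last year's drop, it is likely seasonal.
drop_now = 1 - this_year["2024-W36"] / this_year["2024-W35"]
drop_then = 1 - last_year["2023-W36"] / last_year["2023-W35"]
print(f"this year: {drop_now:.1%}, last year: {drop_then:.1%}")  # ~10.4% vs. ~10.0%
```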
Pattern 2: Competitor Impact
- Symptoms: Sudden shift, specific demographic
- Investigation: Check competitor news, market research
- Action: Counter-strategy, feature parity
- Example: Competing app launch with targeted marketing
Pattern 3: Internal Bug
- Symptoms: Sudden drop, platform-specific
- Investigation: Check recent releases, error logs
- Action: Hotfix, rollback if needed
- Example: iOS update broke key feature
Pattern 4: Product Change Side Effect
- Symptoms: Gradual shift after release
- Investigation: Analyze feature adoption patterns
- Action: Iterate on feature, A/B test variations
- Example: Redesign changed user behavior
Pattern 5: Tracking Issue (False Alarm)
- Symptoms: Exact numbers, midnight timing, single source
- Investigation: Verify data pipeline
- Action: Fix tracking, dismiss false alarm
- Example: Analytics tag removed in deploy
Common Mistakes
| Mistake | Fix |
|---|---|
| Jumping to conclusions without segmentation | Complete Phase 1 (4D segmentation) first |
| Only considering one type of factor | Brainstorm both intrinsic AND extrinsic |
| Accepting first matching hypothesis | Test all hypotheses systematically |
| Skipping data quality check | Always verify data is valid first |
| Not consulting stakeholders | Talk to sales, CS, engineering teams |
| Immediate rollback without diagnosis | Understand cause before acting |
Success Criteria
Metric diagnosis succeeds when:
- Data quality verified
- Problem narrowed to specific segments (4 dimensions)
- 4-6 hypotheses brainstormed (intrinsic + extrinsic)
- Hypotheses tested against data systematically
- Root cause identified (or top 2-3 candidates with confidence levels)
- North Star impact assessed
- Recommended actions clear with owners and timelines
- Diagnosis document created and shared
- Stakeholders aligned on next steps
Real-World Example: Microsoft Teams Downloads Drop
Initial Report
"New app downloads down 50% from last week"
Phase 1: Narrow Scope (15 min)
Questions asked:
- Geography? → "Predominantly US and Europe"
- User type? → "Enterprise customers, not consumers"
- Platform? → "Web and desktop primarily, some mobile"
- Other metrics? → "Usage unchanged (existing users still active)"
Narrowed statement: "Enterprise customer downloads via web/desktop down 50% in US/Europe, starting last week. Existing user engagement unchanged."
Phase 2: Generate Hypotheses (15 min)
Intrinsic:
1. Bug in download flow
2. Recent release broke something
3. A/B test reducing visibility
Extrinsic:
4. Marketing promotion ended
5. Competitor launch
6. Economic downturn affecting enterprise spending
Stakeholder consults:
- Engineering: No recent releases, no bugs reported
- Marketing: "We were running 50% off for enterprises switching from Slack"
- Sales: No competitor news, pipeline healthy
Phase 3: Test Hypotheses (20 min)
| Hypothesis | Enterprise ↓ | Consumer → | US/EU only | Usage unchanged | Downloads only ↓ |
|---|---|---|---|---|---|
| Bug in download flow | ✓ | ✗ (would hit both) | ✗ (would hit all regions) | ? | ✓ |
| Promo ended (enterprise-only offer) | ✓ | ✓ | ✓ | ✓ | ✓ |
| Competitor launch | ✓ | ✗ (would hit both) | ✗ (would hit all regions) | ? | ✓ |
Pattern match: "Promotion ended" fits all observations
Confirmation: Marketing confirms promotion ended last weekend
Phase 4: North Star Impact (5 min)
Root cause: End of promotional campaign
- Downloads during promo = artificially inflated baseline
- Current downloads = return to normal baseline
North Star impact:
- MAU: No impact (existing users unaffected)
- Revenue: No impact (existing customers retained)
- Severity: Low (false alarm, not actual problem)
Phase 5: Recommended Actions (5 min)
Primary action: No action needed
- Not an actual decline, just return to baseline
- Adjust baselines for future comparisons
Secondary actions:
- Update dashboards to flag promotional periods
- Establish "normal" baseline for future monitoring
- Document for future reference (prevent false alarms)
Time to diagnosis: 60 minutes
Related Skills
This workflow orchestrates these skills:
- root-cause-diagnosis (Phases 1-3) - Core investigation method
- north-star-alignment (Phase 4) - Assess strategic impact
Related Workflows
- tradeoff-decision: May use diagnosis to understand why metrics changed before deciding
- metrics-definition: Uses similar systematic approach for defining vs. diagnosing
Time Estimate
Total: 60-90 minutes
- Phase 0 (Data quality): 5 min
- Phase 1 (Narrow scope): 15-20 min
- Phase 2 (Hypotheses): 15-20 min
- Phase 3 (Test): 20-30 min
- Phase 4 (Impact): 10 min
- Phase 5 (Actions): 10 min
For complex issues, may require multiple iterations or longer investigation time.