Claude-skill-registry hypothesis-to-data-velocity

A product strategy framework that replaces lengthy strategic planning with rapid experimentation cycles. Use this when you need to make product decisions faster, validate market assumptions, or move beyond opinion-based strategy. This skill is essential when your team is stuck in lengthy planning cycles, when multiple stakeholders have conflicting ideas about direction, or when you need to demonstrate that a strategy is working (or not) within weeks instead of months.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/hypothesis-to-data-velocity" ~/.claude/skills/majiayu000-claude-skill-registry-hypothesis-to-data-velocity && rm -rf "$T"
manifest: skills/data/hypothesis-to-data-velocity/SKILL.md
source content

Overview

Most product teams waste months building "strategy" that's really just packaged intuition. The loudest voice wins, junior PMs feel powerless, and the best market feedback comes too late. This skill replaces strategic planning theater with a disciplined hypothesis-to-data loop: form a testable hypothesis, run an experiment, get results in days or weeks, and let data—not hierarchy—determine your next move.

Why this matters: In fast-moving markets, the team that learns fastest wins. Experimentation-driven product strategy is the only scalable way to make decisions scientifically instead of politically.

Use this skill when:

  • Your team spends months on strategy documents that don't change behavior
  • Stakeholders disagree on direction and you need objective resolution
  • You're launching features based on intuition and need data-backed confidence
  • You want to move product decisions from "loudest voice" to "measurable evidence"

Core Principle: Strategy Is Solved Once Hypotheses Become Data

The fundamental insight: Strategy by definition is unnecessary once you can test a hypothesis and see results. If you can state "building X will add Y users with Z conversion rate and A LTV," that's a hypothesis you test—not strategy you debate.

Traditional strategy is packaged intuition disguised as insight. It looks like analysis (Porter's Five Forces, market segmentation) but relies on whoever has the biggest title or loudest voice.

The alternative: Make your strategy a solved problem by operating faster than your competition can theorize.


The Three-Part Framework: Hypothesis → Experiment → Decision

Step 1: Form a Testable Hypothesis

Write it in this structure:

"If we [change X], we will see [metric impact] with [supporting specifics], resulting in [business outcome]."

Examples:

  • "If we simplify the KYC document upload flow for passports in Kenya, conversion will improve from 35% to 48% within the first cohort of 10,000 users."
  • "If we reduce onboarding screens from 7 to 4, daily active user retention increases by 12% for users in their first 30 days."
  • "If we add high-interest savings as a product, we'll acquire 50K users in Q2 when rates are above 4%, and reduce churn among existing users by 8%."

Key criteria for testable hypotheses:

  • Specific metric (conversion, retention, LTV, DAU)
  • Measurable threshold (not "improve," but "improve from X to Y")
  • Clear business tie-in (user acquisition, retention, revenue impact)
  • Reversible (you can undo it if wrong)

Step 2: Run the Experiment (Not a Pilot, Not a Rollout)

Use a proper experimentation tool (Statsig, Optimizely, VWO, or equivalent) that gives you:

  • Variant and control groups (this is critical—don't compare before/after, compare variant vs. control)
  • Statistical significance threshold (typically 95% confidence)
  • Timeline to significance (how many days/users until you have conclusive data)
  • Cohort analysis (don't just look at aggregated metrics; break down by user type, geography, tenure)

Critical mistake to avoid: Looking at your overall dashboard metrics (which include all users from all time periods) to make decisions. Macro metrics move for reasons outside your control (market conditions, competitor changes, seasonal shifts). Experiments isolate your change's true impact.

Example of wrong vs. right approach:

  • ❌ Wrong: "Conversions dropped from 5% to 4.2% this month, so our onboarding change hurt us."
  • ✓ Right: "In the experiment, variant users had 4.8% conversion vs. 5.1% in control (p-value 0.34, not significant). The overall 4.2% drop is due to Bitcoin price movement, not our product change."

Step 3: Make the Decision (Keep, Iterate, or Abandon)

Once you have statistical significance (typically 2-4 weeks for most consumer features):

  1. If metric moved positive and significant: Ship it broadly, measure impact at scale, prepare next hypothesis
  2. If metric moved negative or flat: Kill it, learn why it failed, form new hypothesis
  3. If inconclusive (not enough users, borderline stats): Run longer or try a different variable

Expected cadence: 2-4 week cycles for most features, allowing you to run 12-13 experiments per quarter. This compounds learning velocity dramatically.


How This Changes Decision-Making Authority

Once you've established experimentation as your decision mechanism:

From: "Here's my opinion on what we should do" (loudest voice wins)
To: "Here's the data from 6 repeated tests across these user segments"

When someone proposes a feature or strategy, the response becomes: "That's a hypothesis. Let's design an experiment to test it. We'll know in 3 weeks."

This is profoundly empowering for junior PMs. You're no longer competing in a politics game—you're competing on whose hypothesis was more accurate.


Operational Implementation: The Statsig Dashboard Pattern

Once you have proper experimentation tooling in place:

  1. Daily standup metric check: Look at active experiments' dashboards

    • Which variants are winning?
    • Which are approaching statistical significance?
    • Any unexpected side effects in other metrics?
  2. Weekly decision meeting: For experiments hitting significance

    • Winner? Launch broadly, prepare migration plan
    • Loser? Document learnings, form next hypothesis
    • Inconclusive? Extend runway or pivot
  3. Monthly learning review: Which hypotheses were accurate? Which failed? What patterns emerge?

This replaces lengthy strategic planning meetings with data-driven decision cycles.


Real Example: KYC Conversion (Binance)

Context: Conversion dropped from near-100% (unregulated) to 2% (full regulatory KYC required). Operating in 200 countries with different document requirements.

Wrong approach: "We need a strategy for global KYC." (Takes 6 months, produces 50-page document, implementation misses real problems)

Hypothesis-to-data approach:

  1. Build spreadsheet of Top 50 countries × Top 10 document types = 500 cells
  2. Measure conversion for each combination
  3. Identify lowest-converting cells: "Driver's license in Kazakhstan at 12%, Passport in Kenya at 18%"
  4. Form hypotheses: "If we switch to Vendor B's SDK for Kazakhstan licenses, conversion improves to 25%"
  5. Test with new cohort of Kazakhstan users
  6. If it works, roll out; if not, try different vendor or imaging tech
  7. Repeat for each cell

Result: Systematic improvement across all 500 scenarios instead of guessing which countries matter most.


Common Pitfalls

1. Looking at aggregated metrics instead of cohort breakdowns

  • Problem: Overall conversion down 0.8%, so you kill your feature
  • Reality: Your feature works great for new users (up 4%) but old users regressed (down 5%)
  • Fix: Always analyze by user cohort, geography, device, acquisition channel

2. Insufficient sample size or runtime

  • Problem: "We ran the experiment for a week, we have a winner"
  • Reality: Need 2-4 weeks for statistical significance in most cases
  • Fix: Use your experimentation tool's confidence calculator; don't ship until you hit your significance threshold

3. Testing too many variables at once

  • Problem: You change the copy, the color, the button size, and it moved—but which change caused it?
  • Reality: Can't identify what to keep, can't learn reliably
  • Fix: One variable per experiment (occasionally two if they're tightly linked)

4. Shipping before broad rollout metrics stabilize

  • Problem: Your experiment shows 5% lift, you roll out to all users, impact disappears
  • Reality: Selection bias in experiment (early adopters behave differently) or you're hitting infrastructure limits
  • Fix: Do staged rollout (10% → 25% → 50% → 100%), monitor metrics at each stage

5. Confusing "test all ideas" with "experiment discipline"

  • Problem: Running 50 small experiments but none inform each other or build toward a strategy
  • Reality: Experimentation without hypothesis clarity becomes random feature churn
  • Fix: Group experiments by theme ("Improve onboarding retention," "Reduce payment friction"), each informs the next

How to Establish This in Your Organization

Phase 1: Tool + Infrastructure (Weeks 1-4)

  • Select and implement experimentation platform (Statsig, Optimizely, VWO, or open-source equivalent)
  • Train engineers on proper experiment setup (variant/control, randomization, metrics)
  • Set confidence threshold (typically 95%) and minimum sample size requirements

Phase 2: Hypothesis Discipline (Weeks 5-8)

  • Require all major feature work to start with written hypothesis (template: "If we [change], we'll see [metric] move from X to Y")
  • Have PM and eng lead review hypotheses before implementation
  • Document every experiment result (winner, loser, magnitude of impact)

Phase 3: Culture Shift (Weeks 9-12+)

  • Weekly decision meetings based on experiment dashboards (not opinions)
  • Celebrate failed hypotheses ("We thought this would work, we tested it, we learned it doesn't—great intel")
  • Build team's experiment intuition through repeated cycles

Success Metric for Your Team

You know this is working when:

  • Stakeholder disagreements resolve in "let's test it" instead of escalations
  • Junior PMs cite experiment data, not opinions, in product discussions
  • Your shipped features have 2-3 weeks of validation data behind them, not guesses

The Compounding Effect

One quarter of disciplined experimentation (12-13 tests) builds a product advantage that takes competitors months to replicate. After one year, you've learned more than teams that spend a year planning and three months executing.

The math:

  • Traditional approach: 4 months strategy → 2 months planning → 6 months building = 12 months to learn if it worked
  • Hypothesis-driven approach: Weeks 1-4 (test), 5-8 (test), 9-12 (test), 13-16 (test)... = 13 learning cycles per year

The team that compounds learning fastest becomes the team that owns their category.