Claude-skill-registry bdd-hypothesis-validation

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/bdd-hypothesis-validation" ~/.claude/skills/majiayu000-claude-skill-registry-bdd-hypothesis-validation && rm -rf "$T"

manifest: skills/data/bdd-hypothesis-validation/SKILL.md

BDD Hypothesis Validation Patterns

Philosophy: Hypotheses Are Executable Experiments

In TMNL, hypotheses are BDD specifications for architectural experiments. They are NOT:

Wishful thinking ("I hope this works")
Implementation plans ("We'll use FlexSearch")
Vague goals ("Make search faster")

Hypotheses ARE:

Falsifiable claims - "Search results flow to AG-Grid rowData with length > 0"
Measurable outcomes - "Progressive streaming produces > 1 update, not a single batch"
Evidence-backed - "Switched to linear driver, got 50 results in 12.3ms"
Living documentation - Updated in real-time as code executes

Canonical principle: A hypothesis without concrete acceptance criteria is not a hypothesis. It's a hope.

Canonical Sources

Primary References

TMNL Testbed Patterns:

src/components/testbed/DataManagerTestbed.tsx

- H1-H5 hypothesis validation

src/components/testbed/shared/hypothesis.tsx

- Validation UI components

src/components/testbed/EffectAtomTestbed.tsx

- Damage report pattern

```
.edin/EFFECT_PATTERNS.md
```
- Effect-atom state patterns

EDIN Methodology:

```
CLAUDE.md
```
— EDIN phases (Experiment, Design, Implement, Negotiate)
```
.agents/index.md
```
— Session journal with hypothesis tracking

Effect Testing:

src/lib/streams/__tests__/ChannelService.test.ts

- Effect.gen assertions

src/lib/ams/v2/__tests__/integration.test.ts

- Multi-hypothesis integration

Pattern 1: Hypothesis Declaration

Hypothesis Structure

A hypothesis in TMNL has:

ID - Unique identifier (H1, H2, H3...) for tracking
Label - Short, memorable name (e.g., "Search → Grid Flow")
Description - Concrete, falsifiable claim
Status -
```
pending
```
,
```
testing
```
,
```
passed
```
,
```
failed
```
Evidence - Concrete metrics that prove/disprove the claim

Example: DataManager Hypotheses (EPOCH-0002)

// src/components/testbed/DataManagerTestbed.tsx

interface Hypothesis {
  id: string
  label: string
  description: string
  status: 'pending' | 'testing' | 'passed' | 'failed'
  evidence?: string
}

const HYPOTHESES: Hypothesis[] = [
  {
    id: 'H1',
    label: 'Search → Grid Flow',
    description: 'Search results flow correctly to AG-Grid rowData',
    status: 'pending',
  },
  {
    id: 'H2',
    label: 'Progressive Streaming',
    description: 'Stream emits results progressively (not all at once)',
    status: 'pending',
  },
  {
    id: 'H3',
    label: 'Stream-First DX',
    description: 'Effect Stream + Fiber cancellation works cleanly',
    status: 'pending',
  },
  {
    id: 'H4',
    label: 'Real-time Metrics',
    description: 'Throughput (items/sec) calculated in real-time',
    status: 'pending',
  },
  {
    id: 'H5',
    label: 'Driver Switching',
    description: 'Switching between flex/linear drivers is seamless',
    status: 'pending',
  },
]

Pattern 2: Hypothesis Validation (Acceptance Criteria)

H1: Search → Grid Flow

Hypothesis:

Search results flow correctly to AG-Grid rowData.

Acceptance Criteria:

Search returns
```
SearchResult<MovieItem>[]
```
with length > 0
```
gridData
```
receives transformed results with length > 0
Grid renders rows matching
```
gridData.length
```

Implementation:

// DataManagerTestbed.tsx

// 1. Search execution
const handleSearch = useCallback(() => {
  const searchStream = currentDriver
    .search(query.trim(), { limit: 100, chunkSize: 5 })
    .pipe(withMinScore<MovieItem, unknown>(0.1))

  const program = searchStream.pipe(
    Stream.runForEach((result) =>
      Effect.sync(() => {
        itemCount++
        setResults((prev) => [...prev, result]) // Accumulate results
      })
    )
  )

  fiberRef.current = Effect.runFork(program)
}, [query, currentDriver])

// 2. Transform results → gridData
const gridData: DataGridRow[] = results.slice(0, 100).map((r) => ({
  id: r.item.id,
  name: r.item.title,
  value: Math.round(r.score * 100),
  status: r.score > 0.8 ? 'active' : 'pending',
}))

// 3. Validate hypothesis when data flows
useEffect(() => {
  // H1 passes when: we have results AND they flowed to gridData
  if (
    results.length > 0 &&
    gridData.length > 0 &&
    hypotheses.find((h) => h.id === 'H1')?.status === 'testing'
  ) {
    updateHypothesis('H1', {
      status: 'passed',
      evidence: `${results.length} results → ${gridData.length} grid rows. Search → Grid flow verified.`,
    })
  }
}, [results, gridData, hypotheses, updateHypothesis])

CRITICAL: Verify Outcomes, Not Function Calls

// ❌ BAD - Tracks function call, not outcome
useEffect(() => {
  if (gridData) {  // gridData exists (even if empty [])
    updateHypothesis('H1', 'passed')  // FALSE POSITIVE
  }
}, [gridData])

// ✅ GOOD - Verifies actual outcome
useEffect(() => {
  if (gridData.length > 0) {  // Actually has results
    updateHypothesis('H1', 'passed', `${gridData.length} rows in grid`)
  }
}, [gridData])

Pattern 3: Progressive Validation (H2)

H2: Progressive Streaming

Hypothesis:

Stream emits results progressively (not all at once).

Acceptance Criteria:

Search stream chunks results (chunkSize=5)
```
progressiveUpdateCount
```
increments multiple times (> 1)
Each update appends to results, not replaces

Implementation:

const handleSearch = useCallback(() => {
  let chunkCount = 0
  let itemCount = 0
  let progressiveUpdateCount = 0

  const searchStream = currentDriver
    .search(query.trim(), { limit: 100, chunkSize: 5 })
    .pipe(withMinScore<MovieItem, unknown>(0.1))

  const program = searchStream.pipe(
    Stream.tap(() =>
      Effect.sync(() => {
        chunkCount++
      })
    ),
    Stream.runForEach((result) =>
      Effect.sync(() => {
        itemCount++
        setResults((prev) => [...prev, result]) // ← Append, not replace

        // Track progressive updates
        progressiveUpdateCount++
      })
    ),
    Effect.ensuring(
      Effect.sync(() => {
        // H2: Progressive streaming verified if > 1 update
        if (itemCount > 0 && progressiveUpdateCount > 1) {
          updateHypothesis('H2', {
            status: 'passed',
            evidence: `${itemCount} results in ${progressiveUpdateCount} progressive updates`,
          })
        } else if (itemCount > 0) {
          updateHypothesis('H2', {
            status: 'failed',
            evidence: `${itemCount} results in single batch (not progressive)`,
          })
        }
      })
    )
  )

  fiberRef.current = Effect.runFork(program)
}, [query, currentDriver])

Pattern: Use

Stream.tap()

to track chunks,

Effect.ensuring()

to validate after stream completes.

Pattern 4: Real-Time Metrics (H4)

H4: Real-time Metrics

Hypothesis:

Throughput (items/sec) is calculated in real-time.

Acceptance Criteria:

```
stats.ms
```
reflects elapsed time
```
throughput = (items / ms) * 1000
```
is > 0 for successful searches
Metrics update during stream execution (not just at end)

Implementation:

const handleSearch = useCallback(() => {
  const startTime = performance.now()
  let itemCount = 0

  const program = searchStream.pipe(
    Stream.runForEach((result) =>
      Effect.sync(() => {
        itemCount++
        setResults((prev) => [...prev, result])

        // Update metrics DURING stream (real-time)
        const elapsed = performance.now() - startTime
        setStats({
          chunks: chunkCount,
          items: itemCount,
          ms: Math.round(elapsed),
        })
      })
    ),
    Effect.ensuring(
      Effect.sync(() => {
        const finalMs = performance.now() - startTime

        // H4: Throughput verification
        const currentThroughput = itemCount > 0 && finalMs > 0
          ? (itemCount / finalMs) * 1000
          : 0

        if (currentThroughput > 0) {
          updateHypothesis('H4', {
            status: 'passed',
            evidence: `${currentThroughput.toFixed(0)} items/sec (${itemCount} items in ${finalMs.toFixed(1)}ms)`,
          })
        } else {
          updateHypothesis('H4', {
            status: 'failed',
            evidence: `No throughput measurable`,
          })
        }
      })
    )
  )

  fiberRef.current = Effect.runFork(program)
}, [query, currentDriver])

Pattern: Calculate metrics DURING stream (not just

Effect.ensuring

), validate concrete values (not just presence).

Pattern 5: Driver Switching (H5)

H5: Driver Switching

Hypothesis:

Switching between flex/linear drivers is seamless.

Acceptance Criteria:

```
setActiveDriver()
```
toggles between drivers
Search with new driver returns results (length > 0)
No errors during switch or subsequent search

Implementation:

const handleDriverSwitch = useCallback(() => {
  const newDriverType = activeDriver === 'flex' ? 'linear' : 'flex'
  const newDriverInstance = newDriverType === 'flex' ? flexDriver : linearDriver

  console.log('[Testbed] Switching to:', newDriverType)
  updateHypothesis('H5', { status: 'testing', evidence: `Switching to ${newDriverType}...` })

  if (!newDriverInstance) {
    updateHypothesis('H5', { status: 'failed', evidence: `${newDriverType} driver not available` })
    return
  }

  setActiveDriver(newDriverType)

  // Re-run search with new driver to verify
  if (query.trim()) {
    setResults([])
    setStatus('streaming')

    let itemCount = 0
    const startTime = performance.now()

    const searchStream = newDriverInstance
      .search(query.trim(), { limit: 50, chunkSize: 5 })
      .pipe(withMinScore<MovieItem, unknown>(0.1))

    const program = searchStream.pipe(
      Stream.runForEach((result) =>
        Effect.sync(() => {
          itemCount++
          setResults((prev) => [...prev, result])
        })
      ),
      Effect.ensuring(
        Effect.sync(() => {
          const ms = performance.now() - startTime
          setStatus('complete')

          // H5 passes if new driver returns results
          if (itemCount > 0) {
            updateHypothesis('H5', {
              status: 'passed',
              evidence: `Switched to ${newDriverType}, got ${itemCount} results in ${ms.toFixed(1)}ms`,
            })
          } else {
            updateHypothesis('H5', {
              status: 'failed',
              evidence: `Switched to ${newDriverType}, but search returned 0 results`,
            })
          }
        })
      )
    )

    Effect.runFork(program)
  } else {
    updateHypothesis('H5', {
      status: 'passed',
      evidence: `Switched to ${newDriverType}. Enter a query to verify search.`,
    })
  }
}, [activeDriver, flexDriver, linearDriver, query, updateHypothesis])

Pattern: Verify switch by executing actual operation (search), not just checking state.

Pattern 6: Damage Reports (Antipattern Discovery)

Purpose of Damage Reports

When hypotheses FAIL, document the antipattern discovered and the fix applied. This becomes living documentation.

Example: DataManager Antipatterns (EPOCH-0002)

// DataManagerTestbed.tsx

interface AntipatternEntry {
  id: string
  title: string
  severity: 'critical' | 'warning' | 'info'
  status: 'fixed' | 'active' | 'mitigated'
  problem: string
  codeExample?: { bad: string; good: string }
  fix: string
}

const DAMAGE_REPORT: AntipatternEntry[] = [
  {
    id: 'DM-001',
    title: 'Atom.runtime(Layer) + Stateful Services',
    severity: 'critical',
    status: 'fixed',
    problem: 'Atom.runtime(Layer) with services using Effect.Ref creates fresh state per operation. doIndex() populates Kernel#1, doSearch() creates Kernel#2 with empty state.',
    codeExample: {
      bad: `// ANTIPATTERN: Layer-per-operation isolation
const runtimeAtom = Atom.runtime(SearchKernel.Default)
const searchOps = {
  search: runtimeAtom.fn<Query>()((query, ctx) =>
    Effect.gen(function*() {
      const kernel = yield* SearchKernel // ← Fresh instance!
      return yield* kernel.search(query) // ← Empty driver!
    })
  )
}`,
      good: `// FIXED: Direct driver pattern with useState
const [driver, setDriver] = useState<SearchServiceImpl | null>(null)

useEffect(() => {
  const init = async () => {
    const flex = await Effect.runPromise(createFlexSearchDriver())
    await Effect.runPromise(flex.index(items, config))
    setDriver(flex) // ← Persists across operations
  }
  init()
}, [])`
    },
    fix: 'Bypass atom layer. Store driver instances in React useState. Use Effect.runPromise for initialization, Effect.runFork for streaming operations.',
  },
  {
    id: 'DM-002',
    title: 'Hypothesis Tracking: Function Call vs Outcome',
    severity: 'warning',
    status: 'fixed',
    problem: 'All 5 hypotheses marked "passed" despite grid showing 0 rows. Hypotheses tracked "function was called" (e.g., setGridData invoked) not "outcome achieved" (e.g., gridData.length > 0).',
    codeExample: {
      bad: `// ANTIPATTERN: Track function call, not outcome
useEffect(() => {
  if (gridData) {  // ← gridData exists (even if empty [])
    updateHypothesis('H1', 'passed')  // ← FALSE POSITIVE
  }
}, [gridData])`,
      good: `// FIXED: Verify actual outcome
useEffect(() => {
  if (gridData.length > 0) {  // ← Actually has results
    updateHypothesis('H1', 'passed', \`\${gridData.length} rows in grid\`)
  }
}, [gridData, updateHypothesis])`
    },
    fix: 'Track actual outcomes: gridData.length > 0, progressiveUpdateCount > 1, throughput > 0 && stats.items > 0, etc.',
  },
]

Damage Report UI Component

// src/components/testbed/shared/hypothesis.tsx

export interface DamageReportFinding {
  id: string
  severity: 'critical' | 'warning' | 'info' | 'resolved'
  title: string
  description?: string
  hypothesis?: string
}

export function DamageReport({ findings }: { findings: DamageReportFinding[] }) {
  const activeFindingsCount = findings.filter((f) => f.severity !== 'resolved').length

  return (
    <div className="border border-neutral-800 rounded-lg">
      <div className="px-4 py-3 bg-neutral-900/50 flex items-center justify-between">
        <span className="font-mono text-neutral-400 uppercase">DAMAGE REPORT</span>
        <span className={activeFindingsCount > 0 ? 'text-amber-400' : 'text-green-400'}>
          {activeFindingsCount > 0 ? `${activeFindingsCount} ACTIVE` : 'ALL CLEAR'}
        </span>
      </div>

      <div className="divide-y divide-neutral-800">
        {findings.map((finding) => (
          <div key={finding.id} className="p-3">
            <div className="flex items-center gap-2">
              <span className="font-mono">{finding.title}</span>
              {finding.hypothesis && (
                <span className="font-mono text-neutral-500">[{finding.hypothesis}]</span>
              )}
            </div>
            {finding.description && (
              <p className="text-neutral-400 mt-1">{finding.description}</p>
            )}
          </div>
        ))}
      </div>
    </div>
  )
}

Pattern 7: Hypothesis UI Components

HypothesisBadge

// src/components/testbed/shared/hypothesis.tsx

export type ValidationStatus = 'validated' | 'failed' | 'pending' | 'partial' | 'unknown'

export function HypothesisBadge({ id, status }: { id: string; status: ValidationStatus }) {
  const statusConfig = {
    validated: { bg: 'bg-green-900/30', text: 'text-green-400', icon: CheckCircle },
    failed: { bg: 'bg-red-900/30', text: 'text-red-400', icon: XCircle },
    pending: { bg: 'bg-amber-900/30', text: 'text-amber-400', icon: Clock },
    partial: { bg: 'bg-cyan-900/30', text: 'text-cyan-400', icon: AlertTriangle },
    unknown: { bg: 'bg-neutral-800/50', text: 'text-neutral-400', icon: Info },
  }

  const config = statusConfig[status]
  const Icon = config.icon

  return (
    <span className={`inline-flex items-center gap-1 px-2 py-0.5 ${config.bg} ${config.text}`}>
      <Icon size={10} />
      {id}
    </span>
  )
}

HypothesisSection

export function HypothesisSection({
  id,
  title,
  description,
  status,
  children,
}: {
  id: string
  title: string
  description: string
  status: ValidationStatus
  children: ReactNode
}) {
  const [isExpanded, setIsExpanded] = useState(true)

  return (
    <section className="border border-neutral-800 rounded-lg">
      <button
        onClick={() => setIsExpanded(!isExpanded)}
        className="w-full p-4 bg-neutral-900/50 flex items-center justify-between"
      >
        <div className="flex items-center gap-3">
          <HypothesisBadge id={id} status={status} compact />
          <div className="text-left">
            <h3 className="font-mono text-neutral-200">{title}</h3>
            <p className="text-neutral-500">{description}</p>
          </div>
        </div>
        <span className="text-neutral-500">{isExpanded ? '−' : '+'}</span>
      </button>
      {isExpanded && <div className="p-4 border-t border-neutral-800">{children}</div>}
    </section>
  )
}

HypothesisCard (for Grid Display)

function HypothesisCard({ hypothesis }: { hypothesis: Hypothesis }) {
  const statusColors = {
    pending: 'border-neutral-700 text-neutral-600',
    testing: 'border-amber-500/50 text-amber-500 animate-pulse',
    passed: 'border-emerald-500/50 text-emerald-500',
    failed: 'border-red-500/50 text-red-500',
  }

  const statusIcons = {
    pending: '○',
    testing: '◐',
    passed: '●',
    failed: '✕',
  }

  return (
    <div className={`border p-3 ${statusColors[hypothesis.status]}`}>
      <div className="flex items-start gap-2">
        <span className="font-mono">{statusIcons[hypothesis.status]}</span>
        <div className="flex-1">
          <div className="font-mono uppercase">{hypothesis.id}: {hypothesis.label}</div>
          <div className="font-mono text-neutral-500">{hypothesis.description}</div>
          {hypothesis.evidence && (
            <div className="font-mono text-neutral-600 mt-2 border-t pt-2">
              Evidence: {hypothesis.evidence}
            </div>
          )}
        </div>
      </div>
    </div>
  )
}

Pattern 8: EDIN Integration (Experiment Phase)

EDIN Phases and Hypothesis Flow

EDIN Cycle:

Experiment - Define hypotheses, validate assumptions
Design - Convert proven hypotheses into Operations
Implement - Execute Operations with bounded Tasks
Negotiate - Debrief, update Proposals, reallocate

Hypothesis in Experiment Phase:

// .edin/experiments/EPOCH-0002-data-manager.md

## Experiment: DataManager Effect-Atom Integration

### Hypotheses

**H1: Search → Grid Flow**
- Claim: Search results flow correctly to AG-Grid rowData
- Acceptance: gridData.length > 0 after search completes
- Status: PASSED
- Evidence: 50 results → 50 grid rows in 12.3ms

**H2: Progressive Streaming**
- Claim: Stream emits results progressively (not all at once)
- Acceptance: progressiveUpdateCount > 1
- Status: PASSED
- Evidence: 50 results in 10 progressive updates

**H3: Stream-First DX**
- Claim: Effect Stream + Fiber cancellation works cleanly
- Acceptance: No errors, clean cancellation on unmount
- Status: PASSED
- Evidence: 5 searches with cancellations, 0 errors

**H4: Real-time Metrics**
- Claim: Throughput (items/sec) calculated in real-time
- Acceptance: stats.ms and throughput update during stream
- Status: PASSED
- Evidence: 4065 items/sec (50 items in 12.3ms)

**H5: Driver Switching**
- Claim: Switching between flex/linear drivers is seamless
- Acceptance: Search after switch returns results
- Status: PASSED
- Evidence: Switched to linear, got 50 results in 8.9ms

### Damage Report

**DM-001: Atom.runtime + Stateful Services (CRITICAL, FIXED)**
- Problem: Effect.Ref creates fresh state per operation
- Fix: Direct driver pattern with useState

**DM-002: Hypothesis Tracking (WARNING, FIXED)**
- Problem: Tracked function calls, not outcomes
- Fix: Verify gridData.length > 0, not just gridData existence

### Negotiation

All hypotheses PASSED. Transition to Design phase:
- Define Operations for DataManager integration
- Extract reusable patterns to .edin/EFFECT_PATTERNS.md
- Update Proposals queue with DataManager v2 features

Pattern 9: Multi-Hypothesis Testbeds

Organizing Multiple Hypotheses

// Testbed with 5 hypotheses
export function DataManagerTestbed() {
  const [hypotheses, setHypotheses] = useState(HYPOTHESES)

  const updateHypothesis = useCallback((id: string, updates: Partial<Hypothesis>) => {
    setHypotheses((prev) =>
      prev.map((h) => (h.id === id ? { ...h, ...updates } : h))
    )
  }, [])

  return (
    <div>
      {/* Hypothesis Grid */}
      <section>
        <SectionLabel>Experiment Hypotheses</SectionLabel>
        <div className="grid grid-cols-5 gap-3">
          {hypotheses.map((h) => (
            <HypothesisCard key={h.id} hypothesis={h} />
          ))}
        </div>
      </section>

      {/* Search Interface (triggers H1, H2, H4) */}
      <section>
        <SectionLabel>Search Interface</SectionLabel>
        <input value={query} onChange={(e) => setQuery(e.target.value)} />
        <button onClick={handleSearch}>Search</button>
      </section>

      {/* Driver Toggle (triggers H5) */}
      <section>
        <button onClick={handleDriverSwitch}>
          Switch to {activeDriver === 'flex' ? 'linear' : 'flex'}
        </button>
      </section>

      {/* Results Grid (validates H1) */}
      <section>
        <DataGrid rowData={gridData} />
      </section>

      {/* Damage Report */}
      <section>
        <SectionLabel>Damage Report</SectionLabel>
        <DamageReportPanel findings={DAMAGE_REPORT} />
      </section>
    </div>
  )
}

Pattern 10: Hypothesis-Driven Test Suites

Mapping Hypotheses to it.effect() Tests

// DataManager.test.ts (if we extract testbed to unit tests)

describe("DataManager Hypotheses (EPOCH-0002)", () => {
  describe("H1: Search → Grid Flow", () => {
    it.effect("search results flow to grid data transformation", () =>
      Effect.gen(function* () {
        const driver = yield* createFlexSearchDriver<MovieItem>()
        const movies = processMovies(100)
        yield* driver.index(movies, { fields: ['title'] })

        const results = yield* driver.search("matrix", { limit: 10 })
          .pipe(Stream.runCollect)

        const gridData = Chunk.toArray(results).map((r) => ({
          id: r.item.id,
          name: r.item.title,
          value: Math.round(r.score * 100),
        }))

        // H1 acceptance criteria
        expect(Chunk.size(results)).toBeGreaterThan(0)
        expect(gridData.length).toBeGreaterThan(0)
        expect(gridData.length).toBe(Chunk.size(results))
      })
    )
  })

  describe("H2: Progressive Streaming", () => {
    it.effect("stream emits multiple chunks", () =>
      Effect.gen(function* () {
        const driver = yield* createFlexSearchDriver<MovieItem>()
        const movies = processMovies(100)
        yield* driver.index(movies, { fields: ['title'] })

        let chunkCount = 0
        let itemCount = 0

        yield* driver.search("matrix", { limit: 20, chunkSize: 5 })
          .pipe(
            Stream.tap(() => Effect.sync(() => chunkCount++)),
            Stream.runForEach(() => Effect.sync(() => itemCount++))
          )

        // H2 acceptance criteria
        expect(chunkCount).toBeGreaterThan(1)
        expect(itemCount).toBeGreaterThan(0)
      })
    )
  })
})

Anti-Patterns (Avoid These)

❌ Vague Hypotheses

// BAD - Not falsifiable
{
  id: 'H1',
  label: 'Search works',
  description: 'Search should work correctly',
  status: 'pending',
}

// GOOD - Concrete, measurable
{
  id: 'H1',
  label: 'Search → Grid Flow',
  description: 'Search results flow correctly to AG-Grid rowData with length > 0',
  status: 'pending',
}

❌ Tracking Function Calls Instead of Outcomes

// BAD - False positive
useEffect(() => {
  if (gridData) {  // Empty [] is truthy!
    updateHypothesis('H1', 'passed')
  }
}, [gridData])

// GOOD - Verify actual outcome
useEffect(() => {
  if (gridData.length > 0) {
    updateHypothesis('H1', 'passed', `${gridData.length} rows`)
  }
}, [gridData])

❌ Missing Evidence

// BAD - No concrete metrics
updateHypothesis('H4', { status: 'passed' })

// GOOD - Include evidence
updateHypothesis('H4', {
  status: 'passed',
  evidence: `4065 items/sec (50 items in 12.3ms)`,
})

❌ Hypothesis Without Acceptance Criteria

// BAD - How do we know when it passes?
{
  id: 'H3',
  label: 'Effect Integration',
  description: 'Effect works with atoms',
  status: 'pending',
}

// GOOD - Clear acceptance criteria
{
  id: 'H3',
  label: 'Stream-First DX',
  description: 'Effect Stream + Fiber cancellation works cleanly (0 errors, clean unmount)',
  status: 'pending',
}

Summary: Hypothesis Validation Discipline

Core Principles

Hypotheses are falsifiable claims - Not wishes, concrete outcomes
Status transitions require evidence - "passed" needs metrics, not just code execution
Damage reports document antipatterns - Failed hypotheses become learning
UI components visualize validation - HypothesisBadge, HypothesisCard, DamageReport
EDIN integration - Hypotheses in Experiment phase, Operations in Design

TMNL-Specific Patterns

H1-H5 naming - Sequential IDs for tracking across testbeds
Evidence strings - Concrete metrics in updateHypothesis()
Damage report severity - critical/warning/info for antipattern classification
Testbed UI - Grid layout for hypotheses, collapsible damage reports

Canonical Testbeds

src/components/testbed/DataManagerTestbed.tsx

- 5-hypothesis validation

src/components/testbed/EffectAtomTestbed.tsx

- Damage report pattern

src/components/testbed/shared/hypothesis.tsx

- UI components

When in doubt: If you cannot write concrete acceptance criteria with measurable outcomes, the hypothesis is not ready. Refine it until you can specify exactly what "passed" means.