Claude-skill-registry bdd-hypothesis-validation
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/bdd-hypothesis-validation" ~/.claude/skills/majiayu000-claude-skill-registry-bdd-hypothesis-validation && rm -rf "$T"
skills/data/bdd-hypothesis-validation/SKILL.md
BDD Hypothesis Validation Patterns
Philosophy: Hypotheses Are Executable Experiments
In TMNL, hypotheses are BDD specifications for architectural experiments. They are NOT:
- Wishful thinking ("I hope this works")
- Implementation plans ("We'll use FlexSearch")
- Vague goals ("Make search faster")
Hypotheses ARE:
- Falsifiable claims - "Search results flow to AG-Grid rowData with length > 0"
- Measurable outcomes - "Progressive streaming produces > 1 update, not a single batch"
- Evidence-backed - "Switched to linear driver, got 50 results in 12.3ms"
- Living documentation - Updated in real-time as code executes
Canonical principle: A hypothesis without concrete acceptance criteria is not a hypothesis. It's a hope.
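To make the principle concrete, the minimal TypeScript sketch below pairs a claim with an executable acceptance predicate. One check then decides `passed` or `failed` and also produces the evidence string. The `ExecutableHypothesis` shape and the `evaluate` helper are hypothetical illustrations, not part of the TMNL testbed.

```typescript
type Status = 'pending' | 'testing' | 'passed' | 'failed'

// Hypothetical shape: a claim paired with an executable acceptance predicate
interface ExecutableHypothesis<Obs> {
  id: string
  claim: string
  // Falsifiable: true only when the observed outcome satisfies the claim
  accept: (obs: Obs) => boolean
  // Turns the observation into a concrete evidence string
  describe: (obs: Obs) => string
}

interface Verdict {
  status: Status
  evidence: string
}

// Evaluating a hypothesis always yields evidence, whether it passes or fails
function evaluate<Obs>(h: ExecutableHypothesis<Obs>, obs: Obs): Verdict {
  return {
    status: h.accept(obs) ? 'passed' : 'failed',
    evidence: h.describe(obs),
  }
}

// The H1 claim expressed as a predicate over observed counts
const h1: ExecutableHypothesis<{ results: number; gridRows: number }> = {
  id: 'H1',
  claim: 'Search results flow to AG-Grid rowData with length > 0',
  accept: (o) => o.results > 0 && o.gridRows > 0,
  describe: (o) => `${o.results} results → ${o.gridRows} grid rows`,
}

const verdictEmpty = evaluate(h1, { results: 0, gridRows: 0 }) // status: 'failed'
const verdictFlow = evaluate(h1, { results: 50, gridRows: 50 }) // status: 'passed'
```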
Canonical Sources
Primary References
TMNL Testbed Patterns:
- H1-H5 hypothesis validation: `src/components/testbed/DataManagerTestbed.tsx`
- Validation UI components: `src/components/testbed/shared/hypothesis.tsx`
- Damage report pattern: `src/components/testbed/EffectAtomTestbed.tsx`
- Effect-atom state patterns: `.edin/EFFECT_PATTERNS.md`
EDIN Methodology:
- EDIN phases (Experiment, Design, Implement, Negotiate): `CLAUDE.md`
- Session journal with hypothesis tracking: `.agents/index.md`
Effect Testing:
- Effect.gen assertions: `src/lib/streams/__tests__/ChannelService.test.ts`
- Multi-hypothesis integration: `src/lib/ams/v2/__tests__/integration.test.ts`
Pattern 1: Hypothesis Declaration
Hypothesis Structure
A hypothesis in TMNL has:
- ID - Unique identifier (H1, H2, H3...) for tracking
- Label - Short, memorable name (e.g., "Search → Grid Flow")
- Description - Concrete, falsifiable claim
- Status - `pending`, `testing`, `passed`, or `failed`
- Evidence - Concrete metrics that prove/disprove the claim
Example: DataManager Hypotheses (EPOCH-0002)
```typescript
// src/components/testbed/DataManagerTestbed.tsx
interface Hypothesis {
  id: string
  label: string
  description: string
  status: 'pending' | 'testing' | 'passed' | 'failed'
  evidence?: string
}

const HYPOTHESES: Hypothesis[] = [
  {
    id: 'H1',
    label: 'Search → Grid Flow',
    description: 'Search results flow correctly to AG-Grid rowData',
    status: 'pending',
  },
  {
    id: 'H2',
    label: 'Progressive Streaming',
    description: 'Stream emits results progressively (not all at once)',
    status: 'pending',
  },
  {
    id: 'H3',
    label: 'Stream-First DX',
    description: 'Effect Stream + Fiber cancellation works cleanly',
    status: 'pending',
  },
  {
    id: 'H4',
    label: 'Real-time Metrics',
    description: 'Throughput (items/sec) calculated in real-time',
    status: 'pending',
  },
  {
    id: 'H5',
    label: 'Driver Switching',
    description: 'Switching between flex/linear drivers is seamless',
    status: 'pending',
  },
]
```
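The status field implies a small lifecycle: `pending` → `testing` → `passed` or `failed`. The sketch below enforces that lifecycle as a pure function. It rests on two assumptions that are not taken from the testbed. First, a finished hypothesis may be re-tested. Second, terminal statuses must carry evidence. The names `transition` and `ALLOWED` are hypothetical.

```typescript
type HypothesisStatus = 'pending' | 'testing' | 'passed' | 'failed'

interface Hypothesis {
  id: string
  label: string
  description: string
  status: HypothesisStatus
  evidence?: string
}

// Hypothetical transition table: finished hypotheses may go back to 'testing'
const ALLOWED: Record<HypothesisStatus, HypothesisStatus[]> = {
  pending: ['testing'],
  testing: ['passed', 'failed'],
  passed: ['testing'],
  failed: ['testing'],
}

// Returns the updated hypothesis, or throws on an illegal transition
// or a terminal status without evidence
function transition(h: Hypothesis, next: HypothesisStatus, evidence?: string): Hypothesis {
  if (!ALLOWED[h.status].includes(next)) {
    throw new Error(`${h.id}: illegal transition ${h.status} → ${next}`)
  }
  const terminal = next === 'passed' || next === 'failed'
  if (terminal && (evidence === undefined || evidence.trim() === '')) {
    throw new Error(`${h.id}: '${next}' requires evidence`)
  }
  return { ...h, status: next, evidence }
}

const h1: Hypothesis = {
  id: 'H1',
  label: 'Search → Grid Flow',
  description: 'Search results flow correctly to AG-Grid rowData',
  status: 'pending',
}
const testing = transition(h1, 'testing')
const passed = transition(testing, 'passed', '50 results → 50 grid rows')
```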
Pattern 2: Hypothesis Validation (Acceptance Criteria)
H1: Search → Grid Flow
Hypothesis:
Search results flow correctly to AG-Grid rowData.
Acceptance Criteria:
- Search returns `SearchResult<MovieItem>[]` with length > 0
- `gridData` receives transformed results with length > 0
- Grid renders rows matching `gridData.length`
Implementation:
```typescript
// DataManagerTestbed.tsx

// 1. Search execution
const handleSearch = useCallback(() => {
  let itemCount = 0

  const searchStream = currentDriver
    .search(query.trim(), { limit: 100, chunkSize: 5 })
    .pipe(withMinScore<MovieItem, unknown>(0.1))

  const program = searchStream.pipe(
    Stream.runForEach((result) =>
      Effect.sync(() => {
        itemCount++
        setResults((prev) => [...prev, result]) // Accumulate results
      })
    )
  )

  fiberRef.current = Effect.runFork(program)
}, [query, currentDriver])

// 2. Transform results → gridData
const gridData: DataGridRow[] = results.slice(0, 100).map((r) => ({
  id: r.item.id,
  name: r.item.title,
  value: Math.round(r.score * 100),
  status: r.score > 0.8 ? 'active' : 'pending',
}))

// 3. Validate hypothesis when data flows
useEffect(() => {
  // H1 passes when: we have results AND they flowed to gridData
  if (
    results.length > 0 &&
    gridData.length > 0 &&
    hypotheses.find((h) => h.id === 'H1')?.status === 'testing'
  ) {
    updateHypothesis('H1', {
      status: 'passed',
      evidence: `${results.length} results → ${gridData.length} grid rows. Search → Grid flow verified.`,
    })
  }
}, [results, gridData, hypotheses, updateHypothesis])
```
CRITICAL: Verify Outcomes, Not Function Calls
```typescript
// ❌ BAD - Tracks function call, not outcome
useEffect(() => {
  if (gridData) { // gridData exists (even if empty [])
    updateHypothesis('H1', { status: 'passed' }) // FALSE POSITIVE
  }
}, [gridData, updateHypothesis])

// ✅ GOOD - Verifies actual outcome
useEffect(() => {
  if (gridData.length > 0) { // Actually has results
    updateHypothesis('H1', { status: 'passed', evidence: `${gridData.length} rows in grid` })
  }
}, [gridData, updateHypothesis])
```
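The difference between the two checks can be shown without React. In this sketch, `callBasedCheck` reproduces the truthiness test and `verifyGridFlow` applies the H1 acceptance criteria. Both helpers and the trimmed `GridRow` shape are illustrative assumptions.

```typescript
// Trimmed row shape, mirroring the gridData mapping above
interface GridRow {
  id: string
  name: string
  value: number
}

interface H1Verdict {
  status: 'passed' | 'failed'
  evidence: string
}

// Call-based check: `[]` is truthy, so this "passes" with zero rows
function callBasedCheck(gridData: GridRow[] | undefined): boolean {
  return Boolean(gridData)
}

// Outcome-based check: results must exist AND must have reached the grid
function verifyGridFlow(resultCount: number, gridData: GridRow[]): H1Verdict {
  const ok = resultCount > 0 && gridData.length > 0
  return {
    status: ok ? 'passed' : 'failed',
    evidence: `${resultCount} results → ${gridData.length} grid rows`,
  }
}

const emptyGrid: GridRow[] = []
const falsePositive = callBasedCheck(emptyGrid) // true, despite an empty grid
const honest = verifyGridFlow(0, emptyGrid) // status: 'failed'
```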
Pattern 3: Progressive Validation (H2)
H2: Progressive Streaming
Hypothesis:
Stream emits results progressively (not all at once).
Acceptance Criteria:
- Search stream chunks results (`chunkSize: 5`)
- `progressiveUpdateCount` increments multiple times (> 1)
- Each update appends to `results` rather than replacing it
Implementation:
```typescript
const handleSearch = useCallback(() => {
  let chunkCount = 0
  let itemCount = 0
  let progressiveUpdateCount = 0

  const searchStream = currentDriver
    .search(query.trim(), { limit: 100, chunkSize: 5 })
    .pipe(withMinScore<MovieItem, unknown>(0.1))

  const program = searchStream.pipe(
    // Stream.tap runs once per element, so this counts elements, not chunks
    Stream.tap(() =>
      Effect.sync(() => {
        chunkCount++
      })
    ),
    Stream.runForEach((result) =>
      Effect.sync(() => {
        itemCount++
        setResults((prev) => [...prev, result]) // ← Append, not replace
        // One progressive update per result delivered to state
        progressiveUpdateCount++
      })
    ),
    // Runs after the stream completes, fails, or is interrupted
    Effect.ensuring(
      Effect.sync(() => {
        // H2: Progressive streaming verified if > 1 update
        if (itemCount > 0 && progressiveUpdateCount > 1) {
          updateHypothesis('H2', {
            status: 'passed',
            evidence: `${itemCount} results in ${progressiveUpdateCount} progressive updates`,
          })
        } else if (itemCount > 0) {
          updateHypothesis('H2', {
            status: 'failed',
            evidence: `${itemCount} results in single batch (not progressive)`,
          })
        }
      })
    )
  )

  fiberRef.current = Effect.runFork(program)
}, [query, currentDriver, updateHypothesis])
```
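Pattern: Use `Stream.tap()` to observe values as they flow, and use `Effect.ensuring()` to validate once the stream ends. The finalizer also runs on failure or interruption. Note that `Stream.tap` fires once per element. As written, both counters therefore advance per result, and `progressiveUpdateCount > 1` holds whenever two or more results arrive. Asserting chunk-level progress requires counting chunks instead (for example with `Stream.chunks`).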
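Below is a framework-free sketch of the H2 check. It assumes the driver delivers results in fixed-size chunks. A plain generator (`chunksOf`) stands in for the Effect stream. The `consume` function counts one update per chunk, which is the granularity the hypothesis is about. Both helpers are hypothetical.

```typescript
// Hypothetical chunked source standing in for driver.search(..., { chunkSize })
function* chunksOf<T>(items: readonly T[], size: number): Generator<T[]> {
  for (let i = 0; i < items.length; i += size) {
    yield items.slice(i, i + size)
  }
}

interface ProgressReport {
  items: number
  updates: number
  progressive: boolean
}

// Append each chunk to the accumulated results (never replace) and count updates
function consume<T>(source: Iterable<T[]>, onUpdate: (acc: T[]) => void): ProgressReport {
  let acc: T[] = []
  let updates = 0
  for (const chunk of source) {
    acc = [...acc, ...chunk]
    updates++
    onUpdate(acc)
  }
  // H2 criterion: results arrived, and in more than one update
  return { items: acc.length, updates, progressive: acc.length > 0 && updates > 1 }
}

const items = Array.from({ length: 12 }, (_, i) => i)
const sizes: number[] = []
const streamed = consume(chunksOf(items, 5), (acc) => sizes.push(acc.length))
// streamed: { items: 12, updates: 3, progressive: true }; sizes: [5, 10, 12]
const batched = consume([items], () => {})
// batched: { items: 12, updates: 1, progressive: false }
```

The same 12 items pass when delivered in three chunks and fail when delivered as one batch. That distinction is exactly what H2 claims.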
Pattern 4: Real-Time Metrics (H4)
H4: Real-time Metrics
Hypothesis:
Throughput (items/sec) is calculated in real-time.
Acceptance Criteria:
- `stats.ms` reflects elapsed time
- `throughput = (items / ms) * 1000` is > 0 for successful searches
- Metrics update during stream execution (not just at the end)
Implementation:
```typescript
const handleSearch = useCallback(() => {
  const startTime = performance.now()
  let itemCount = 0
  // searchStream and chunkCount are set up as in H2 (elided here)

  const program = searchStream.pipe(
    Stream.runForEach((result) =>
      Effect.sync(() => {
        itemCount++
        setResults((prev) => [...prev, result])

        // Update metrics DURING stream (real-time)
        const elapsed = performance.now() - startTime
        setStats({
          chunks: chunkCount,
          items: itemCount,
          ms: Math.round(elapsed),
        })
      })
    ),
    Effect.ensuring(
      Effect.sync(() => {
        const finalMs = performance.now() - startTime

        // H4: Throughput verification
        const currentThroughput =
          itemCount > 0 && finalMs > 0 ? (itemCount / finalMs) * 1000 : 0

        if (currentThroughput > 0) {
          updateHypothesis('H4', {
            status: 'passed',
            evidence: `${currentThroughput.toFixed(0)} items/sec (${itemCount} items in ${finalMs.toFixed(1)}ms)`,
          })
        } else {
          updateHypothesis('H4', {
            status: 'failed',
            evidence: `No throughput measurable`,
          })
        }
      })
    )
  )

  fiberRef.current = Effect.runFork(program)
}, [query, currentDriver, updateHypothesis])
```
Pattern: Calculate metrics DURING the stream (not only in `Effect.ensuring`), and validate concrete values (not just presence).
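The throughput arithmetic is easy to get wrong at the edges, such as zero items or zero elapsed time. This sketch isolates the arithmetic in a meter with an injected clock so it can be tested deterministically. The testbed itself reads `performance.now()` directly, and `createThroughputMeter` is a hypothetical helper.

```typescript
interface Stats {
  items: number
  ms: number
  throughput: number // items/sec
}

// Hypothetical meter; `now` is injected so tests can use a fake clock
function createThroughputMeter(now: () => number) {
  const start = now()
  let items = 0
  const snapshots: Stats[] = []

  // Guard both edges: no items or no elapsed time means no measurable throughput
  const snapshot = (): Stats => {
    const ms = now() - start
    return { items, ms, throughput: items > 0 && ms > 0 ? (items / ms) * 1000 : 0 }
  }

  return {
    // Call once per item DURING the stream so stats update in real time
    record(): Stats {
      items++
      const s = snapshot()
      snapshots.push(s)
      return s
    },
    snapshot,
    snapshots,
  }
}

// Usage with a fake clock that advances 0.25 ms per item
let fakeTime = 0
const meter = createThroughputMeter(() => fakeTime)
for (let i = 0; i < 50; i++) {
  fakeTime += 0.25
  meter.record()
}
const finalStats = meter.snapshot() // { items: 50, ms: 12.5, throughput: 4000 }
```

With a simulated 0.25 ms per item, 50 items take 12.5 ms, so the final snapshot reports 4000 items/sec. Every intermediate snapshot is available while the loop is still running.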
Pattern 5: Driver Switching (H5)
H5: Driver Switching
Hypothesis:
Switching between flex/linear drivers is seamless.
Acceptance Criteria:
- `setActiveDriver()` toggles between drivers
- Search with the new driver returns results (length > 0)
- No errors during the switch or the subsequent search
Implementation:
```typescript
const handleDriverSwitch = useCallback(() => {
  const newDriverType = activeDriver === 'flex' ? 'linear' : 'flex'
  const newDriverInstance = newDriverType === 'flex' ? flexDriver : linearDriver

  console.log('[Testbed] Switching to:', newDriverType)
  updateHypothesis('H5', { status: 'testing', evidence: `Switching to ${newDriverType}...` })

  if (!newDriverInstance) {
    updateHypothesis('H5', { status: 'failed', evidence: `${newDriverType} driver not available` })
    return
  }

  setActiveDriver(newDriverType)

  // Re-run search with new driver to verify
  if (query.trim()) {
    setResults([])
    setStatus('streaming')
    let itemCount = 0
    const startTime = performance.now()

    const searchStream = newDriverInstance
      .search(query.trim(), { limit: 50, chunkSize: 5 })
      .pipe(withMinScore<MovieItem, unknown>(0.1))

    const program = searchStream.pipe(
      Stream.runForEach((result) =>
        Effect.sync(() => {
          itemCount++
          setResults((prev) => [...prev, result])
        })
      ),
      Effect.ensuring(
        Effect.sync(() => {
          const ms = performance.now() - startTime
          setStatus('complete')

          // H5 passes if new driver returns results
          if (itemCount > 0) {
            updateHypothesis('H5', {
              status: 'passed',
              evidence: `Switched to ${newDriverType}, got ${itemCount} results in ${ms.toFixed(1)}ms`,
            })
          } else {
            updateHypothesis('H5', {
              status: 'failed',
              evidence: `Switched to ${newDriverType}, but search returned 0 results`,
            })
          }
        })
      )
    )

    // Keep a handle so this search can be cancelled like the others
    fiberRef.current = Effect.runFork(program)
  } else {
    // No query yet: the switch happened but is not verified, so stay in 'testing'
    updateHypothesis('H5', {
      status: 'testing',
      evidence: `Switched to ${newDriverType}. Enter a query to verify search.`,
    })
  }
}, [activeDriver, flexDriver, linearDriver, query, updateHypothesis])
```
Pattern: Verify switch by executing actual operation (search), not just checking state.
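The same idea works without React or Effect. In the sketch below, the switch counts as verified only when a search on the new driver returns results. The synchronous `SearchDriver` interface and the `switchAndVerify` function are simplifying assumptions, since the real drivers return Effect streams.

```typescript
type DriverName = 'flex' | 'linear'

// Simplified, synchronous stand-in for the real drivers
interface SearchDriver {
  search(query: string, limit: number): string[]
}

interface SwitchResult {
  active: DriverName
  status: 'passed' | 'failed'
  evidence: string
}

// The switch only counts as verified if a real search on the new driver returns results
function switchAndVerify(
  drivers: Partial<Record<DriverName, SearchDriver>>,
  active: DriverName,
  query: string,
): SwitchResult {
  const next: DriverName = active === 'flex' ? 'linear' : 'flex'
  const driver = drivers[next]
  if (!driver) {
    // Stay on the current driver; the switch never happened
    return { active, status: 'failed', evidence: `${next} driver not available` }
  }
  try {
    const results = driver.search(query.trim(), 50)
    return results.length > 0
      ? { active: next, status: 'passed', evidence: `Switched to ${next}, got ${results.length} results` }
      : { active: next, status: 'failed', evidence: `Switched to ${next}, but search returned 0 results` }
  } catch (err) {
    // "No errors during the switch or the subsequent search" is part of the criteria
    return { active: next, status: 'failed', evidence: `Search after switch threw: ${String(err)}` }
  }
}

// Usage with in-memory drivers
const drivers: Partial<Record<DriverName, SearchDriver>> = {
  flex: { search: (_q, limit) => ['Matrix'].slice(0, limit) },
  linear: { search: (_q, limit) => ['The Matrix', 'Matrix Reloaded'].slice(0, limit) },
}
const switched = switchAndVerify(drivers, 'flex', 'matrix')
// switched: { active: 'linear', status: 'passed', evidence: 'Switched to linear, got 2 results' }
```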
Pattern 6: Damage Reports (Antipattern Discovery)
Purpose of Damage Reports
When hypotheses FAIL, document the antipattern discovered and the fix applied. This becomes living documentation.
Example: DataManager Antipatterns (EPOCH-0002)
```typescript
// DataManagerTestbed.tsx
interface AntipatternEntry {
  id: string
  title: string
  severity: 'critical' | 'warning' | 'info'
  status: 'fixed' | 'active' | 'mitigated'
  problem: string
  codeExample?: { bad: string; good: string }
  fix: string
}

const DAMAGE_REPORT: AntipatternEntry[] = [
  {
    id: 'DM-001',
    title: 'Atom.runtime(Layer) + Stateful Services',
    severity: 'critical',
    status: 'fixed',
    problem:
      'Atom.runtime(Layer) with services using Effect.Ref creates fresh state per operation. doIndex() populates Kernel#1, doSearch() creates Kernel#2 with empty state.',
    codeExample: {
      bad: `// ANTIPATTERN: Layer-per-operation isolation
const runtimeAtom = Atom.runtime(SearchKernel.Default)
const searchOps = {
  search: runtimeAtom.fn<Query>()((query, ctx) =>
    Effect.gen(function*() {
      const kernel = yield* SearchKernel // ← Fresh instance!
      return yield* kernel.search(query) // ← Empty driver!
    })
  )
}`,
      good: `// FIXED: Direct driver pattern with useState
const [driver, setDriver] = useState<SearchServiceImpl | null>(null)
useEffect(() => {
  const init = async () => {
    const flex = await Effect.runPromise(createFlexSearchDriver())
    await Effect.runPromise(flex.index(items, config))
    setDriver(flex) // ← Persists across operations
  }
  init()
}, [])`,
    },
    fix: 'Bypass atom layer. Store driver instances in React useState. Use Effect.runPromise for initialization, Effect.runFork for streaming operations.',
  },
  {
    id: 'DM-002',
    title: 'Hypothesis Tracking: Function Call vs Outcome',
    severity: 'warning',
    status: 'fixed',
    problem:
      'All 5 hypotheses marked "passed" despite grid showing 0 rows. Hypotheses tracked "function was called" (e.g., setGridData invoked) not "outcome achieved" (e.g., gridData.length > 0).',
    codeExample: {
      bad: `// ANTIPATTERN: Track function call, not outcome
useEffect(() => {
  if (gridData) { // ← gridData exists (even if empty [])
    updateHypothesis('H1', { status: 'passed' }) // ← FALSE POSITIVE
  }
}, [gridData])`,
      good: `// FIXED: Verify actual outcome
useEffect(() => {
  if (gridData.length > 0) { // ← Actually has results
    updateHypothesis('H1', { status: 'passed', evidence: \`\${gridData.length} rows in grid\` })
  }
}, [gridData, updateHypothesis])`,
    },
    fix: 'Track actual outcomes: gridData.length > 0, progressiveUpdateCount > 1, throughput > 0 && stats.items > 0, etc.',
  },
]
```
Damage Report UI Component
```tsx
// src/components/testbed/shared/hypothesis.tsx
export interface DamageReportFinding {
  id: string
  severity: 'critical' | 'warning' | 'info' | 'resolved'
  title: string
  description?: string
  hypothesis?: string
}

export function DamageReport({ findings }: { findings: DamageReportFinding[] }) {
  const activeFindingsCount = findings.filter((f) => f.severity !== 'resolved').length

  return (
    <div className="border border-neutral-800 rounded-lg">
      <div className="px-4 py-3 bg-neutral-900/50 flex items-center justify-between">
        <span className="font-mono text-neutral-400 uppercase">DAMAGE REPORT</span>
        <span className={activeFindingsCount > 0 ? 'text-amber-400' : 'text-green-400'}>
          {activeFindingsCount > 0 ? `${activeFindingsCount} ACTIVE` : 'ALL CLEAR'}
        </span>
      </div>
      <div className="divide-y divide-neutral-800">
        {findings.map((finding) => (
          <div key={finding.id} className="p-3">
            <div className="flex items-center gap-2">
              <span className="font-mono">{finding.title}</span>
              {finding.hypothesis && (
                <span className="font-mono text-neutral-500">[{finding.hypothesis}]</span>
              )}
            </div>
            {finding.description && (
              <p className="text-neutral-400 mt-1">{finding.description}</p>
            )}
          </div>
        ))}
      </div>
    </div>
  )
}
```
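`DamageReport` consumes `DamageReportFinding[]`, while the testbed data in Pattern 6 is `AntipatternEntry[]`. The adapter below is a hypothetical way to bridge the two types. Two of its choices are assumptions about the intended semantics: `fixed` maps to `resolved`, and `mitigated` entries remain active.

```typescript
// Types copied from the snippets above (codeExample omitted)
interface AntipatternEntry {
  id: string
  title: string
  severity: 'critical' | 'warning' | 'info'
  status: 'fixed' | 'active' | 'mitigated'
  problem: string
  fix: string
}

interface DamageReportFinding {
  id: string
  severity: 'critical' | 'warning' | 'info' | 'resolved'
  title: string
  description?: string
  hypothesis?: string
}

// Hypothetical adapter: fixed entries become 'resolved' so the panel's
// ACTIVE count only includes open problems
function toFinding(entry: AntipatternEntry): DamageReportFinding {
  return {
    id: entry.id,
    title: entry.title,
    severity: entry.status === 'fixed' ? 'resolved' : entry.severity,
    description: `${entry.problem} Fix: ${entry.fix}`,
  }
}

// Same rule DamageReport uses for its header count
function activeCount(findings: DamageReportFinding[]): number {
  return findings.filter((f) => f.severity !== 'resolved').length
}
```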
Pattern 7: Hypothesis UI Components
HypothesisBadge
```tsx
// src/components/testbed/shared/hypothesis.tsx
// Icons assumed to come from lucide-react; the original excerpt omits this import
import { AlertTriangle, CheckCircle, Clock, Info, XCircle } from 'lucide-react'

export type ValidationStatus = 'validated' | 'failed' | 'pending' | 'partial' | 'unknown'

export function HypothesisBadge({ id, status }: { id: string; status: ValidationStatus }) {
  const statusConfig = {
    validated: { bg: 'bg-green-900/30', text: 'text-green-400', icon: CheckCircle },
    failed: { bg: 'bg-red-900/30', text: 'text-red-400', icon: XCircle },
    pending: { bg: 'bg-amber-900/30', text: 'text-amber-400', icon: Clock },
    partial: { bg: 'bg-cyan-900/30', text: 'text-cyan-400', icon: AlertTriangle },
    unknown: { bg: 'bg-neutral-800/50', text: 'text-neutral-400', icon: Info },
  }

  const config = statusConfig[status]
  const Icon = config.icon

  return (
    <span className={`inline-flex items-center gap-1 px-2 py-0.5 ${config.bg} ${config.text}`}>
      <Icon size={10} />
      {id}
    </span>
  )
}
```
HypothesisSection
```tsx
import { useState, type ReactNode } from 'react'

export function HypothesisSection({
  id,
  title,
  description,
  status,
  children,
}: {
  id: string
  title: string
  description: string
  status: ValidationStatus
  children: ReactNode
}) {
  const [isExpanded, setIsExpanded] = useState(true)

  return (
    <section className="border border-neutral-800 rounded-lg">
      <button
        onClick={() => setIsExpanded(!isExpanded)}
        className="w-full p-4 bg-neutral-900/50 flex items-center justify-between"
      >
        <div className="flex items-center gap-3">
          {/* HypothesisBadge as shown above takes only id and status */}
          <HypothesisBadge id={id} status={status} />
          <div className="text-left">
            <h3 className="font-mono text-neutral-200">{title}</h3>
            <p className="text-neutral-500">{description}</p>
          </div>
        </div>
        <span className="text-neutral-500">{isExpanded ? '−' : '+'}</span>
      </button>
      {isExpanded && <div className="p-4 border-t border-neutral-800">{children}</div>}
    </section>
  )
}
```
HypothesisCard (for Grid Display)
```tsx
function HypothesisCard({ hypothesis }: { hypothesis: Hypothesis }) {
  const statusColors = {
    pending: 'border-neutral-700 text-neutral-600',
    testing: 'border-amber-500/50 text-amber-500 animate-pulse',
    passed: 'border-emerald-500/50 text-emerald-500',
    failed: 'border-red-500/50 text-red-500',
  }

  const statusIcons = {
    pending: '○',
    testing: '◐',
    passed: '●',
    failed: '✕',
  }

  return (
    <div className={`border p-3 ${statusColors[hypothesis.status]}`}>
      <div className="flex items-start gap-2">
        <span className="font-mono">{statusIcons[hypothesis.status]}</span>
        <div className="flex-1">
          <div className="font-mono uppercase">{hypothesis.id}: {hypothesis.label}</div>
          <div className="font-mono text-neutral-500">{hypothesis.description}</div>
          {hypothesis.evidence && (
            <div className="font-mono text-neutral-600 mt-2 border-t pt-2">
              Evidence: {hypothesis.evidence}
            </div>
          )}
        </div>
      </div>
    </div>
  )
}
```
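The testbed's lifecycle status (`pending | testing | passed | failed`) and the shared badge vocabulary (`ValidationStatus`) use different terms. The sketch below shows one possible mapping between them. It assumes `testing` should render like `pending`, since both use amber styling.

```typescript
type HypothesisStatus = 'pending' | 'testing' | 'passed' | 'failed'
type ValidationStatus = 'validated' | 'failed' | 'pending' | 'partial' | 'unknown'

// Hypothetical mapping; 'testing' → 'pending' is an assumption (both render amber)
const TO_VALIDATION: Record<HypothesisStatus, ValidationStatus> = {
  pending: 'pending',
  testing: 'pending',
  passed: 'validated',
  failed: 'failed',
}

function toValidationStatus(status: HypothesisStatus): ValidationStatus {
  return TO_VALIDATION[status]
}

// Example: convert card statuses into badge statuses
const badgeStatuses = (['passed', 'testing', 'failed'] as HypothesisStatus[]).map(toValidationStatus)
// ['validated', 'pending', 'failed']
```

Keying a `Record` by the full union means the compiler flags any status added later that has no mapping.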
Pattern 8: EDIN Integration (Experiment Phase)
EDIN Phases and Hypothesis Flow
EDIN Cycle:
- Experiment - Define hypotheses, validate assumptions
- Design - Convert proven hypotheses into Operations
- Implement - Execute Operations with bounded Tasks
- Negotiate - Debrief, update Proposals, reallocate
Hypothesis in Experiment Phase:
```markdown
<!-- .edin/experiments/EPOCH-0002-data-manager.md -->
## Experiment: DataManager Effect-Atom Integration

### Hypotheses

**H1: Search → Grid Flow**
- Claim: Search results flow correctly to AG-Grid rowData
- Acceptance: gridData.length > 0 after search completes
- Status: PASSED
- Evidence: 50 results → 50 grid rows in 12.3ms

**H2: Progressive Streaming**
- Claim: Stream emits results progressively (not all at once)
- Acceptance: progressiveUpdateCount > 1
- Status: PASSED
- Evidence: 50 results in 10 progressive updates

**H3: Stream-First DX**
- Claim: Effect Stream + Fiber cancellation works cleanly
- Acceptance: No errors, clean cancellation on unmount
- Status: PASSED
- Evidence: 5 searches with cancellations, 0 errors

**H4: Real-time Metrics**
- Claim: Throughput (items/sec) calculated in real-time
- Acceptance: stats.ms and throughput update during stream
- Status: PASSED
- Evidence: 4065 items/sec (50 items in 12.3ms)

**H5: Driver Switching**
- Claim: Switching between flex/linear drivers is seamless
- Acceptance: Search after switch returns results
- Status: PASSED
- Evidence: Switched to linear, got 50 results in 8.9ms

### Damage Report

**DM-001: Atom.runtime + Stateful Services (CRITICAL, FIXED)**
- Problem: Effect.Ref creates fresh state per operation
- Fix: Direct driver pattern with useState

**DM-002: Hypothesis Tracking (WARNING, FIXED)**
- Problem: Tracked function calls, not outcomes
- Fix: Verify gridData.length > 0, not just gridData existence

### Negotiation

All hypotheses PASSED. Transition to Design phase:
- Define Operations for DataManager integration
- Extract reusable patterns to .edin/EFFECT_PATTERNS.md
- Update Proposals queue with DataManager v2 features
```
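Because the log mirrors the testbed state, it can be generated rather than written by hand. The sketch below assumes the `Hypothesis` shape from Pattern 1. The `renderEntry` helper and the `readyForDesign` gate (every hypothesis passed, with non-empty evidence) are hypothetical, not part of EDIN tooling.

```typescript
interface Hypothesis {
  id: string
  label: string
  description: string
  status: 'pending' | 'testing' | 'passed' | 'failed'
  evidence?: string
}

// Render one hypothesis in the experiment-log format
// (no Acceptance line, since Hypothesis has no such field)
function renderEntry(h: Hypothesis): string {
  return [
    `**${h.id}: ${h.label}**`,
    `- Claim: ${h.description}`,
    `- Status: ${h.status.toUpperCase()}`,
    `- Evidence: ${h.evidence ?? '(none)'}`,
  ].join('\n')
}

// Hypothetical Negotiate gate: move to Design only when every hypothesis
// passed AND carries evidence
function readyForDesign(hypotheses: Hypothesis[]): boolean {
  return (
    hypotheses.length > 0 &&
    hypotheses.every((h) => h.status === 'passed' && (h.evidence ?? '').trim() !== '')
  )
}
```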
Pattern 9: Multi-Hypothesis Testbeds
Organizing Multiple Hypotheses
```tsx
// Testbed with 5 hypotheses
// (query/setQuery, handleSearch, handleDriverSwitch, activeDriver, and gridData
// come from the earlier patterns and are elided here)
export function DataManagerTestbed() {
  const [hypotheses, setHypotheses] = useState(HYPOTHESES)

  const updateHypothesis = useCallback((id: string, updates: Partial<Hypothesis>) => {
    setHypotheses((prev) =>
      prev.map((h) => (h.id === id ? { ...h, ...updates } : h))
    )
  }, [])

  return (
    <div>
      {/* Hypothesis Grid */}
      <section>
        <SectionLabel>Experiment Hypotheses</SectionLabel>
        <div className="grid grid-cols-5 gap-3">
          {hypotheses.map((h) => (
            <HypothesisCard key={h.id} hypothesis={h} />
          ))}
        </div>
      </section>

      {/* Search Interface (triggers H1, H2, H4) */}
      <section>
        <SectionLabel>Search Interface</SectionLabel>
        <input value={query} onChange={(e) => setQuery(e.target.value)} />
        <button onClick={handleSearch}>Search</button>
      </section>

      {/* Driver Toggle (triggers H5) */}
      <section>
        <button onClick={handleDriverSwitch}>
          Switch to {activeDriver === 'flex' ? 'linear' : 'flex'}
        </button>
      </section>

      {/* Results Grid (validates H1) */}
      <section>
        <DataGrid rowData={gridData} />
      </section>

      {/* Damage Report */}
      <section>
        <SectionLabel>Damage Report</SectionLabel>
        <DamageReportPanel findings={DAMAGE_REPORT} />
      </section>
    </div>
  )
}
```
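`updateHypothesis` uses the functional form `setHypotheses((prev) => ...)`. The reason appears when two hypotheses are updated back to back, as in a single `Effect.ensuring` block: updates computed from the same snapshot overwrite each other. The sketch below is a simplified, framework-free model of that behavior. `runUpdates` is a hypothetical stand-in for React's update queue, which applies functional updaters in order against the latest state.

```typescript
interface Hypothesis {
  id: string
  label: string
  description: string
  status: 'pending' | 'testing' | 'passed' | 'failed'
  evidence?: string
}

// Pure version of the map inside updateHypothesis
function applyUpdate(list: Hypothesis[], id: string, updates: Partial<Hypothesis>): Hypothesis[] {
  return list.map((h) => (h.id === id ? { ...h, ...updates } : h))
}

// Plain values replace state wholesale; functions receive the latest state
type Update = Hypothesis[] | ((prev: Hypothesis[]) => Hypothesis[])

function runUpdates(initial: Hypothesis[], updates: Update[]): Hypothesis[] {
  return updates.reduce<Hypothesis[]>(
    (state, u) => (typeof u === 'function' ? u(state) : u),
    initial,
  )
}

const initial: Hypothesis[] = [
  { id: 'H2', label: 'Progressive Streaming', description: '...', status: 'testing' },
  { id: 'H4', label: 'Real-time Metrics', description: '...', status: 'testing' },
]

// Both values computed from the same snapshot: the H4 update overwrites H2's
const stale = runUpdates(initial, [
  applyUpdate(initial, 'H2', { status: 'passed', evidence: 'e2' }),
  applyUpdate(initial, 'H4', { status: 'passed', evidence: 'e4' }),
])

// Functional form: each update sees the previous one's result
const functional = runUpdates(initial, [
  (prev) => applyUpdate(prev, 'H2', { status: 'passed', evidence: 'e2' }),
  (prev) => applyUpdate(prev, 'H4', { status: 'passed', evidence: 'e4' }),
])
```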
Pattern 10: Hypothesis-Driven Test Suites
Mapping Hypotheses to it.effect() Tests
```typescript
// DataManager.test.ts (if we extract testbed to unit tests)
import { describe, expect, it } from "@effect/vitest"
import { Chunk, Effect, Stream } from "effect"
// createFlexSearchDriver, processMovies, and MovieItem come from the project (import paths not shown)

describe("DataManager Hypotheses (EPOCH-0002)", () => {
  describe("H1: Search → Grid Flow", () => {
    it.effect("search results flow to grid data transformation", () =>
      Effect.gen(function* () {
        const driver = yield* createFlexSearchDriver<MovieItem>()
        const movies = processMovies(100)
        yield* driver.index(movies, { fields: ['title'] })

        const results = yield* driver.search("matrix", { limit: 10 })
          .pipe(Stream.runCollect)

        const gridData = Chunk.toArray(results).map((r) => ({
          id: r.item.id,
          name: r.item.title,
          value: Math.round(r.score * 100),
        }))

        // H1 acceptance criteria
        expect(Chunk.size(results)).toBeGreaterThan(0)
        expect(gridData.length).toBeGreaterThan(0)
        expect(gridData.length).toBe(Chunk.size(results))
      })
    )
  })

  describe("H2: Progressive Streaming", () => {
    it.effect("stream emits multiple chunks", () =>
      Effect.gen(function* () {
        const driver = yield* createFlexSearchDriver<MovieItem>()
        const movies = processMovies(100)
        yield* driver.index(movies, { fields: ['title'] })

        let chunkCount = 0
        let itemCount = 0

        yield* driver.search("matrix", { limit: 20, chunkSize: 5 })
          .pipe(
            // Stream.tap fires per element, so chunkCount > 1 means
            // "more than one element", not "more than one chunk"
            Stream.tap(() => Effect.sync(() => chunkCount++)),
            Stream.runForEach(() => Effect.sync(() => itemCount++))
          )

        // H2 acceptance criteria
        expect(chunkCount).toBeGreaterThan(1)
        expect(itemCount).toBeGreaterThan(0)
      })
    )
  })
})
```
Anti-Patterns (Avoid These)
❌ Vague Hypotheses
```typescript
// BAD - Not falsifiable
const h1: Hypothesis = {
  id: 'H1',
  label: 'Search works',
  description: 'Search should work correctly',
  status: 'pending',
}
```

```typescript
// GOOD - Concrete, measurable
const h1: Hypothesis = {
  id: 'H1',
  label: 'Search → Grid Flow',
  description: 'Search results flow correctly to AG-Grid rowData with length > 0',
  status: 'pending',
}
```
❌ Tracking Function Calls Instead of Outcomes
```typescript
// BAD - False positive
useEffect(() => {
  if (gridData) { // Empty [] is truthy!
    updateHypothesis('H1', { status: 'passed' })
  }
}, [gridData, updateHypothesis])
```

```typescript
// GOOD - Verify actual outcome
useEffect(() => {
  if (gridData.length > 0) {
    updateHypothesis('H1', { status: 'passed', evidence: `${gridData.length} rows` })
  }
}, [gridData, updateHypothesis])
```
❌ Missing Evidence
```typescript
// BAD - No concrete metrics
updateHypothesis('H4', { status: 'passed' })
```

```typescript
// GOOD - Include evidence
updateHypothesis('H4', {
  status: 'passed',
  evidence: `4065 items/sec (50 items in 12.3ms)`,
})
```
❌ Hypothesis Without Acceptance Criteria
```typescript
// BAD - How do we know when it passes?
const h3: Hypothesis = {
  id: 'H3',
  label: 'Effect Integration',
  description: 'Effect works with atoms',
  status: 'pending',
}
```

```typescript
// GOOD - Clear acceptance criteria
const h3: Hypothesis = {
  id: 'H3',
  label: 'Stream-First DX',
  description: 'Effect Stream + Fiber cancellation works cleanly (0 errors, clean unmount)',
  status: 'pending',
}
```
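Part of this review can be automated. The sketch below is a heuristic lint with two hypothetical rules. A measurable description names a number, a comparison, or a countable quantity, and it avoids "should". The lint catches drafts like the BAD examples above, but it cannot prove that a claim is falsifiable.

```typescript
// Hypothetical heuristics, not a complete falsifiability check
const MEASURABLE = /\d|[<>]|\b(length|count|errors?|rows|items)\b/i
const INTENTION = /\bshould\b/i

function lintDescription(description: string): string[] {
  const problems: string[] = []
  if (!MEASURABLE.test(description)) {
    problems.push('no measurable outcome (number, comparison, or count)')
  }
  if (INTENTION.test(description)) {
    problems.push("reads as an intention ('should'), not a claim")
  }
  return problems
}

const vague = lintDescription('Search should work correctly') // 2 problems
const concrete = lintDescription('Search results flow correctly to AG-Grid rowData with length > 0') // []
```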
Summary: Hypothesis Validation Discipline
Core Principles
- Hypotheses are falsifiable claims - Not wishes, concrete outcomes
- Status transitions require evidence - "passed" needs metrics, not just code execution
- Damage reports document antipatterns - Failed hypotheses become learning
- UI components visualize validation - HypothesisBadge, HypothesisCard, DamageReport
- EDIN integration - Hypotheses in Experiment phase, Operations in Design
TMNL-Specific Patterns
- H1-H5 naming - Sequential IDs for tracking across testbeds
- Evidence strings - Concrete metrics in updateHypothesis()
- Damage report severity - critical/warning/info for antipattern classification
- Testbed UI - Grid layout for hypotheses, collapsible damage reports
Canonical Testbeds
- 5-hypothesis validation: `src/components/testbed/DataManagerTestbed.tsx`
- Damage report pattern: `src/components/testbed/EffectAtomTestbed.tsx`
- UI components: `src/components/testbed/shared/hypothesis.tsx`
When in doubt: If you cannot write concrete acceptance criteria with measurable outcomes, the hypothesis is not ready. Refine it until you can specify exactly what "passed" means.