Autorun FixStubbornBugsFastByParallelizing

Systematically eliminate configuration bugs by testing 8-10 fundamentally different solutions in parallel. When the root cause is ambiguous, this methodology finds the fix in hours instead of days through comprehensive parallel exploration. Build a minimal test harness, evaluate all approaches simultaneously, identify the winners, and understand WHY they work. Transforms guesswork into systematic discovery.

install
source · Clone the upstream repo
git clone https://github.com/ahundt/autorun
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ahundt/autorun "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/autorun/skills/parallel-subagent" ~/.claude/skills/ahundt-autorun-fixstubbornbugsfastbyparallelizing && rm -rf "$T"
manifest: plugins/autorun/skills/parallel-subagent/SKILL.md
source content

Fix Stubborn Bugs Fast By Parallelizing

Eliminate ambiguous configuration bugs in hours, not days. Test 8-10 fundamentally different solution approaches simultaneously to discover what works and understand why. Transforms trial-and-error into systematic exploration.

Summary

6-Step Process: Read buggy code → Research 8-10 orthogonal approaches (document sources) → Build minimal harness → Implement all solutions → Test and identify winners → Apply fix with analysis + citations

Best for: Configuration/integration bugs in complex subsystems where root cause is ambiguous.

Time: 1-2 hours to build harness, saves hours/days of serial debugging.

Success factors: Platform compatibility, realistic test data with edge cases, comprehensive logging, matching production context, academic-quality documentation with cited sources.

Research requirement: Document all sources with URLs (GitHub issues, Stack Overflow, API docs, papers). Cite in code comments and README.

Related methodologies:

  1. Binary Search Debugging (bisecting commits)
  2. Fuzzing (input space exploration)
  3. Property-Based Testing (behavior space)
  4. A/B Testing (production comparison)
  5. Ablation Testing (component removal)

This methodology is optimized for configuration/integration debugging where the challenge is finding the right way to wire components together.


When to Use

Good fit:

  1. Subsystem integration bugs
  2. Framework configuration with multiple approaches possible
  3. Ambiguous root cause
  4. Slow iteration in main codebase
  5. Reproducible behavior

Poor fit:

  1. Obvious bugs (typos, logic errors)
  2. Only 1-2 solutions exist
  3. Good error messages
  4. Time-critical hotfixes
  5. Cannot isolate

Quick Example: UI Scrolling Bug

Problem: A table with 14 columns does not show a horizontal scrollbar, so long text is truncated with "...".

10 Approaches: BASELINE, SET_MAX_WIDTH, EXPAND_TO_INCLUDE_X, VSCROLL_FALSE, NO_CLIP, EXACT_COLUMNS, MIN_SCROLLED_WIDTH, SENSE_HOVER, NESTED_SCROLL, MANUAL_ALLOCATE

Result: Only Test 3 (expand_to_include_x) worked - forces UI to reserve space beyond viewport.

Gotcha: Applied the fix → still broken! Logs showed the data was empty (std::mem::take() had moved it earlier). Fix: Calculate width BEFORE take(), while the data is still available. Production: 744 processes, 1609-char commands, 11,189px width (vs test: 50 processes, 150-char commands, 2000px).


The 6-Step Method

Step 1: Read and Understand

Do:

  1. Locate subsystem exhibiting problem
  2. Document current configuration
  3. Write precise expected vs actual behavior
  4. List all configurable parameters
  5. Review framework documentation (save links to relevant API pages)

Output: Clear understanding of broken state with documented references.
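
One lightweight way to capture that output is a short notes block kept next to the future harness. A minimal sketch in Python (all values are illustrative placeholders loosely based on the scrolling example below, not a real bug report):

BROKEN_STATE = {
    "subsystem": "results table inside a ScrollArea",
    "expected": "horizontal scrollbar appears when content is wider than the viewport",
    "actual": "long text is truncated with '...', no horizontal scrollbar",
    "current_config": {"auto_shrink": True, "hscroll": True},   # placeholder parameter names
    "parameters": ["max_width", "min_scrolled_width", "expand_to_include_x"],
    "docs": ["<URL of the relevant ScrollArea API page>"],
}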


Step 2: Research 8-10 Orthogonal Approaches

Research sources (document all with links/citations):

  1. Framework API documentation (official docs, version-specific)
  2. GitHub issues/bug reports (link to issue #, note workarounds)
  3. Stack Overflow discussions (link to answers, cite voting/accepted status)
  4. Academic papers/technical blogs (full citation: author, title, year, URL)
  5. Related library source code (note file/line, link to specific commit)

Citation standard: Document all sources with URLs and context. Academic integrity: credit ideas, link to original discussions, note if approach is from specific issue/paper.

Vary these dimensions:

  1. API Method (constructor/factory/builder)
  2. Init Order (before/after deps)
  3. Explicit vs Implicit
  4. Delegation (parent/child)
  5. Data Structure (mutable/immutable)
  6. Sync vs Async
  7. Config Location (inline/external)
  8. Nested vs Flat
  9. Direct vs Indirect
  10. Pattern (pull/push)

⚠️ Anti-Pattern: Testing parameter values (100, 200, 300). Do instead: Test mechanisms (no buffering, fixed-size, dynamic, ring, double-buffer, memory-mapped, async, zero-copy, pooled, adaptive).
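
The difference shows up in how the approach list is written down: each entry names a distinct mechanism plus a falsifiable hypothesis, not a number. A sketch (the buffering scenario, names, and hypotheses are illustrative; the source fields are placeholders to fill in during research):

# Anti-pattern: one mechanism, three numbers
BUFFER_SIZES = [100, 200, 300]

# Better: distinct mechanisms, each with a hypothesis and a source to cite later
APPROACHES = [
    {"name": "BASELINE",      "hypothesis": "current config; must reproduce the bug"},
    {"name": "NO_BUFFERING",  "hypothesis": "writes go straight through, implicating or exonerating the buffer layer",
     "source": "<GitHub issue URL>"},
    {"name": "RING_BUFFER",   "hypothesis": "fixed capacity avoids reallocation mid-write",
     "source": "<Stack Overflow answer URL>"},
    {"name": "DOUBLE_BUFFER", "hypothesis": "reader never observes a half-written frame"},
    {"name": "ASYNC_FLUSH",   "hypothesis": "the bug only appears when flush blocks the caller"},
    # ... five more, one mechanism each
]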

Output: List of 8-10 distinct approaches with hypotheses.


Step 3: Build Minimal Test Harness

Requirements:

  1. Same framework version (copy from dependencies)
  2. Same data structures (representative sample)
  3. Same execution context (threading, hierarchy, globals)
  4. Minimal dependencies (strip business logic)
  5. Fast build (<30 seconds)
  6. Tabbed UI (macOS event loop limit: one per process)

Platform decision:

| UI Approach | Pros | Cons |
| --- | --- | --- |
| Tabbed | Single window, easy switch, cross-platform | One at a time |
| Split view | See 2-4 at once | Limited number |
| Separate windows | All visible | macOS event loop limit |
| Separate executables | Guaranteed isolation | More overhead |

Skeleton with edge cases (adapt to your language):

TEST_DATA = [
    {"name": "Normal", "value": "x"},        # Happy path
    {"name": "x" * 200, "value": "long"},    # Edge: very long
    {"name": "", "value": ""},               # Edge: empty
    {"name": None, "value": None},           # Edge: null
    {"name": "日本語", "value": "🚀"},        # Edge: unicode
    {"name": "  spaces  ", "value": "\n\t"}, # Edge: whitespace
]

tabs.add_tab("Test 1: Baseline", test_1(TEST_DATA))
tabs.add_tab("Test 2: Factory", test_2(TEST_DATA))
# ... tests 3-10

⚠️ Gotcha 1: Separate windows crash on macOS ("EventLoop can't be recreated"). Use tabs.

⚠️ Gotcha 2: Test data must match production scale. Profile production first (min/max/avg), use realistic sizes (not 10 items vs 700), include edge cases (empty, null, very long, unicode, whitespace).
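
A minimal sketch of that profiling step, assuming production rows can be loaded as dicts (the field name and helper are illustrative):

def profile(rows, field):
    """Min/max/avg length of one field, so test data can be sized to match production."""
    lengths = [len(str(r.get(field) or "")) for r in rows]
    if not lengths:
        return {"count": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "avg": sum(lengths) / len(lengths),
    }

# profile(production_rows, "command")
# -> e.g. {"count": 744, "max": 1609, ...} for the process-table example above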

Output: Harness where Test 1 (baseline) reproduces bug.


Step 4: Implement All 10 Solutions

Pattern (examples in Python, adapt to your language):

def test_1_baseline():
    """BASELINE: Current broken approach"""
    return Component(current_config).render(TEST_DATA)

def test_2_factory():
    """VARIATION: Factory vs constructor
    Hypothesis: Constructor skips registration"""
    return Factory.create(current_config).render(TEST_DATA)

Rules:

  1. Each test is separate function (no shared state)
  2. All use identical TEST_DATA
  3. Only ONE thing varies per test
  4. Comment: variation + hypothesis

⚠️ Gotcha 3: Global state pollution. Reset before/after each test:

def run_test(test_func):
    reset_globals()
    result = test_func()   # each test function reads the shared TEST_DATA itself
    reset_globals()
    return result

Output: Complete harness with all 10 solutions.


Step 5: Test and Document Results

Classify:

  1. ✅ WORKS - Fixes bug, no side effects
  2. ⚠️ PARTIAL - Fixes but has issues
  3. ❌ FAILS - Doesn't fix
  4. 💥 BREAKS - Makes it worse

Document:

Test 3: EXPAND_TO_INCLUDE - ✅ WORKS
1. Observation: Horizontal scrollbar appears
2. Details: Width 11,189px, scrolls smoothly
3. Trade-offs: Need to calculate width in advance

Test 2: SET_MAX_WIDTH - ❌ FAILS
1. Why: Overridden by ScrollArea internally
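
If you prefer to keep the raw observations as data and render the write-up from them, a small sketch (statuses and notes echo the example above; a real run would list all 10 tests):

RESULTS = [
    {"test": 3, "name": "EXPAND_TO_INCLUDE", "status": "✅ WORKS",
     "notes": ["Horizontal scrollbar appears", "Width 11,189px, scrolls smoothly",
               "Trade-off: need to calculate width in advance"]},
    {"test": 2, "name": "SET_MAX_WIDTH", "status": "❌ FAILS",
     "notes": ["Overridden by ScrollArea internally"]},
]

# Print winners first, then the failures with their reasons
for r in sorted(RESULTS, key=lambda r: "WORKS" not in r["status"]):
    print(f"Test {r['test']}: {r['name']} - {r['status']}")
    for i, note in enumerate(r["notes"], 1):
        print(f"  {i}. {note}")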

Output: Documented results identifying winner(s).


Step 6: Analyze and Apply

Analysis (ask these 5 questions):

  1. What's minimal difference between working/broken?
  2. Why does this succeed where others fail?
  3. What does this reveal about framework internals?
  4. Are there trade-offs or edge cases?
  5. Is this the simplest working approach?

Document pattern:

## Why Test 3 Works
What it does: [one sentence]
Why it works: [mechanism explanation]
Why others failed:
  - Test 2: [reason]
  - Test 7: [reason]
Trade-offs: [any limitations]

Apply to production:

  1. Locate corresponding code
  2. Apply minimal change (exact approach from working test)
  3. Match execution context (order, data availability)
  4. Test with production data
  5. Add comments explaining WHY + failed alternatives

Comment template:

# Use [APPROACH] to fix [PROBLEM].
# Why alternatives failed:
# - set_max_width(): overridden by ScrollArea internally (egui docs v0.33)
# - min_scrolled_width(): doesn't adapt to content (GitHub issue #1234)
# Research: stackoverflow.com/a/789, framework docs at [URL]
# See tests/scroll_test for parallel approach comparison.

If fix works in test but fails in production:

Add logging to both (Python shown, use console.log/System.out/etc for your language):

print(f"Data: {len(data)} rows")
print(f"First: {data[0] if data else 'EMPTY'}")
print(f"Result: {result}")

Compare side-by-side:

TEST:       Data: 50 rows
PRODUCTION: Data: 0 rows    ← FOUND PROBLEM (data moved/consumed)

⚠️ Gotcha 4: Data availability mismatch. Symptom: suspiciously small values (100px vs 2000px). Fix: Calculate BEFORE any take()/move().
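
The original gotcha was hit in Rust (std::mem::take() had already emptied the data), but the ordering problem is language-agnostic. A minimal Python sketch of the same mistake and fix (the take() helper below just simulates a move):

def take(buffer):
    """Simulate a move: return the contents and leave the source empty."""
    contents, buffer[:] = list(buffer), []
    return contents

rows = ["short", "a much longer command line that needs horizontal scrolling"]

# Wrong order: width is computed from an already-emptied list -> suspiciously small value
# moved = take(rows)
# width = max((len(r) for r in rows), default=0)   # 0

# Right order: compute the derived value while the data is still available
width = max((len(r) for r in rows), default=0)
moved = take(rows)
print(width, len(moved))   # width was captured before the move, so it is nonzero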

Debugging checklist:

  1. Log data count, intermediate steps, outputs
  2. Run both, capture logs
  3. Compare line-by-line
  4. Identify first divergence
  5. Fix root cause (not symptom)

⚠️ Anti-Pattern: Stopping at "it works". Do instead: Document WHY it works and WHY others failed.

Output: Working fix in production with clear analysis.



Why This Works

Efficiency: Evaluates all approaches in single run (minutes vs hours/days). Fast iteration. Clear visual comparison eliminates guesswork.

Thoroughness: Systematic exploration discovers non-obvious solutions. Documents what doesn't work (valuable negative results).

Knowledge: Framework internals become clearer. Mental model improves. Future reference for similar issues.

ROI: Worthwhile when serial debugging would take >4 hours. Best for complex subsystems, multiple failed attempts, or when deep understanding needed.


Critical Pitfalls

Pitfall 1: Test Harness Doesn't Match Reality

Symptoms: Over-simplified test with data=[1,2,3] doesn't reproduce the bug. Test passes, production fails. Different behavior in test vs production.

Causes:

  1. Mock data instead of realistic
  2. Skipped initialization
  3. Different data types/threading/context
  4. Framework version mismatch (test v2.0, production v1.8)
  5. Async vs sync mismatch ("not initialized" errors)

Fix (see the sketch after this list):

  1. Copy EXACT framework version
  2. Match: data types, threading, init sequence
  3. Verify baseline test REPRODUCES bug
  4. Profile production first, match scale
  5. Include edge cases (empty, null, long, unicode)
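
Two cheap guards that catch most of these mismatches before any other test result is trusted. A sketch (the package name, pinned version, and exhibits_bug() are placeholders for whatever applies to your framework and bug):

from importlib.metadata import version

# 1. Harness must use the same framework version as production
assert version("your-ui-framework") == "1.8.0", "framework version drift between harness and production"

# 2. Baseline must reproduce the bug, or none of the other tests mean anything
assert exhibits_bug(test_1_baseline()), "harness does not reproduce the bug - fix the harness first"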

Pitfall 2: Incomplete Documentation

Symptoms: Only documenting working solution. Future developers try failed approaches again.

Causes:

  1. Not documenting WHY it works
  2. Not listing failed alternatives
  3. Not testing edge cases
  4. Not explaining trade-offs
  5. Not citing research sources (GitHub issues, Stack Overflow, papers)

Fix: Document pattern of failures with sources:

✅ expand_to_include_x() - WORKS (forces layout)
❌ set_max_width() - FAILS (overridden internally)
   Source: egui issue #1234 (github.com/emilk/egui/issues/1234)
❌ min_scrolled_width() - FAILS (doesn't adapt)
   Source: Stack Overflow answer by user123 (stackoverflow.com/a/789)

Pattern: Framework needs explicit expansion API, not hints.
References: See tests/scroll_test, egui docs (egui.rs/api/v0.33/ScrollArea)

Pitfall 3: Deleting Test Harness

Symptom: Fix works now, but breaks after framework update. Can't debug similar future bugs.

Fix: Commit the harness to tests/bug-fixes/YYYY-MM-DD-[name]/ with a README explaining approaches tested, why the winner worked, and citing all research sources (GitHub issues, Stack Overflow, papers, docs with URLs).

Value: Reference for similar bugs, verify fix survives updates, training material, academic integrity.


Quick Reference

10 Approach Categories

  1. API Method - Constructor vs Factory.create() vs Builder
  2. Init Order - setup_deps(); create() vs create(); setup_deps()
  3. Explicit vs Implicit - size=1000 vs size=auto
  4. Delegation - parent.layout(child) vs child.layout()
  5. Data Structure - mutable list vs immutable tuple
  6. Sync vs Async - blocking call vs await async_call()
  7. Config - inline params vs load_config("file.json")
  8. Nested vs Flat - deep.hierarchy.component vs flat_component
  9. Direct vs Indirect - call() vs proxy.call()
  10. Pattern - pull (get()) vs push (subscribe(listener))
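
Two of these categories as side-by-side Python stubs, to show how small each variation can be (all names are illustrative):

# 3. Explicit vs Implicit
widget_explicit = {"size": 1000}       # pin the value yourself
widget_implicit = {"size": "auto"}     # let the framework decide

# 10. Pattern: pull vs push
class Source:
    def __init__(self):
        self.value = None
        self.listeners = []

    def get(self):                     # pull: caller asks when it wants data
        return self.value

    def subscribe(self, listener):     # push: source notifies the caller
        self.listeners.append(listener)

    def set(self, value):
        self.value = value
        for listener in self.listeners:
            listener(value)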

Result Classification

  1. ✅ WORKS - Fixes bug completely, no side effects
  2. ⚠️ PARTIAL - Fixes but has performance/edge case issues
  3. ❌ FAILS - Doesn't fix the bug
  4. 💥 BREAKS - Worse (crashes, corrupts, regresses)

Common Failure Patterns

| Symptom | Root Cause | Fix |
| --- | --- | --- |
| Works in test, fails in production | Data empty (moved/taken) | Calculate before data consumption |
| Suspiciously small values (100 vs 2000) | Using empty data | Add logging, compare test vs prod |
| "Not initialized" errors | Async timing | Match async model, add delays |
| Order-dependent test results | Global state pollution | Reset state before/after each test |
| Different test/prod behavior | Version/context mismatch | Match framework version and execution context |

Success Checklist

Before:

  1. Bug reproducible and isolated
  2. Root cause unclear (multiple possibilities)
  3. Have 1-2 hours to invest

During:

  1. 8-10 orthogonal approaches (not parameter tweaks)
  2. Harness builds <30 seconds
  3. Baseline test reproduces bug
  4. All tests use identical data
  5. Representative data (match production scale + edge cases)

After:

  1. Analyzed WHY winner works
  2. Documented failure patterns with research sources
  3. Minimal change to production
  4. Comments explain WHY + list failed alternatives + cite sources
  5. Verified with production data
  6. Committed test harness to repo with README citing all research