Awesome-Agent-Skills-for-Empirical-Research workflows:compound

Document a recently solved research problem to compound methodological knowledge

install

source · Clone the upstream repo

git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/11-James-Traina-compound-science/skills/workflows-compound" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-workflows-compoun && rm -rf "$T"

manifest: skills/11-James-Traina-compound-science/skills/workflows-compound/SKILL.md

source content

/compound

Pipeline mode: This command operates fully autonomously. All decisions are made automatically.

Coordinate multiple subagents working in parallel to document a recently solved research problem. Creates structured documentation in

docs/solutions/

with YAML frontmatter for searchability and future reference.

Purpose

Captures problem solutions while context is fresh. Uses parallel subagents for maximum efficiency — Phase 1 gathers information, Phase 2 assembles the final document.

Why "compound"? Each documented solution compounds your methodological knowledge. The first time you solve a convergence problem takes hours of research. Document it, and the next occurrence takes minutes. Knowledge compounds.

Usage

/workflows:compound                          # Document the most recent fix
/workflows:compound convergence failure in BLP inner loop  # Provide context
/workflows:compound fixed cluster-robust SEs  # Brief description

Execution Strategy: Two-Phase Orchestration

<critical_requirement> Only ONE file gets written — the final documentation.

Phase 1 subagents return TEXT DATA to the orchestrator. They must NOT use Write, Edit, or create any files. Only the orchestrator (Phase 2) writes the final documentation file. </critical_requirement>

Phase 1: Parallel Research

<parallel_tasks>

Launch these subagents IN PARALLEL. Each returns text data to the orchestrator.

1. Context Analyzer

Extracts conversation history for the problem-solving session
Identifies problem type, estimation method, symptoms, error messages
Auto-categorizes the problem (see Category Classification below)
Returns: YAML frontmatter skeleton with problem metadata

2. Solution Extractor

Analyzes all investigation steps taken during the session
Identifies root cause (e.g., "ill-conditioned Hessian due to poor starting values")
Extracts working solution with code examples
Documents what didn't work and why (important for future reference)
Returns: Solution content block with code snippets

3. Related Docs Finder

Searches
```
docs/solutions/
```
for related documentation
Identifies cross-references and links to similar problems
Checks if this problem is a variant of a previously documented issue
Returns: Links, relationships, and duplicate-avoidance notes

4. Prevention Strategist

Develops prevention strategies specific to the problem type
Creates diagnostic checklist ("check these things first next time")
Suggests robustness checks or tests that would catch this early
Returns: Prevention/diagnostic content

5. Category Classifier

Auto-detects the appropriate
```
docs/solutions/
```
category from problem description and session content
Validates category against the schema below
Generates filename slug from problem description
Returns: Final path and filename

</parallel_tasks>

Category Classification

Problems are auto-classified into one or more categories using keyword matching on the problem description and session content:

Category	Directory	Keywords / Signals
Estimation Issues	`estimation-issues/`	convergence, bias, efficiency, standard errors, MLE, GMM, likelihood, optimizer, starting values, boundary, gradient, Hessian
Data Issues	`data-issues/`	missing data, measurement error, sample selection, merge, duplicates, outliers, panel structure, encoding, cleaning
Numerical Issues	`numerical-issues/`	floating-point, overflow, underflow, condition number, tolerance, ill-conditioning, precision, NaN, Inf, singular matrix
Methodology Issues	`methodology-issues/`	identification, model specification, assumption violations, endogeneity, exclusion restriction, functional form, overidentification
Derivation Issues	`derivation-issues/`	proof, theorem, lemma, asymptotic, regularity conditions, existence, uniqueness, fixed point, convergence rate
Replication Issues	`replication-issues/`	reproducibility, package versions, seeds, environment, Docker, conda, renv, pipeline, Makefile, DVC

Multi-category problems: A problem can belong to multiple categories (e.g., "BLP convergence failure" is both

estimation-issues/

and

numerical-issues/

). Use the primary category for the file location and cross-reference the secondary category in the frontmatter

tags

field.

Ambiguous problems: If keyword matching is inconclusive, default to

methodology-issues/

(the broadest category).

Phase 2: Assembly & Write

<sequential_tasks>

WAIT for all Phase 1 subagents to complete before proceeding.

The orchestrating agent performs these steps:

Collect all text results from Phase 1 subagents
Assemble complete markdown file using the template below
Validate YAML frontmatter fields are complete
Create directory if needed:
```
mkdir -p docs/solutions/[category]/
```
Write the SINGLE final file:
```
docs/solutions/[category]/[filename].md
```

Documentation Template

---
title: "[Problem title — concise, searchable]"
date: YYYY-MM-DD
category: [primary category]
tags: [estimation, convergence, BLP, ...]
estimation_method: [if applicable: MLE, GMM, IV, DiD, ...]
language: [Python, R, Julia, Stata]
severity: [critical, moderate, minor]
time_to_resolve: [approximate time spent]
---

# [Problem Title]

## Problem

**Symptom:** [What was observed — error messages, wrong results, failure to converge]

**Context:** [What estimation/analysis was being performed, what data, what method]

**Reproduction:** [Minimal steps to reproduce the problem]

## Investigation

### What Didn't Work
1. [Attempted fix and why it failed]
2. [Another attempt and outcome]

### Root Cause
[Technical explanation of why the problem occurred]

## Solution

[Step-by-step fix with code examples]

```python
# or R, Julia, Stata as appropriate
# Working code with comments explaining the fix

Prevention

Diagnostic Checklist (check these first next time):

[First thing to verify]
[Second thing to verify]
[Third thing to verify]

Robustness Checks:

[Checks that would catch this early]

[Links to related docs/solutions/ entries]
[Links to methodology papers if relevant]
[Links to package documentation]


</sequential_tasks>

### Phase 3: Specialized Agent Review (Optional)

**WAIT for Phase 2 to complete before proceeding.**

Based on the problem category, optionally invoke a specialized agent to review the documentation for accuracy and completeness:

<parallel_tasks>

| Problem Category | Agent | Review Focus |
|-----------------|-------|-------------|
| `estimation-issues/` | `econometric-reviewer` | Solution correctness, estimation theory |
| `numerical-issues/` | `numerical-auditor` | Numerical accuracy, stability claims |
| `methodology-issues/` | `methods-explorer` | Methodological completeness, alternatives |
| `derivation-issues/` | `mathematical-prover` | Proof correctness, regularity conditions |
| `data-issues/` | `data-detective` | Data handling best practices |
| `replication-issues/` | `reproducibility-auditor` | Reproducibility completeness |

Launch the matching agent(s) to verify the documented solution is correct and complete. If the agent finds issues, update the documentation file.

</parallel_tasks>

### Phase 4: Knowledge Capture

Use `references/solution-schema.md` to ensure the solution is properly indexed and cross-referenced:

- Verify the documentation follows the YAML frontmatter schema
- Cross-reference with existing docs/solutions/ entries
- Detect patterns (3+ similar issues) and promote critical patterns
- Update any index or search metadata

## What It Captures

- **Problem symptom**: Exact error messages, observable behavior, numerical output
- **Investigation steps tried**: What didn't work and why (often the most valuable part)
- **Root cause analysis**: Technical explanation grounded in estimation/statistical theory
- **Working solution**: Step-by-step fix with complete, runnable code examples
- **Prevention strategies**: Diagnostic checklists and robustness checks
- **Cross-references**: Links to related solutions, methodology papers, package docs

## What It Creates

**Organized documentation:**

- File: `docs/solutions/[category]/[filename].md`
- Categories auto-detected from problem description

**Category directories:**

- `estimation-issues/` — convergence failures, biased estimates, wrong standard errors
- `data-issues/` — missing data, merge errors, sample selection problems
- `numerical-issues/` — floating-point, ill-conditioning, overflow/underflow
- `methodology-issues/` — identification failures, specification errors, assumption violations
- `derivation-issues/` — proof errors, incorrect asymptotics, missing regularity conditions
- `replication-issues/` — reproducibility failures, environment issues, missing documentation

## Common Mistakes to Avoid

| Wrong | Correct |
|-------|---------|
| Subagents write files like `context-analysis.md` | Subagents return text data; orchestrator writes one final file |
| Research and assembly run in parallel | Research completes, THEN assembly runs |
| Multiple files created during workflow | Single file: `docs/solutions/[category]/[filename].md` |
| Generic description ("fixed the bug") | Specific description ("replaced numerical Hessian with analytic Hessian to fix ill-conditioning in BLP inner loop") |
| Solution without code | Always include runnable code showing the fix |
| Skipping "what didn't work" | Failed approaches are the most valuable for future reference |

## Success Output

Documentation complete

Subagent Results: Context Analyzer: Identified numerical_issue in BLP inner loop Solution Extractor: Analytic Hessian fix with code example Related Docs Finder: 1 related entry (estimation-issues/blp-starting-values.md) Prevention Strategist: 4-item diagnostic checklist Category Classifier: numerical-issues/

Specialized Review: numerical-auditor: Verified solution correctness, confirmed condition number improvement

Knowledge Capture: Solution schema: Indexed and cross-referenced

File created: docs/solutions/numerical-issues/blp-inner-loop-hessian-conditioning.md

This solution will be searchable for future reference when similar numerical stability issues occur in BLP estimation.

Next steps:

Continue workflow — run /workflows:review for remaining changes
Link related documentation if other solutions should reference this
Run /workflows:work if additional implementation is needed


## The Compounding Philosophy

This creates a compounding knowledge system for research methodology:

1. First time you solve "ill-conditioned Hessian in BLP" → Research and debug (2 hours)
2. Document the solution → `docs/solutions/numerical-issues/blp-hessian.md` (5 min)
3. Next time similar issue occurs → Quick lookup via `docs/solutions/` search (2 min)
4. Knowledge compounds → Research team gets faster at diagnosing and fixing estimation problems

The feedback loop:

Specify Model → Estimate → Diagnose Issue → Research → Fix → Document → Validate ↑ ↓ └───────────────────────────────────────────────────────────────────────┘


**Each unit of methodological problem-solving should make subsequent units easier — not harder.**

## Auto-Invoke

This command is most valuable immediately after solving a non-trivial problem. Trigger phrases:

- "that converges now"
- "fixed the estimation"
- "identification argument is complete"
- "pipeline runs end-to-end"
- "replication package works"

Or invoke directly: `/workflows:compound [brief context]`

### Phase 5: Handoff

**Pipeline mode** (when invoked from /lfg or /slfg):
- This is the final step in the chain. Report completion and stop.
- Output: "Workflow complete. Solution documented at [path]."

**Standalone mode** (when invoked directly by the user):
- After the documentation is written, present options:
  1. **Start next cycle** — Run `/workflows:brainstorm` to begin a new research cycle
  2. **Review the documentation** — Read back the docs/solutions/ file just created
  3. **End session** — Stop here; the solution is documented

This closes the loop: compound → brainstorm → plan → work → review → compound.

## Applicable Specialized Agents

Based on problem category, these agents enhance and verify documentation:

### Estimation & Methodology
- **econometric-reviewer**: Reviews estimation-issue solutions for theoretical correctness
- **methods-explorer**: Enriches with alternative approaches and literature references
- **identification-critic**: Verifies identification-related solutions are complete

### Numerical & Data
- **numerical-auditor**: Validates numerical stability claims and solutions
- **data-detective**: Reviews data-issue solutions for completeness

### Rigor & Reproducibility
- **mathematical-prover**: Checks derivation-issue solutions for proof correctness
- **reproducibility-auditor**: Validates replication-issue solutions

### Configuration
Customize which review agents run by editing `compound-science.local.md`.