Claude-skill-registry-data mechinterp-next-step-planner
Propose next mechanistic interpretability experiments based on research state, hypotheses, and existing evidence
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry-data
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/mechinterp-next-step-planner" ~/.claude/skills/majiayu000-claude-skill-registry-data-mechinterp-next-step-planner && rm -rf "$T"
manifest:
data/mechinterp-next-step-planner/SKILL.mdsource content
MechInterp Next-Step Planner
Analyze current research state and propose the next best experiments to run. Writes ExperimentSpec JSON files directly to the specs directory.
Purpose
The planner skill:
- Analyzes current hypotheses and their evidence status
- Identifies gaps in understanding
- Proposes 2-3 experiments that would discriminate between hypotheses
- Writes ExperimentSpec JSON files ready for the runner
When to Use
Use this skill when you need to:
- Decide what experiment to run next
- Find experiments that test uncertain hypotheses
- Generate specs after updating research state
- Plan a sequence of experiments
Decision Heuristics
The planner considers:
- Hypothesis uncertainty: Prioritize experiments for hypotheses with confidence near 0.5
- Evidence gaps: Look for hypotheses with few evidence items
- Validation needs: Suggest validation for high-confidence hypotheses (>0.7)
- ReLU floor avoidance: Skip weapon experiments if activation is low
- Constraint compliance: All specs enforce one-rung-per-family
- Suppressor detection: Suggest token_influence_sweep when single-family features may have hidden suppressors
Planning Workflow
1. Load Current State
from splatnlp.mechinterp.state import ResearchStateManager manager = ResearchStateManager(feature_id=18712, model_type="ultra") print(manager.get_summary())
2. Analyze Evidence Gaps
Check which hypotheses need more testing:
for h in manager.state.hypotheses: n_support = len(h.supporting_evidence) n_refute = len(h.refuting_evidence) print(f"{h.id}: conf={h.confidence:.0%}, support={n_support}, refute={n_refute}") if h.status == "proposed" and n_support == 0: print(f" -> NEEDS TESTING") elif h.confidence > 0.7 and h.status == "supported": print(f" -> NEEDS VALIDATION")
3. Generate Experiment Specs
Based on the analysis, create appropriate specs:
from splatnlp.mechinterp.schemas import ExperimentSpec, ExperimentType from splatnlp.mechinterp.state.io import get_spec_path, SPECS_DIR import json from datetime import datetime # Example: Family sweep for untested hypothesis about SCU spec = ExperimentSpec( id=datetime.now().strftime("%Y%m%d_%H%M%S"), type=ExperimentType.FAMILY_1D_SWEEP, feature_id=18712, model_type="ultra", variables={ "family": "special_charge_up", "rungs": [3, 12, 29, 41, 57], "include_absent": True }, dataset_slice={ "percentile_min": 10.0, "percentile_max": 90.0, "sample_size": 500 }, description="Test SCU response to identify threshold rung", parent_hypothesis="h001", rationale="Hypothesis h001 claims SCU >= 41 AP triggers feature. " "This sweep will identify the exact threshold." ) # Write spec to file spec_path = SPECS_DIR / spec.to_filename() spec_path.parent.mkdir(parents=True, exist_ok=True) spec_path.write_text(spec.model_dump_json(indent=2)) print(f"Spec written to: {spec_path}")
4. Common Experiment Templates
For Family-Specific Hypotheses
# "Feature responds to family X at high AP" ExperimentSpec( type=ExperimentType.FAMILY_1D_SWEEP, feature_id=feature_id, model_type="ultra", variables={"family": "swim_speed_up"}, )
For Multi-Family Interactions
# "SCU and ISS have synergistic effect" ExperimentSpec( type=ExperimentType.FAMILY_2D_HEATMAP, feature_id=feature_id, model_type="ultra", variables={ "family_x": "special_charge_up", "family_y": "ink_saver_sub" }, )
For Pattern Discovery
# "Feature detects specific ability combinations" ExperimentSpec( type=ExperimentType.FREQUENT_ITEMSETS, feature_id=feature_id, model_type="ultra", variables={ "min_support": 0.05, "max_size": 4, "high_activation_pct": 10.0 }, )
For Validation
# Validate stable high-confidence hypothesis ExperimentSpec( type=ExperimentType.SPLIT_HALF, feature_id=feature_id, model_type="ultra", variables={"n_splits": 10}, parent_hypothesis="h001", )
For Token Influence Analysis (Enhancers + Suppressors)
# Identify both enhancing and suppressing tokens # CRITICAL: Use this when you need to understand what a feature AVOIDS ExperimentSpec( type=ExperimentType.TOKEN_INFLUENCE_SWEEP, feature_id=feature_id, model_type="ultra", variables={ "min_samples": 50, "high_percentile": 0.995, # Top 0.5% "collapse_families": True, "suppressor_threshold": 0.8, # < 0.8 = suppressed "enhancer_threshold": 1.2, # > 1.2 = enhanced }, description="Identify enhancers and suppressors for complete interpretation", )
Planner Decision Tree
┌─────────────────────────────────────────┐ │ Load Research State │ └────────────────┬────────────────────────┘ │ ┌────────────▼────────────┐ │ Any hypotheses? │ └────────────┬────────────┘ │ No ◄────┴────► Yes │ │ ▼ ▼ ┌─────────────────┐ ┌─────────────────────────┐ │ Exploration │ │ Any untested? │ │ - PageRank │ └────────────┬────────────┘ │ - Itemsets │ │ │ - Family sweeps │ Yes ◄───┴───► No └─────────────────┘ │ │ ▼ ▼ ┌─────────────┐ ┌─────────────────┐ │ Test most │ │ Any uncertain? │ │ uncertain │ │ (conf ~0.5) │ └─────────────┘ └────────┬────────┘ │ Yes ◄───┴───► No │ │ ▼ ▼ ┌─────────────┐ ┌─────────────────┐ │ Discriminate│ │ Any high-conf │ │ experiments │ │ need validation?│ └─────────────┘ └────────┬────────┘ │ Yes ◄───┴───► No │ │ ▼ ▼ ┌─────────────┐ ┌───────────┐ │ Validation │ │ Research │ │ experiments │ │ complete! │ └─────────────┘ └───────────┘
Guardrails
The planner enforces these rules:
- One-rung-per-family: All specs include this constraint
- ReLU floor check: If state has pitfall "relu_floor_at_low_activation", avoid weapon sweeps
- Evidence diversification: Prefer different experiment types than recently run
- Hypothesis coverage: Ensure all active hypotheses have at least one pending experiment
- Suppressor check: For single-family features, suggest token_influence_sweep to detect what the feature AVOIDS (critical for complete interpretation)
- Conditional sweep requirement: After confirming primary driver with 1D sweep, ALWAYS suggest 2D heatmaps for secondary abilities (see below)
⚠️ CRITICAL: Conditional 2D Sweep Logic
After a 1D sweep confirms a primary driver, 1D sweeps for other abilities are MISLEADING.
Why 1D Sweeps Fail for Secondary Abilities
When feature has strong primary driver (e.g., SCU):
- Most contexts have LOW primary → activation ≈ 0
- Secondary ability can't suppress zero
- 1D sweep averages across ALL contexts → delta ≈ 0
- False conclusion: "secondary has no effect"
Planner Logic
def suggest_next_experiments(state): # Check if primary driver is confirmed primary_confirmed = any( h.status == "supported" and h.confidence > 0.7 for h in state.hypotheses if "primary" in h.tags or "monotonic" in h.tags ) if primary_confirmed: primary_family = get_confirmed_primary(state) correlated_abilities = get_overview_top_tokens(state, top_k=10) # For EACH correlated ability, suggest 2D heatmap for ability in correlated_abilities: if ability != primary_family: suggest_2d_heatmap( family_x=primary_family, family_y=ability, rationale=f"Test conditional effect of {ability} at each {primary_family} level" ) # ALWAYS suggest death-mitigation battery for death_ability in ["quick_respawn", "special_saver", "comeback"]: if not already_tested(state, primary_family, death_ability): suggest_2d_heatmap( family_x=primary_family, family_y=death_ability, rationale="Death-aversion test" )
Template: Post-Primary Conditional Sweeps
After confirming primary driver
{PRIMARY}, automatically suggest:
// Batch 1: Death-mitigation (REQUIRED) [ {"type": "family_2d_heatmap", "variables": {"family_x": "{PRIMARY}", "family_y": "quick_respawn"}}, {"type": "family_2d_heatmap", "variables": {"family_x": "{PRIMARY}", "family_y": "special_saver"}}, {"type": "family_2d_heatmap", "variables": {"family_x": "{PRIMARY}", "family_y": "comeback"}} ] // Batch 2: Top correlated from overview (if not already primary) [ {"type": "family_2d_heatmap", "variables": {"family_x": "{PRIMARY}", "family_y": "{SECONDARY_1}"}}, {"type": "family_2d_heatmap", "variables": {"family_x": "{PRIMARY}", "family_y": "{SECONDARY_2}"}} ] // Batch 3: Weapon interaction (optional) [ {"type": "weapon_sweep", "variables": {"condition_family": "{PRIMARY}", "condition_rung": 29}} ]
Interpreting 2D Results
| 2D Result | Meaning | Label Update |
|---|---|---|
| Y suppresses at high X | Anti-synergy | Add "{Y}-averse" |
| Y enhances at high X | Synergy | Add "{Y}-synergistic" |
| Y flat across X levels | Spurious correlation | Remove from label |
| No data at high X + high Y | Mutual exclusion | Note in label |
Output Location
Specs are written to:
/mnt/e/mechinterp_runs/specs/{timestamp}__f{feature_id}__{type}.json
Example filenames:
20250607_142531__f18712__family-1d-sweep.json20250607_143022__f18712__frequent-itemsets.json20250607_143500__f18712__split-half.json
See Also
- mechinterp-state: Manage research state
- mechinterp-runner: Execute generated specs
- mechinterp-glossary-and-constraints: Domain rules
- mechinterp-summarizer: Process results
- mechinterp-ability-semantics: Ability semantic groupings for interpretation
- mechinterp-investigator: Full investigation workflow