Claude-skill-registry-data mechinterp-investigator

Orchestrate a systematic research program to investigate and meaningfully label SAE features

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry-data
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/mechinterp-investigator" ~/.claude/skills/majiayu000-claude-skill-registry-data-mechinterp-investigator && rm -rf "$T"
manifest: data/mechinterp-investigator/SKILL.md

MechInterp Investigator

This skill guides a systematic investigation of SAE features to arrive at meaningful, non-trivial labels. It orchestrates the other mechinterp skills into a coherent research workflow.

Phase 0: Triage (ALWAYS START HERE)

Goal: Quickly filter out weak/auxiliary features that don't warrant deep investigation.

Time: 1-2 minutes

Many SAE features have minimal influence on model outputs. Triage identifies these early so you can skip expensive analysis.

Step 0.1: Check Decoder Weight Percentile

import torch

sae_path = '/mnt/e/dev_spillover/SplatNLP/sae_runs/run_20250704_191557/sae_model_final.pth'
sae_checkpoint = torch.load(sae_path, map_location='cpu', weights_only=True)
decoder_weight = sae_checkpoint['decoder.weight']  # [512, 24576]

# Get this feature's max absolute decoder weight
feature_decoder = decoder_weight[:, FEATURE_ID]
max_abs = torch.abs(feature_decoder).max().item()

# Compare to all features
all_max_abs = torch.abs(decoder_weight).max(dim=0).values
percentile = (all_max_abs < max_abs).float().mean().item() * 100  # plain float in [0, 100]

print(f"Feature {FEATURE_ID} decoder weight percentile: {percentile:.1f}%")

| Percentile | Action |
|---|---|
| < 10% | Likely weak - check overview structure |
| 10-25% | Borderline - overview decides |
| > 25% | Proceed to Phase 1 (Overview) |

Step 0.2: Quick Overview Check (if <10%)

If decoder percentile < 10%, run a quick overview:

poetry run python -m splatnlp.mechinterp.cli.overview_cli \
    --feature-id {FEATURE_ID} --model ultra --top-k 10

Signs of clear structure (proceed to Phase 1):

  • One family dominates (>40% of breakdown)
  • Strong weapon concentration (>50% one weapon)
  • Clear binary ability pattern
  • Top PageRank token has score > 0.20

Signs of no structure (label as weak):

  • Family breakdown is flat (all <15%)
  • Weapons are diverse
  • Top PageRank score < 0.10
  • High sparsity (>99%) with no clear pattern

Triage Decision

Decoder percentile < 10% AND no clear structure in overview?
  │
  Yes → Label as "Weak/Aux Feature {ID}" and STOP
  │
  No → Proceed to Phase 1 (Overview)
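
The decision rule above can be sketched as a small function. This is only an illustration: the function name and return strings are made up here, and `has_clear_structure` is assumed to be a judgment you make by hand from the overview output.

```python
def triage_decision(decoder_percentile: float, has_clear_structure: bool) -> str:
    """Return 'weak' to stop with a Weak/Aux label, or 'phase_1' to continue.

    Hypothetical helper: mirrors the triage tree, nothing more.
    """
    if decoder_percentile < 10.0 and not has_clear_structure:
        return "weak"
    return "phase_1"


print(triage_decision(4.2, has_clear_structure=False))   # low weight, flat overview
print(triage_decision(4.2, has_clear_structure=True))    # low weight, but structured
print(triage_decision(37.0, has_clear_structure=False))  # weight alone is enough
```

Only the combination of a low decoder percentile and a flat overview stops the investigation; either condition alone still sends the feature to Phase 1.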

Weak Feature Label Format

{
  "dashboard_name": "Weak/Aux Feature {ID}",
  "dashboard_category": "auxiliary",
  "dashboard_notes": "TRIAGE: Decoder weight {X}th percentile, no clear structure in overview. Skipped deep dive.",
  "hypothesis_confidence": 0.0,
  "source": "claude code (triage)"
}
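
A minimal sketch of filling in that template programmatically. The helper name and the feature ID 1234 are illustrative assumptions; the field names and fixed values come from the format above.

```python
import json


def weak_feature_label(feature_id: int, decoder_percentile: float) -> dict:
    """Build the triage label dict for a weak/auxiliary feature (hypothetical helper)."""
    return {
        "dashboard_name": f"Weak/Aux Feature {feature_id}",
        "dashboard_category": "auxiliary",
        "dashboard_notes": (
            f"TRIAGE: Decoder weight {decoder_percentile:.1f}th percentile, "
            "no clear structure in overview. Skipped deep dive."
        ),
        "hypothesis_confidence": 0.0,
        "source": "claude code (triage)",
    }


label = weak_feature_label(1234, 4.2)  # 1234 is a placeholder feature ID
print(json.dumps(label, indent=2))
```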

When to Override Triage

Even with low decoder weights, proceed if:

  • The feature is part of a cluster you're investigating
  • You have external reason to believe it's important
  • You're doing exhaustive analysis of a subset

⚠️ Deep Dive Basics

A proper deep dive requires experiments, not just reading overview data. The overview shows correlations; experiments reveal causation.

Minimum Requirements for a Deep Dive

| Step | What to Do | Why |
|---|---|---|
| 1. Overview | Run overview to see correlations | Generate hypotheses |
| 2. 1D Sweeps | Test top 3-5 families with 1D sweeps | Find causal drivers (scaling abilities) |
| 3. Binary Check | For binary abilities (Comeback, Stealth Jump, LDE, Haunt, etc.), check presence rate | Binary abilities show delta=0 in sweeps but may still be characteristic |
| 4. Bottom Tokens | Check suppressors from overview | What the feature AVOIDS is often more informative |
| 5. 2D Heatmaps | Test interactions between primary driver and correlated tokens | Verify if correlations are causal or spurious |
| 6. Kit Analysis | Check if core weapons share sub/special/class pattern | Can explain "why" behind build philosophy - determine if causal or spurious |

Binary Abilities Need Special Handling

Binary abilities (you have them or you don't) show delta=0 in 1D sweeps because there's no scaling. This does NOT mean they're unimportant.

Binary Abilities
Comeback, Stealth Jump, Last-Ditch Effort, Haunt, Ninja Squid, Respawn Punisher, Object Shredder, Drop Roller, Opening Gambit, Tenacity

To evaluate binary abilities:

  1. Check PageRank score (correlation strength)
  2. Check presence rate: What % of high-activation examples contain it?
  3. Compare mean activation WITH vs WITHOUT the binary token
  4. Run 2D heatmap: scaling_ability × binary_ability to see conditional effect

Binary Ability Analysis Protocol (CRITICAL)

Binary abilities can have strong conditional effects that ONLY show up in 2D analysis. Here's the exact methodology:

Step 1: Check presence rate enrichment

from splatnlp.mechinterp.skill_helpers import load_context
import polars as pl

ctx = load_context('ultra')
df = ctx.db.get_all_feature_activations_for_pagerank(FEATURE_ID)

# Find binary token ID
binary_id = None
for tok_id, tok_name in ctx.inv_vocab.items():
    if tok_name == 'comeback':  # or stealth_jump, etc.
        binary_id = tok_id
        break

# Calculate enrichment
threshold = df['activation'].quantile(0.90)  # Top 10%
high_df = df.filter(pl.col('activation') >= threshold)

with_binary_all = df.filter(pl.col('ability_input_tokens').list.contains(binary_id))
with_binary_high = high_df.filter(pl.col('ability_input_tokens').list.contains(binary_id))

baseline_rate = len(with_binary_all) / len(df)
high_rate = len(with_binary_high) / len(high_df)
enrichment = high_rate / baseline_rate

print(f"Baseline presence: {baseline_rate:.1%}")
print(f"High-activation presence: {high_rate:.1%}")
print(f"Enrichment ratio: {enrichment:.2f}x")
# Enrichment > 1.5x suggests binary ability is characteristic

Step 2: Check mean activation WITH vs WITHOUT

with_binary = df.filter(pl.col('ability_input_tokens').list.contains(binary_id))
without_binary = df.filter(~pl.col('ability_input_tokens').list.contains(binary_id))

mean_with = with_binary['activation'].mean()
mean_without = without_binary['activation'].mean()
delta = mean_with - mean_without

print(f"Mean WITH: {mean_with:.4f}")
print(f"Mean WITHOUT: {mean_without:.4f}")
print(f"Delta: {delta:+.4f}")
# Delta > 0.03 suggests meaningful effect

Step 3: Run 2D heatmap (MOST IMPORTANT)

Binary abilities can have conditional effects that vary by the scaling ability level:

# Manual 2D analysis for binary abilities
# (The built-in 2D heatmap may not handle binary tokens correctly)

scaling_ids = {3: 48, 6: 49, 12: 50, 21: 53, 29: 80}  # ISM example
binary_id = 27  # Comeback

print("Scaling | No Binary | With Binary | Delta")
print("-" * 50)

for level, tok_id in scaling_ids.items():
    level_df = df.filter(pl.col('ability_input_tokens').list.contains(tok_id))

    with_binary = level_df.filter(pl.col('ability_input_tokens').list.contains(binary_id))
    without_binary = level_df.filter(~pl.col('ability_input_tokens').list.contains(binary_id))

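    # Use NaN (not 0) for empty cells so missing data is not mistaken for suppression
    mean_with = with_binary['activation'].mean() if len(with_binary) > 0 else float('nan')
    mean_without = without_binary['activation'].mean() if len(without_binary) > 0 else float('nan')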
    delta = mean_with - mean_without

    print(f"{level:>7} | {mean_without:>9.4f} | {mean_with:>11.4f} | {delta:>+.4f}")

Example (Feature 13352):

ISM × Comeback 2D Analysis:
ISM | No CB  | With CB | Delta
  0 | 0.066  | 0.117   | +0.051
  3 | 0.122  | 0.261   | +0.139
  6 | 0.147  | 0.352   | +0.205  ← PEAK INTERACTION
 12 | 0.094  | 0.163   | +0.069
 21 | 0.094  | 0.129   | +0.035

Interpretation: Comeback has STRONG conditional effect at ISM 3-6.
The +0.205 delta at ISM_6 means Comeback more than DOUBLES the activation (0.147 → 0.352)!
1D sweep showed delta=0 because most examples have ISM=0 (low baseline).

Step 4: Test combinations of binary abilities together

# Test multiple binary abilities together
binary_id_1 = 27  # e.g., comeback
binary_id_2 = 1   # e.g., stealth_jump

both = df.filter(
    pl.col('ability_input_tokens').list.contains(binary_id_1) &
    pl.col('ability_input_tokens').list.contains(binary_id_2)
)
neither = df.filter(
    ~pl.col('ability_input_tokens').list.contains(binary_id_1) &
    ~pl.col('ability_input_tokens').list.contains(binary_id_2)
)

# Then do 2D analysis at each scaling level
# Combinations can have stronger effects than individual abilities!

Key Insight: Binary abilities may have stronger effects when combined. Always test combinations, not just individual tokens.

Additional Learnings

  1. Conditional effects can be much stronger than marginal effects: A feature might show ISM with only 0.069 max_delta in 1D sweeps, but a binary ability combination at moderate ISM could produce +0.335 delta - the interaction effect can be 5x stronger than the marginal effect. 1D sweeps can dramatically underestimate a feature's true behavior.

  2. Depletion is informative: If a binary ability shows enrichment < 1.0 (e.g., 0.72x), the feature actively avoids that ability. This is meaningful for interpretation - it tells you what the feature excludes, not just what it includes.

  3. Manual 2D analysis required for binary tokens: The Family2DHeatmapRunner uses parse_token(), which expects family_name_AP format, but binary abilities appear as just the token name (e.g., comeback, not comeback_10). Use manual 2D analysis code for binary abilities (see protocol above).

  4. "Weak feature" needs decoder weight check: A feature with weak activation effects (max_delta < 0.03) might still have high influence on outputs. Remember: net influence = activation strength × decoder weight. Before labeling as "weak", check the feature's decoder weights to the output tokens it contributes to. A "weak activation" feature with high decoder weights may actually be important.

  5. Watch for error-correction features: If 1D sweeps show small deltas or effects only in unusual rung combinations, the feature may fire when prerequisites are MISSING (OOD detection). Test "explains-away" behavior by comparing activation when low-level evidence is present vs missing. Example: Does feature fire MORE when SCU_3 is absent from a high-SCU build?

  6. Beware of flanderization in top activations: The top 100 activations over-emphasize extreme cases. The TRUE concept often lives in the mid-activation core region (25-75% of effective max, as defined in Phase 1.5). Always compare mid vs top activation regions - if they show different weapon/ability patterns, label the mid-range concept and note the extremes as "super-stimuli".
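
The "explains-away" test from item 5 can be sketched as follows. This is a toy illustration on synthetic data, not the project's API: the function name, the (tokens, activation) input shape, and the token strings SCU_29 / SCU_3 are assumptions. The minimum sample size of 20 follows the sparsity rule stated later in this document.

```python
from statistics import mean


def explains_away_check(examples, context_token, prerequisite_token, min_n=20):
    """Within builds containing context_token, compare activation when the
    prerequisite token is present vs. missing (hypothetical helper).

    examples: iterable of (tokens, activation), where tokens is a set of str.
    """
    present, missing = [], []
    for tokens, activation in examples:
        if context_token not in tokens:
            continue  # only look at builds that contain the high-level evidence
        if prerequisite_token in tokens:
            present.append(activation)
        else:
            missing.append(activation)

    result = {"n_present": len(present), "n_missing": len(missing)}
    if min(len(present), len(missing)) < min_n:
        result["verdict"] = "insufficient data"
        return result

    delta = mean(missing) - mean(present)
    result["delta_missing_minus_present"] = delta
    result["verdict"] = (
        "fires more when prerequisite is missing" if delta > 0
        else "no explains-away signal"
    )
    return result


# Synthetic toy data (not real activations); token names are illustrative.
examples = (
    [({"SCU_29", "SCU_3"}, 0.10)] * 25
    + [({"SCU_29"}, 0.30)] * 25
    + [({"SCU_3"}, 0.05)] * 10
)
report = explains_away_check(examples, context_token="SCU_29", prerequisite_token="SCU_3")
print(report)
```

A positive delta with adequate sample sizes on both sides is the pattern to look for; report both counts alongside the delta.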

What Counts as Evidence

| Evidence Type | Strength | Example |
|---|---|---|
| 1D sweep max_delta > 0.05 | Strong causal | "ISM drives this feature" |
| 1D sweep max_delta 0.02-0.05 | Weak causal | "ISM has minor effect" |
| 1D sweep max_delta < 0.02 | Negligible | "ISM doesn't drive this" |
| Binary delta = 0 | Inconclusive | Need presence rate check |
| High PageRank + low delta | Spurious correlation | Token co-occurs but doesn't cause |
| 2D heatmap shows conditional effect | Interaction confirmed | "X matters only when Y is high" |
| Bottom tokens (suppressors) | Avoidance pattern | "Feature avoids death-perks" |
| Higher activation when prerequisite MISSING | Error-correction | "Fires on OOD rung combos" |
| Mid-range (25-75%) differs from top | Flanderization | "Top is super-stimuli; label mid-range" |
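
The 1D-sweep thresholds in this table can be written as a small classifier. A sketch only: the function name is hypothetical, and treating max_delta by its absolute value is an assumption made here.

```python
def classify_sweep_delta(max_delta: float, is_binary: bool = False) -> str:
    """Map a 1D sweep max_delta to the evidence strengths listed above."""
    magnitude = abs(max_delta)  # assumption: direction does not affect strength
    if is_binary and magnitude == 0:
        return "inconclusive (check presence rate)"
    if magnitude > 0.05:
        return "strong causal"
    if magnitude >= 0.02:
        return "weak causal"
    return "negligible"


for delta in (0.069, 0.03, 0.01):
    print(f"max_delta={delta:.3f}: {classify_sweep_delta(delta)}")
print("binary, delta=0:", classify_sweep_delta(0.0, is_binary=True))
```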

Common Mistakes to Avoid

  1. Presenting overview as findings - Overview is hypotheses, not conclusions
  2. Ignoring binary abilities - Delta=0 doesn't mean unimportant
  3. Skipping bottom tokens - Suppressors reveal what feature avoids
  4. Only running 1D sweeps - 2D heatmaps needed for interaction effects
  5. Not checking weapon patterns - Feature may be weapon-specific, not ability-specific
  6. Using only top activations - Top activations (90%+ of max) may be "flanderized" extremes; check core region (25-75% of max)
  7. Missing error-correction features - Small deltas in weird rung combos may indicate OOD detection
  8. Confusing data sparsity with suppression - Zero examples at a condition ≠ "suppression to 0" (see below)
  9. Shallow validation - Just checking if numbers "look right" without running enrichment analysis
  10. Semantic contradictions in labels - e.g., "Zombie" (embraces death) + "high SSU" (avoids death) is contradictory
  11. Reporting weapon percentages from top-100 - Use top 20-30% instead; top-100 can be 5-10x off (e.g., 78% vs 10%)
  12. Not checking meta archetypes - Weapons may cluster by playstyle, not kit; use splatoon3-meta skill
  13. Assuming kit-based patterns - Check if weapons share sub/special BEFORE assuming it's kit-related
  14. Ignoring flanderization crossover - Note where a "super-stimulus" weapon overtakes the general pattern (usually 90%+ of max activation)

⚠️ CRITICAL: Data Sparsity vs Suppression

This is a common and dangerous mistake. When you see "activation = 0" or "no effect" at some condition, ask: Is this suppression or data sparsity?

Example of the mistake (Feature 1819):

Original claim: "QR is HARD SUPPRESSOR - SSU_57+QR_any=0.000"
Reality: There were ZERO examples with SSU_57 + any QR in the dataset!
         The "0.000" was missing data, not suppression.

How to detect data sparsity:

# ALWAYS check sample sizes when claiming suppression!
at_high_ssu = df.filter(pl.col('ability_input_tokens').list.contains(ssu_57_id))
with_qr = at_high_ssu.filter(pl.col('ability_input_tokens').list.set_intersection(qr_ids).list.len() > 0)

print(f"Examples at SSU_57 with QR: {len(with_qr)}")  # If 0, this is SPARSITY not suppression!

Rule: Never claim "suppression" unless you have ≥20 examples in the suppressed condition. Report sample sizes with all claims.
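
One way to enforce this rule is to route every condition through a guard that returns the sample size with the mean. The sketch below is an assumption-laden illustration (hypothetical function name, plain lists of activations as input), not part of the splatnlp tooling.

```python
from statistics import mean


def assess_condition(activations, min_n=20):
    """Summarise activations at one condition, refusing to call an empty or
    tiny cell 'suppression' (hypothetical helper)."""
    n = len(activations)
    if n == 0:
        return {"n": 0, "mean": None, "verdict": "no data: sparsity, not suppression"}
    if n < min_n:
        return {"n": n, "mean": mean(activations),
                "verdict": "too few examples to claim suppression"}
    return {"n": n, "mean": mean(activations), "verdict": "enough examples to evaluate"}


print(assess_condition([]))            # the Feature 1819 situation: zero examples
print(assess_condition([0.0] * 5))     # zeros, but only 5 examples
print(assess_condition([0.0] * 40))    # zeros with enough support
```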

Philosophy

A meaningful label should capture:

  • What concept the feature encodes (not just "detects token X")
  • Why the model might have learned this representation
  • How it relates to strategic/tactical gameplay

Avoid trivial labels like:

  • "SCU Detector" (just describes token presence)
  • "High activation feature" (describes statistics, not meaning)

Aim for interpretable labels like:

  • "Aggressive Slayer Build" (strategic concept)
  • "Special Spam Enabler" (functional role)
  • "Backline Support Kit" (playstyle archetype)

Investigation Workflow

Phase 0: Triage

See Phase 0: Triage above. Always start here.

If the feature passes triage (decoder weight percentile ≥10% OR clear structure in the overview), proceed to Phase 1.

Phase 1: Initial Assessment

Run the overview and classify the feature type:

poetry run python -m splatnlp.mechinterp.cli.overview_cli \
    --feature-id {FEATURE_ID} --model {MODEL} --top-k 20

Classify based on family breakdown:

| Pattern | Type | Next Steps |
|---|---|---|
| One family >40% | Single-family | Check for interference, weapon specificity |
| Top 2-3 families ~20% each | Multi-family | Check synergy/redundancy, build archetype |
| Many families <15% each | Distributed | Look for meta-pattern, weapon class |
| Weapons concentrated | Weapon-specific | Weapon sweep, class analysis |
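
A rough sketch of the family-based rows of this classification. The function name and family names are hypothetical, and treating everything between the >40% and <15% cases as "multi-family" is a simplifying assumption; the weapon-specific row needs weapon data and is not covered.

```python
def classify_feature_type(family_pcts: dict) -> str:
    """Classify a feature from its family breakdown (percentages, 0-100)."""
    shares = sorted(family_pcts.values(), reverse=True)
    if not shares:
        return "unknown"
    if shares[0] > 40:
        return "single-family"
    if all(share < 15 for share in shares):
        return "distributed"
    return "multi-family"  # assumption: the remaining middle ground


# Family names below are placeholders, not confirmed vocabulary entries.
print(classify_feature_type({"family_a": 55, "family_b": 10}))
print(classify_feature_type({"family_a": 22, "family_b": 20, "family_c": 19}))
print(classify_feature_type({"family_a": 12, "family_b": 11, "family_c": 9}))
```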

CRITICAL: Always check for non-monotonic effects! Higher AP doesn't always mean higher activation.
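
A quick monotonicity check over sweep results might look like the sketch below. The function name and the (AP, activation) input shape are assumptions; the sample values reuse the "No CB" column from the Feature 13352 example as stand-in input.

```python
def sweep_shape(sweep, tol=1e-9):
    """Describe whether activation rises, falls, or peaks as AP increases.

    sweep: list of (ability_points, mean_activation) pairs.
    """
    points = sorted(sweep)
    diffs = [later[1] - earlier[1] for earlier, later in zip(points, points[1:])]
    if all(d >= -tol for d in diffs):
        return "monotonic increasing"
    if all(d <= tol for d in diffs):
        return "monotonic decreasing"
    peak_ap = max(points, key=lambda p: p[1])[0]
    return f"non-monotonic (peak at AP={peak_ap})"


ism_no_cb = [(0, 0.066), (3, 0.122), (6, 0.147), (12, 0.094), (21, 0.094)]
print(sweep_shape(ism_no_cb))
```

A peak at a moderate AP level, as here, is a reason to report the level at which the feature responds most rather than "more AP means more activation".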

Phase 1.5: Activation Region Analysis (CRITICAL - Anti-Flanderization)

Don't only examine extreme activations! High activations may be "flanderized" - exaggerated, extreme versions of the true concept that over-emphasize niche cases.

Key insight: The TRUE concept often lives in the core region (25-75% of effective max), not the top examples. Top activations (90%+ of effective max) can mislead you into labeling a niche pattern instead of the general concept.

Why "effective max"? Activation distributions are heavy-tailed. Using effective_max = 99.5th percentile of nonzero activations prevents single outliers from making the core region nearly empty.

Run activation region analysis:

from splatnlp.mechinterp.skill_helpers import load_context
import numpy as np
from collections import Counter

ctx = load_context("{MODEL}")
df = ctx.db.get_all_feature_activations_for_pagerank({FEATURE_ID})

acts = df['activation'].to_numpy()
weapons = df['weapon_id'].to_list()

# Use EFFECTIVE MAX (99.5th percentile) to handle heavy-tailed distributions
# This prevents single outliers from making the core region nearly empty
nonzero_acts = acts[acts > 0]
effective_max = np.percentile(nonzero_acts, 99.5)
true_max = acts.max()
print(f"True max: {true_max:.4f}, Effective max (99.5%ile): {effective_max:.4f}")

# Define activation regions as % of EFFECTIVE max
regions = [
    ('Floor (≤1%)', lambda a: a <= 0.01 * effective_max),
    ('Low (1-10%)', lambda a: 0.01 * effective_max < a <= 0.10 * effective_max),
    ('Below Core (10-25%)', lambda a: 0.10 * effective_max < a <= 0.25 * effective_max),
    ('Core (25-75%) - TRUE CONCEPT', lambda a: 0.25 * effective_max < a <= 0.75 * effective_max),
    ('High (75-90%)', lambda a: 0.75 * effective_max < a <= 0.90 * effective_max),
    ('Flanderization Zone (90%+)', lambda a: a > 0.90 * effective_max),
]

for region_name, filter_fn in regions:
    indices = [i for i, a in enumerate(acts) if filter_fn(a)]
    weps = [weapons[i] for i in indices]
    print(f"\n{region_name} (n={len(indices)}):")
    for wep, count in Counter(weps).most_common(5):
        name = ctx.id_to_weapon_display_name(wep)
        print(f"  {name}: {count}")

Key signals to look for:

| Pattern | Interpretation |
|---|---|
| Same weapons in ALL regions | General concept (continuous feature) |
| Different weapons in core vs 90%+ | Super-stimuli detected |
| Diverse weapons in core, concentrated in 90%+ | True concept is in core region |
| Niche weapons only in 90%+ | High activations are "flanderized" extremes |

Example (Feature 9971):

Core (25-75%): Splattershot (115), Wellstring (65), Sploosh (57)...
Flanderization (90%+): Bloblobber (44), Glooga Deco (39), Range Blaster (28)

Interpretation: Core region shows GENERAL offensive investment.
Flanderization zone shows EXTREME SCU on special-dependent weapons (super-stimuli).
Label the general concept, note the super-stimuli pattern.
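
To make the core-vs-tail comparison less subjective, the overlap of the top weapons in each region can be scored. The sketch below uses Jaccard overlap of the top-k weapon sets, which is a choice made here rather than something the skill prescribes; the counts are the Feature 9971 numbers quoted above.

```python
from collections import Counter


def top_k_overlap(core_counts, tail_counts, k=3):
    """Jaccard overlap between the top-k weapons of two regions (0.0 to 1.0)."""
    core_top = {weapon for weapon, _ in Counter(core_counts).most_common(k)}
    tail_top = {weapon for weapon, _ in Counter(tail_counts).most_common(k)}
    union = core_top | tail_top
    return len(core_top & tail_top) / len(union) if union else 1.0


core = {"Splattershot": 115, "Wellstring": 65, "Sploosh": 57}
flanderization_zone = {"Bloblobber": 44, "Glooga Deco": 39, "Range Blaster": 28}

overlap = top_k_overlap(core, flanderization_zone)
print(f"Top-3 overlap between core and 90%+ zone: {overlap:.2f}")
```

An overlap near 0 corresponds to the "different weapons in core vs 90%+" row; an overlap near 1 corresponds to "same weapons in all regions".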

CRITICAL: Always check the Bottom Tokens (Suppressors) section! Tokens that rarely appear in high-activation examples can reveal what the feature avoids:

| Suppressor Pattern | Interpretation |
|---|---|
| Death-mitigation (QR, SS, CB) suppressed | Feature avoids "death-accepting" builds |
| Defensive (IR, SR) suppressed | Feature prefers aggressive/ranged builds |
| Mobility suppressed | Feature prefers stationary/positional play |
| Special abilities suppressed | Feature encodes non-special playstyle |

Example: If SCU is enhanced but quick_respawn, special_saver, and comeback are ALL suppressed, the feature doesn't just detect "SCU" - it detects "death-averse SCU builds" (players who stack SCU but don't plan to die).

Phase 1.6: Weapon Distribution Analysis (CRITICAL - Anti-Flanderization)

NEVER report weapon percentages from top-100 samples. Top-100 is severely flanderized and can give wildly misleading weapon distributions.

Example (Feature 14096 - Real Case):

Top 100:     Dark Tetra 78%, Stamper 20%  ← WRONG, flanderized
Top 10%:     Stamper 35%, Dark Tetra 21%  ← Better but still skewed
Top 30%:     Stamper 23%, Dark Tetra 10%  ← TRUE CONCEPT
Full dataset: Stamper 9%, Dark Tetra 3.5% ← Includes noise/floor

Use top 20-30% for weapon characterization:

import polars as pl
import numpy as np
from collections import Counter
from splatnlp.mechinterp.skill_helpers import load_context

ctx = load_context('ultra')
df = ctx.db.get_all_feature_activations_for_pagerank(FEATURE_ID)

# Get percentile thresholds
acts = df['activation'].to_numpy()
thresholds = {p: np.percentile(acts, p) for p in [0, 50, 70, 80, 90, 95, 99]}

# Analyze by region
regions = [
    ("Bottom 50% (noise)", 0, 50),
    ("50-70% (weak)", 50, 70),
    ("Top 30% (TRUE CONCEPT)", 70, 100),
    ("Top 10%", 90, 100),
    ("Top 1% (flanderized)", 99, 100),
]

print("Region | Top Weapons")
print("-" * 60)

for name, p_low, p_high in regions:
    t_low, t_high = thresholds[p_low], thresholds.get(p_high, float('inf'))
    if p_high == 100:
        region_df = df.filter(pl.col('activation') >= t_low)
    else:
        region_df = df.filter((pl.col('activation') >= t_low) & (pl.col('activation') < t_high))

    if len(region_df) == 0:
        continue

    weapon_counts = region_df.group_by('weapon_id').agg(
        pl.col('activation').count().alias('n')
    ).sort('n', descending=True)

    top3 = []
    for row in weapon_counts.head(3).iter_rows(named=True):
        wname = ctx.id_to_weapon_display_name(row['weapon_id'])
        pct = row['n'] / len(region_df) * 100
        top3.append(f"{wname[:12]}({pct:.0f}%)")

    print(f"{name:<25} | {', '.join(top3)}")

Interpretation Guide:

| Pattern | Meaning |
|---|---|
| Same weapons in top-30% and top-1% | Continuous feature, no flanderization |
| Different weapons in top-30% vs top-1% | Flanderization detected - label top-30% concept |
| One weapon jumps from 10% to 70%+ | That weapon is "super-stimulus" for the feature |
| Weapons consistent 50%→30%→10%→1% | Stable feature, safe to use any region |

Rule: Report weapon percentages from top 20-30%, note if top-1% differs significantly.
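
A small sketch for flagging super-stimulus weapons by comparing their share in the top 30% with their share in the extreme tail. The function name and the 3x ratio cut-off are assumptions; the percentages are the Feature 14096 figures given above.

```python
def find_super_stimuli(core_pct, tail_pct, min_ratio=3.0):
    """Return (weapon, ratio) pairs whose tail share is >= min_ratio times
    their core share, largest ratio first (hypothetical helper)."""
    flagged = []
    for weapon, tail_share in tail_pct.items():
        core_share = core_pct.get(weapon, 0.0)
        ratio = tail_share / core_share if core_share > 0 else float("inf")
        if ratio >= min_ratio:
            flagged.append((weapon, ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


top_30_pct = {"Stamper": 23.0, "Dark Tetra": 10.0}   # top 30% (true concept)
top_100 = {"Dark Tetra": 78.0, "Stamper": 20.0}      # top 100 (flanderized)

flagged = find_super_stimuli(top_30_pct, top_100)
for weapon, ratio in flagged:
    print(f"{weapon}: {ratio:.1f}x over-represented in the tail")
```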

Phase 1.6.5: Ability Flanderization Check (CRITICAL)

The same flanderization that applies to weapons applies to abilities. A binary ability with high tail enrichment but low core coverage is a super-stimulus, not the core concept.

The Rule: If a "dominant" driver has <30% core coverage, it's a tail marker, not the headline concept.

Use the core coverage experiment:

cd /root/dev/SplatNLP

# Direct subcommand (recommended)
poetry run python -m splatnlp.mechinterp.cli.runner_cli coverage \
    --feature-id {FEATURE_ID} --model ultra \
    --tokens respawn_punisher,comeback,stealth_jump \
    --threshold 0.30

Output tables:

  • token_coverage: Shows core_coverage_pct, tail_enrichment, is_tail_marker for each token
  • weapon_coverage: Shows core vs tail weapon distributions (catches weapon flanderization)

Coverage Interpretation:

| Core Coverage | Interpretation | Label Implication |
|---|---|---|
| >50% | Primary driver | Safe to headline |
| 30-50% | Significant but not universal | Mention in notes, not headline |
| <30% | Tail marker / super-stimulus | NOT the headline concept |

Example (Feature 13934):

respawn_punisher: 8.57x tail enrichment, BUT only 12% core coverage
→ RP is a super-stimulus, NOT the core concept
→ Wrong label: "RP Backline Anchor"
→ Right approach: Split core by RP presence to reveal hidden modes
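
The coverage thresholds can be expressed as a simple lookup. This is a sketch with a hypothetical function name; how exactly 30% and exactly 50% are binned is an assumption, since the table does not say.

```python
def classify_core_coverage(core_coverage_pct: float) -> str:
    """Map core coverage (0-100) to the interpretation used for labeling."""
    if core_coverage_pct > 50:
        return "primary driver (safe to headline)"
    if core_coverage_pct >= 30:
        return "significant but not universal (notes, not headline)"
    return "tail marker / super-stimulus (not the headline concept)"


# Feature 13934: respawn_punisher has 8.57x tail enrichment but 12% core coverage.
print(classify_core_coverage(12))
print(classify_core_coverage(42))
print(classify_core_coverage(63))
```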

When you find a super-stimulus (<30% coverage):

  1. Split the core by presence/absence of the super-stimulus
  2. Analyze both modes separately
  3. Look for what they have in COMMON (the true concept)
  4. Label the commonality, note the super-stimulus as a tail marker
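
The four steps above can be sketched on toy data as follows. Everything here is illustrative: the helper names, the 50% presence cut-off, and the token sets are assumptions, and the builds are synthetic rather than taken from any real feature.

```python
from collections import Counter


def token_rates(builds):
    """Fraction of builds containing each token."""
    counts = Counter(token for build in builds for token in set(build))
    return {token: count / len(builds) for token, count in counts.items()}


def common_core_tokens(core_builds, super_stimulus, min_rate=0.5):
    """Tokens that are frequent in BOTH modes of the core region
    (with and without the super-stimulus)."""
    with_ss = [b for b in core_builds if super_stimulus in b]
    without_ss = [b for b in core_builds if super_stimulus not in b]
    if not with_ss or not without_ss:
        return []  # nothing to compare: one mode is empty
    rates_with = token_rates(with_ss)
    rates_without = token_rates(without_ss)
    return sorted(
        token for token, rate in rates_with.items()
        if token != super_stimulus
        and rate >= min_rate
        and rates_without.get(token, 0.0) >= min_rate
    )


# Synthetic core-region builds; token names are placeholders.
core_builds = [
    {"respawn_punisher", "ink_saver_main", "swim_speed_up"},
    {"respawn_punisher", "ink_saver_main", "object_shredder"},
    {"ink_saver_main", "swim_speed_up"},
    {"ink_saver_main", "quick_super_jump"},
    {"ink_saver_main", "swim_speed_up", "ink_recovery_up"},
]
shared = common_core_tokens(core_builds, "respawn_punisher")
print("Shared across both modes:", shared)
```

The shared tokens are candidates for the headline concept; the super-stimulus itself goes in the notes as a tail marker.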

Phase 1.7: Meta-Informed Weapon Analysis (USE AFTER WEAPON SWEEP)

After identifying top weapons, always check if they match a known meta archetype using the splatoon3-meta skill.

Step 1: Look up weapon kits

Check references/weapons.md for each top weapon's sub and special:

# Top weapons from Feature 14096 (top 30%):
kits = {
    "Splatana Stamper": ("Burst Bomb", "Zipcaster"),
    "Dark Tetra Dualies": ("Autobomb", "Reefslider"),
    "Glooga Dualies": ("Splash Wall", "Booyah Bomb"),
    "Dapple Dualies Nouveau": ("Torpedo", "Reefslider"),
    "Splatana Wiper": ("Torpedo", "Ultra Stamp"),
}

# Check for shared subs/specials
from collections import Counter
subs = Counter(k[0] for k in kits.values())
specials = Counter(k[1] for k in kits.values())

# If one sub/special dominates → kit-based feature
# If diverse → playstyle-based feature

Step 2: Check archetype reference

Read references/archetypes.md to see if weapons match a known archetype:

| Archetype | Key Weapons | Signature Abilities |
|---|---|---|
| Zombie Slayer | Tetra Dualies, Splatana Wiper | QR + Comeback + Stealth Jump |
| Stealth Slayer | Carbon Roller, Inkbrush | Ninja Squid + SSU + Stealth Jump |
| Anchor/Backline | E-liter, Hydra Splatling | Respawn Punisher + Object Shredder |
| Support/Beacon | Squid Beakon weapons | Sub Power Up + ISS + Comeback |

Step 3: Classification decision

Kit Analysis Result:
├─ Shared sub weapon? → Feature may encode SUB PLAYSTYLE
├─ Shared special? → Feature may encode SPECIAL FARMING
├─ No kit pattern + archetype match? → PLAYSTYLE FEATURE (label as archetype)
└─ No kit pattern + no archetype? → WEAPON CLASS feature (check if all dualies, all shooters, etc.)

Example (Feature 14096):

Top 30% weapons: Stamper, Dark Tetra, Glooga, Dapple, Wiper
Kit analysis: Diverse subs (Burst, Auto, Splash Wall, Torpedo), diverse specials
Archetype check: Dark Tetra + Splatana Wiper = "Zombie Slayer" archetype!
Conclusion: PLAYSTYLE feature encoding Zombie Slayer (death-accepting aggressive)
Label: "Zombie Slayer QR (Splatana/Dualies)" - tactical category
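
The classification decision can be sketched as a function over the kits dictionary from Step 1. The function name and the "more than half the weapons share it" dominance cut-off are assumptions made for illustration; `archetype_match` is assumed to come from a manual check against references/archetypes.md.

```python
from collections import Counter


def classify_kit_pattern(kits, archetype_match, dominance=0.5):
    """kits: {weapon: (sub, special)}. Returns a coarse feature classification."""
    if not kits:
        raise ValueError("kits must contain at least one weapon")
    n = len(kits)
    sub, sub_n = Counter(s for s, _ in kits.values()).most_common(1)[0]
    special, special_n = Counter(sp for _, sp in kits.values()).most_common(1)[0]
    if sub_n / n > dominance:
        return f"sub playstyle ({sub})"
    if special_n / n > dominance:
        return f"special farming ({special})"
    if archetype_match:
        return "playstyle feature (label as archetype)"
    return "weapon class feature (check shared class)"


# Top-30% weapons for Feature 14096, as listed in Step 1.
kits = {
    "Splatana Stamper": ("Burst Bomb", "Zipcaster"),
    "Dark Tetra Dualies": ("Autobomb", "Reefslider"),
    "Glooga Dualies": ("Splash Wall", "Booyah Bomb"),
    "Dapple Dualies Nouveau": ("Torpedo", "Reefslider"),
    "Splatana Wiper": ("Torpedo", "Ultra Stamp"),
}
decision = classify_kit_pattern(kits, archetype_match=True)
print(decision)
```

With these kits no sub or special covers more than half the weapons (Torpedo and Reefslider each appear on 2 of 5), so the archetype match decides the outcome, in line with the Feature 14096 conclusion.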

When to invoke splatoon3-meta skill:

  • After weapon_sweep shows concentrated weapon pattern
  • When top weapons seem unrelated by kit but share a playstyle
  • To validate that ability patterns match expected meta builds
  • To identify if weapons share archetype despite different kits

Phase 1.7.5: Kit Component Analysis (OPTIONAL but Recommended)

When to use: After weapon sweep, check if the core weapons share patterns in ANY kit component: sub weapon, special weapon, or main weapon class. This can reveal WHY certain build philosophies emerge.

Key insight: Weapons may cluster by:

  • Sub weapon (Burst Bomb users, Beakon users → explains SPU/ISS builds)
  • Special weapon (Aggressive push specials → explains survival builds)
  • Main weapon class (All dualies, all chargers → explains mobility/positioning builds)

The feature may be driven by ONE of these - identify which, then determine if it's causal or spurious.


Component 1: Sub Weapon Pattern Analysis

When relevant: If kit_sweep (Phase 1.7/3d) shows sub concentration, investigate further.

from collections import Counter

# Map top weapons to their subs (from weapons.md)
weapon_subs = {
    "Splattershot Jr.": "Splat Bomb",
    "Neo Splash-o-matic": "Suction Bomb",
    "Sploosh-o-matic 7": "Splat Bomb",
    # ... add more as needed
}

# Categorize subs
sub_categories = {
    # Lethal bombs
    "Splat Bomb": "lethal", "Suction Bomb": "lethal", "Burst Bomb": "lethal",
    "Curling Bomb": "lethal", "Autobomb": "lethal", "Torpedo": "lethal",
    "Fizzy Bomb": "lethal", "Ink Mine": "lethal",
    # Utility/Support (Toxic Mist deals no damage, so it is grouped here)
    "Squid Beakon": "utility", "Splash Wall": "utility", "Sprinkler": "utility",
    "Point Sensor": "utility", "Angle Shooter": "utility", "Toxic Mist": "utility",
}

# Count categories
sub_counts = Counter()
for weapon in top_weapons:
    sub = weapon_subs.get(weapon)
    if sub:
        category = sub_categories.get(sub, "other")
        sub_counts[category] += 1

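print("Sub Weapon Breakdown:")
for sub, count in Counter(weapon_subs.get(w) for w in top_weapons if weapon_subs.get(w)).most_common():
    print(f"  {sub}: {count}")

# Also report the category counts computed above
print("Sub Category Breakdown:")
for category, count in sub_counts.most_common():
    print(f"  {category}: {count}")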

Sub pattern implications:

| Sub Pattern | Build Implication | Example |
|---|---|---|
| Shared Beakons | SPU/ISS focus for sub spam | Beacon Support builds |
| Shared Burst Bomb | Mobility + burst damage | Aggressive flanker builds |
| Shared Splash Wall | Positional/defensive play | Lane control builds |
| Diverse subs | Sub is NOT the clustering factor | Check special or main class |

Component 2: Special Weapon Pattern Analysis

When relevant: After weapon sweep, check if core weapons share a special weapon pattern.

from collections import Counter

# Map top weapons to their specials (from weapons.md)
weapon_specials = {
    "Splatana Stamper": "Zipcaster",
    "Sloshing Machine": "Booyah Bomb",
    "Squeezer": "Trizooka",
    # ... add more as needed
}

# Categorize specials
special_categories = {
    # Zoning/Area Denial
    "Ink Storm": "zoning", "Wave Breaker": "zoning", "Tenta Missiles": "zoning",
    "Killer Wail 5.1": "zoning", "Triple Inkstrike": "zoning",
    # Team Support
    "Tacticooler": "team_support", "Big Bubbler": "team_support",
    "Splattercolor Screen": "team_support",
    # Aggression/Push
    "Trizooka": "aggression", "Crab Tank": "aggression", "Ink Jet": "aggression",
    "Ultra Stamp": "aggression", "Booyah Bomb": "aggression", "Reefslider": "aggression",
    "Kraken Royale": "aggression", "Zipcaster": "aggression",
    # Utility/Defense
    "Ink Vac": "utility", "Super Chump": "utility", "Triple Splashdown": "utility",
}

# Count categories
category_counts = Counter()
for weapon in top_weapons:
    special = weapon_specials.get(weapon)
    if special:
        category = special_categories.get(special, "other")
        category_counts[category] += 1

print("Special Category Breakdown:")
for cat, count in category_counts.most_common():
    print(f"  {cat}: {count/sum(category_counts.values())*100:.0f}%")

Special pattern implications:

| Special Pattern | Build Implication | Example |
|---|---|---|
| >60% aggression | Players build for survival to deploy push specials | Feature 14964 |
| >60% zoning | Players may invest in SCU/SPU for area denial uptime | Ink Storm spam |
| >50% team_support | Team-oriented builds, may see Tenacity/CB | Support kit |
| Diverse specials | Special is NOT the clustering factor | Check sub or main class |

Component 3: Main Weapon Class Pattern Analysis

When relevant: If weapons seem diverse but may share a class (all shooters, all dualies, all chargers).

from collections import Counter

# Weapon class mapping (from weapon-vibes.md)
weapon_classes = {
    "Splattershot": "shooter", "Splattershot Jr.": "shooter", "Splattershot Pro": "shooter",
    "Dark Tetra Dualies": "dualie", "Dapple Dualies": "dualie", "Splat Dualies": "dualie",
    "E-liter 4K": "charger", "Splat Charger": "charger", "Goo Tuber": "charger",
    "Luna Blaster": "blaster", "Range Blaster": "blaster", "Rapid Blaster": "blaster",
    "Hydra Splatling": "splatling", "Mini Splatling": "splatling",
    "Splatana Stamper": "splatana", "Splatana Wiper": "splatana",
    # ... add more as needed
}

# Count classes
class_counts = Counter(weapon_classes.get(w, "other") for w in top_weapons)

print("Weapon Class Breakdown:")
for cls, count in class_counts.most_common():
    pct = count / len(top_weapons) * 100
    print(f"  {cls}: {pct:.0f}%")

Class pattern implications:

| Class Pattern | Build Implication | Example |
|---|---|---|
| >60% dualies | Mobility-focused, dodge-roll builds | SSU + QSJ synergy |
| >60% chargers | Positioning, low death tolerance | Anchor builds |
| >60% blasters | Burst damage, trade-happy | QR + Comeback synergy |
| >60% splatlings | Charge management, lane holding | ISM + positioning |
| Diverse classes | Class is NOT the clustering factor | Check sub or special |

Step 4: Determine if Pattern is CAUSAL or SPURIOUS

This is the critical step. A strong pattern in ANY component could be causal or spurious.

| Pattern Type | Evidence | Implication |
|---|---|---|
| CAUSAL | Kit component explains build philosophy | Include in label rationale |
| SPURIOUS | Weapons share other traits that better explain clustering | Don't emphasize that component |

Questions to determine causality:

  1. Does the kit component align with decoder output?

    • Decoder promotes SCU/SS/SPU + aggressive specials → Special farming is likely causal
    • Decoder promotes ISS/SPU + shared sub weapon → Sub spam is likely causal
    • Decoder promotes SSU/QSJ + all dualies → Weapon class mobility is likely causal
  2. Do weapons share OTHER traits that better explain the clustering?

    • All dualies with aggressive specials → Is it the CLASS or the SPECIAL?
    • Test: Do other dualies (without aggressive specials) also cluster here?
  3. Does the build philosophy make sense for this kit component?

    • Survival builds + aggressive specials → "Stay alive to use push special" (causal)
    • Mobility builds + all dualies → "Dualies need SSU for dodge-roll play" (causal)
    • Survival builds + diverse subs/specials + all chargers → "Chargers can't trade" (class is causal)

Example Analysis (Special-driven):

Feature 14964 special breakdown: 77% aggression (Zipcaster, Booyah Bomb, Trizooka)
Build philosophy: "Balanced utility spread for survival"

Analysis:
- Decoder suppresses death-trading (Comeback, RP) ✓
- Decoder promotes survival abilities (SS, ISM) ✓
- Weapons have LOW-MED death tolerance ✓
- Weapons have aggressive push specials ✓
- Sub weapons are DIVERSE (no pattern)
- Weapon classes are DIVERSE (shooters, slosher, splatana)

Conclusion: CAUSAL - Players build for survival BECAUSE they have aggressive specials
           that require staying alive to deploy effectively.

Note: "Core weapons have aggressive push specials (77%) requiring survival to deploy"

Example Analysis (Class-driven):

Feature shows: 80% dualies (Dark Tetra, Dapple, Dualie Squelchers)
Decoder promotes: SSU, QSJ, RSU (mobility family)

Analysis:
- Specials are DIVERSE (not the driver)
- Subs are DIVERSE (not the driver)
- All weapons are DUALIES with dodge-roll mechanics ✓
- Dualies benefit uniquely from SSU for roll distance/recovery

Conclusion: CAUSAL - Dualies cluster because dodge-roll playstyle needs mobility
           The feature encodes "dualie mobility optimization"

Counter-example (Spurious):

Feature has 70% aggression specials
But: All weapons are CLOSE-range SLAYER with HIGH death tolerance
And: Decoder promotes QR, Comeback (death-trading)

Conclusion: SPURIOUS - Weapons are aggressive slayers who happen to have aggressive specials
           The special type is incidental to the slayer playstyle.
           Primary driver is ROLE (slayer), not KIT.

Step 5: Record findings in notes

If pattern is CAUSAL, add to dashboard_notes:

KIT PATTERN: {component} - {X}% {category/type} ({list top examples}).
INTERPRETATION: [Why this explains the build philosophy]

If pattern is SPURIOUS, note briefly:

KIT PATTERN: Diverse/incidental. Weapons cluster by [range/role/playstyle], not kit.

When to skip this phase:

  • Feature is clearly mechanical (single ability stacker like "SCU_57 threshold")
  • Weapons are highly diverse with no concentration in any component
  • Earlier analysis already identified clear driver (e.g., single weapon dominance)

Phase 1.8: Weapon Range/Role Classification (REQUIRED for Labels)

Before proposing any label, you MUST classify the feature's weapons by range and role. This prevents incorrect role assumptions (e.g., calling Jr./Rapid Blasters "anchors" when they're midrange).

Step 1: Extract properties for top 5-10 core weapons from weapon-vibes.md

| Property | Values | Label Implication |
| --- | --- | --- |
| RANGE | CLOSE, MID, LONG, SNIPER | Determines qualifier |
| LANE | FRONT, MID, BACK, FLEX | Confirms positioning |
| JOB | SLAYER, SUPPORT, ANCHOR, SKIRMISH, ASSASSIN | Determines role word |
| NS_FIT | CORE, GOOD, MEH, BAD, NO | Stealth vs visible |
| DEATH_TOL | HIGH, MED, LOW | Trading vs survival |

Step 2: Find the common pattern

If most weapons share:

  • LONG/SNIPER + BACK + ANCHOR → use "Anchor" or "Backline" qualifier
  • MID/LONG + MID + SKIRMISH/SUPPORT → use "Midrange" qualifier
  • CLOSE/MID + FRONT + SLAYER → use "Slayer" or "Frontline" qualifier
  • NO/BAD NS_FIT + LOW DEATH_TOL → "Visible" or "Positional" concept (not stealth, not trading)
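
The first three rules above can be sketched as a small lookup. This is an illustration only: it assumes "most weapons" means more than half, and the `role_qualifier` helper, the dictionary keys, and the sample weapons are hypothetical rather than taken from weapon-vibes.md.

```python
# Each rule lists the allowed values per property, in the order given above.
RULES = [
    ("Anchor", {"range": {"LONG", "SNIPER"}, "lane": {"BACK"}, "job": {"ANCHOR"}}),
    ("Midrange", {"range": {"MID", "LONG"}, "lane": {"MID"}, "job": {"SKIRMISH", "SUPPORT"}}),
    ("Slayer", {"range": {"CLOSE", "MID"}, "lane": {"FRONT"}, "job": {"SLAYER"}}),
]


def share(weapons, key, allowed):
    """Fraction of weapons whose property value falls in the allowed set."""
    return sum(w[key] in allowed for w in weapons) / len(weapons)


def role_qualifier(weapons, min_share=0.5):
    """Return the first qualifier whose every property is matched by most weapons."""
    if not weapons:
        return None
    for qualifier, rule in RULES:
        if all(share(weapons, key, allowed) > min_share for key, allowed in rule.items()):
            return qualifier
    return None  # no clear pattern: do not force a qualifier


# Hypothetical core weapons
core_weapons = [
    {"range": "MID", "lane": "MID", "job": "SKIRMISH"},
    {"range": "LONG", "lane": "MID", "job": "SUPPORT"},
    {"range": "CLOSE", "lane": "FRONT", "job": "SLAYER"},
]
print(role_qualifier(core_weapons))
```

Returning `None` when no rule matches keeps the label from asserting a role the weapons do not support.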

Step 3: Record in notes

Always include weapon classification in dashboard_notes:

WEAPON ROLE: Midrange (MID-LONG range, SKIRMISH/SUPPORT jobs, NO/BAD NS fit, LOW death tolerance)

Phase 2: Hypothesis Generation

Based on Phase 1, generate hypotheses about what the feature might encode:

For single-family dominated features:

  • H1: Pure token detector (trivial - try to disprove)
  • H2: Threshold detector (activates only at high AP)
  • H3: Interaction detector (family + something else)
  • H4: Weapon-conditional (family matters only for certain weapons)

For multi-family features:

  • H1: Synergy detector (families work together)
  • H2: Build archetype (strategic loadout pattern)
  • H3: Playstyle indicator (aggressive, defensive, support)
  • H4: Shared NEED (different builds solving the same tactical problem)

Build NEED Framework (For Multi-Modal/Diffuse Features)

When a feature activates on seemingly different build types, ask: "What NEED do these builds share?"

Features can encode solutions to problems, not just correlations. Different builds may trigger the same feature because they're different answers to the same question.

Step 1: Identify the tactical constraint these builds solve

| Question | Example |
| --- | --- |
| What gameplay problem do these builds address? | "How to handle death for low-death-tolerance weapons" |
| What enemy behavior are they countering? | "Dealing with aggressive flankers" |
| What win condition are they enabling? | "Special pressure" or "Map control" |

Step 2: Check weapon properties (use splatoon3-meta)

Compare enriched weapons on these axes from weapon-vibes.md:

  • Ink feel: STARVING / HUNGRY / AVERAGE / EFFICIENT / PAINTER
  • Range: MELEE / CLOSE / MID / LONG / SNIPER
  • Ninja Squid affinity: CORE / GOOD / MEH / BAD / NO
  • Death tolerance: HIGH / MED / LOW
  • Role: SLAYER / SUPPORT / ANCHOR / SKIRMISH / ASSASSIN

If all enriched weapons share properties (e.g., all HUNGRY ink + NO ninja squid + LOW death tolerance), the feature may encode a need specific to that weapon class.
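
A minimal sketch of the "shared properties" check, assuming each enriched weapon is represented as a dictionary of the axes above. The `shared_properties` helper, the key names, and the sample values are hypothetical.

```python
def shared_properties(weapons):
    """Return {property: value} for every property on which all weapons agree."""
    if not weapons:
        return {}
    first = weapons[0]
    return {
        key: value
        for key, value in first.items()
        if all(w.get(key) == value for w in weapons)
    }


# Hypothetical enriched weapons
enriched = [
    {"ink": "HUNGRY", "ninja_squid": "NO", "death_tol": "LOW", "range": "MID"},
    {"ink": "HUNGRY", "ninja_squid": "NO", "death_tol": "LOW", "range": "LONG"},
]
print(shared_properties(enriched))
```

Properties that survive the filter are candidates for the shared NEED; properties that drop out (here, range) are probably not what the feature encodes.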

Step 3: Reframe the modes as "answers to the same question"

Example (Feature 13934):

Mode A (12%): RP anchor builds (E-liter) - "I won't die, make their deaths hurt"
Mode B (88%): Zombie utility builds (DS) - "I will die sometimes, optimize respawns"

Shared NEED: "Death management for non-stealth, low-death-tolerance, midrange+ weapons"
Both modes are VALID ANSWERS to the same tactical question.

Step 4: Label the NEED, not the modes

Instead of: "Mixed: Zombie + RP Anchor" (describes the modes)
Label as: "Balanced Utility Axis (Non-Stealth Midline+)" (describes the need)

Key Insight: The model learned that these seemingly different builds share a common requirement. The feature encodes that requirement, and the modes are just different implementations.

For weapon-specific features:

  • H1: Weapon class pattern (all shooters, all chargers, etc.)
  • H2: Meta build (optimal loadout for that weapon)
  • H3: Weapon-ability interaction

Phase 3: Targeted Experiments

Run experiments to test hypotheses. Available experiment types:

| Type | Purpose |
| --- | --- |
| family_1d_sweep | Activation across AP rungs for one family |
| family_2d_heatmap | Interaction between two families |
| within_family_interference | Detect error correction within a family |
| weapon_sweep | Activation by weapon (optionally conditioned on family) |
| weapon_group_analysis | Compare high vs low activation by weapon |
| pairwise_interactions | Synergy/redundancy between tokens |
| token_influence_sweep | Identify enhancers and suppressors across all tokens |

⚠️ CRITICAL: Iterative Conditional Testing Protocol

1D sweeps can be MISLEADING for secondary abilities. When a feature has a strong primary driver:

The Problem

1D sweep for secondary ability (e.g., QR) across ALL contexts might show delta ≈ 0

Why this happens:

  • Most contexts have LOW primary driver (e.g., low SCU) → activation already near zero
  • Secondary ability can't suppress what's already zero
  • The few high-primary contexts get drowned out in the average

Example (Feature 18712):

QR 1D sweep (all contexts): mean_delta = -0.0006 → "QR has no effect" ❌ WRONG!
SCU × QR 2D heatmap:
  - At SCU_15: QR_0=0.13, QR_12=0.04 → QR suppresses 70%! ✅
  - At SCU_29: QR_0=0.15, QR_12=0.04 → QR suppresses 74%! ✅
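
The dilution effect can be reproduced with a toy calculation. The numbers below are hypothetical (95 low-primary contexts with zero activation, 5 high-primary contexts where QR cuts activation from 0.15 to 0.04); they are not taken from Feature 18712's data.

```python
# Each row: (primary_level, activation_without_QR, activation_with_QR)
contexts = [("low", 0.0, 0.0)] * 95 + [("high", 0.15, 0.04)] * 5


def mean_delta(rows):
    """Average change in activation when the secondary ability is added."""
    return sum(with_qr - without_qr for _, without_qr, with_qr in rows) / len(rows)


overall = mean_delta(contexts)
high_only = mean_delta([row for row in contexts if row[0] == "high"])

print(f"1D mean delta (all contexts):     {overall:+.4f}")
print(f"Conditional delta (high primary): {high_only:+.4f}")
```

The all-context average is about twenty times smaller than the conditional delta, which is why a 1D sweep alone can make a real suppressor look like "no effect".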

The Solution: Iterative 2D Testing

Protocol for features with a strong primary driver:

1. Confirm primary driver with 1D sweep
   └─ If monotonic response confirmed → proceed to step 2

2. For EACH correlated ability in overview (top 5-10):
   └─ Run 2D heatmap: PRIMARY × SECONDARY
   └─ Check activation at EACH primary level
   └─ Look for:
      - Suppression: secondary reduces activation at high primary
      - Synergy: secondary boosts activation at high primary
      - Spurious: no conditional effect (correlation was coincidence)

3. Group findings by semantic category:
   └─ Death-mitigation (QR, SS, CB): all suppress? → "death-averse"
   └─ Mobility (SSU, RSU): all enhance? → "mobility-synergistic"
   └─ Efficiency (ISM, ISS): mixed? → test individually

2D Heatmap Interpretation Guide

| Pattern | Interpretation |
| --- | --- |
| Peak at (high_X, 0_Y) | Y is a suppressor |
| Peak at (high_X, high_Y) | Y is a synergy |
| Flat across Y at each X | Y has no conditional effect (spurious) |
| Non-monotonic in X at some Y | Interference pattern |
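
As a rough sketch, the first three rows of the guide can be applied mechanically by comparing the lowest and highest Y cells at the highest X level. The `classify_secondary` helper, the grid layout, and the 10% tolerance are assumptions for illustration, not the runner's output format.

```python
def classify_secondary(grid, tol=0.1):
    """Classify Y's conditional effect at the highest X level.

    grid maps (x_level, y_level) -> mean activation; None marks an empty cell.
    """
    cells = {key: value for key, value in grid.items() if value is not None}
    if not cells:
        return "insufficient data"
    x_hi = max(x for x, _ in cells)
    row = sorted((y, value) for (x, y), value in cells.items() if x == x_hi)
    if len(row) < 2:
        return "insufficient data"
    base, top = row[0][1], row[-1][1]  # lowest-Y and highest-Y activations
    scale = max(abs(base), abs(top))
    if scale == 0:
        return "no conditional effect"
    change = (top - base) / scale
    if change < -tol:
        return "suppressor"
    if change > tol:
        return "synergy"
    return "no conditional effect"


# Values from the SCU x QR example above
scu_qr = {(15, 0): 0.13, (15, 12): 0.04, (29, 0): 0.15, (29, 12): 0.04}
print(classify_secondary(scu_qr))
```

This only looks at one row of the heatmap; non-monotonic patterns in X still need to be checked by eye or with a separate sweep.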

Heatmap Cell Validity Check

Before drawing conclusions from heatmap cells, check the cell metadata:

Each cell in heatmap output includes:

  • n: Number of valid samples in this cell
  • std: Standard deviation of activations
  • stderr: Standard error (std / sqrt(n)) - new field

| n (samples) | Interpretation |
| --- | --- |
| null/0 | Impossible combination (constraint violation) - don't interpret |
| 1-4 | Very weak evidence - note uncertainty in conclusions |
| 5-19 | Moderate evidence - interpret with caution |
| 20+ | Strong evidence - interpret confidently |

High stderr (>0.1) indicates high variance - the mean may not be reliable.

Anti-patterns to avoid:

  • Drawing conclusions from cells with n < 5
  • Claiming "peak at X=57, Y=29" when that cell has n=2
  • Ignoring null cells (they represent impossible ability combinations)

Example interpretation:

Cell (ISM=51, IRU=29): mean=0.35, n=3, stderr=0.08
→ "ISM=51 with IRU=29 shows high activation, but n=3 means this could be noise"

Cell (ISM=51, IRU=0): mean=0.35, n=45, stderr=0.02
→ "ISM=51 without IRU shows reliable high activation (n=45)"
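
The validity rules can be folded into a small helper applied before quoting any cell. This is a sketch under assumptions: `cell_evidence` is a hypothetical name, and n=20 is treated as "strong" because the table's bands meet at that value.

```python
def cell_evidence(n, stderr=None, max_stderr=0.1):
    """Rate how much weight a heatmap cell can carry, based on n and stderr."""
    if not n:  # None or 0: the ability combination cannot occur
        return "impossible combination - do not interpret"
    if n < 5:
        level = "very weak"
    elif n < 20:
        level = "moderate"
    else:
        level = "strong"
    if stderr is not None and stderr > max_stderr:
        level += " (high variance)"
    return level


# The two cells from the example interpretation above
print(cell_evidence(3, 0.08))
print(cell_evidence(45, 0.02))
```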

When to Use 2D vs 1D

| Scenario | Use 1D | Use 2D |
| --- | --- | --- |
| Testing primary driver | ✅ | - |
| Testing secondary abilities | ❌ MISLEADING | ✅ REQUIRED |
| Looking for interactions | - | ✅ |
| Confirming suppressor hypothesis | - | ✅ |
| Quick initial scan | ✅ (with caution) | - |

Template: Death-Aversion Test Battery

For single-family dominated features, always test death-mitigation:

# Test 1: Primary × Quick Respawn
poetry run python -m splatnlp.mechinterp.cli.runner_cli heatmap \
    --feature-id {ID} --family-x {PRIMARY} --family-y quick_respawn \
    --rungs-x 0,6,15,29,41,57 --rungs-y 0,6,12,21,29

# Test 2: Primary × Special Saver
poetry run python -m splatnlp.mechinterp.cli.runner_cli heatmap \
    --feature-id {ID} --family-x {PRIMARY} --family-y special_saver \
    --rungs-x 0,6,15,29,41,57 --rungs-y 0,3,6,12,21

# Test 3: Primary × Comeback (binary ability - use binary subcommand for this)
poetry run python -m splatnlp.mechinterp.cli.runner_cli binary \
    --feature-id {ID} --model ultra

If ALL three show suppression (at Y>0 in the heatmaps, and with Comeback present in the binary test), the label includes "death-averse".

Template: Error-Correction Detection

If 1D sweeps show small deltas or effects only in unusual rung combinations, test for error-correction behavior:

import polars as pl
from splatnlp.mechinterp.skill_helpers import load_context

ctx = load_context('ultra')
df = ctx.db.get_all_feature_activations_for_pagerank(FEATURE_ID)

# Get token IDs for high and low rungs
# Example: SCU_57 (high) and SCU_3 (low)
high_rung_id = ctx.vocab['special_charge_up_57']
low_rung_id = ctx.vocab['special_charge_up_3']

# Compare activation when low rung is present vs missing (among high-rung builds)
high_with_low = df.filter(
    pl.col('ability_input_tokens').list.contains(high_rung_id) &
    pl.col('ability_input_tokens').list.contains(low_rung_id)
)
high_without_low = df.filter(
    pl.col('ability_input_tokens').list.contains(high_rung_id) &
    ~pl.col('ability_input_tokens').list.contains(low_rung_id)
)

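# mean() returns None for an empty selection, so guard before formatting
if len(high_with_low) == 0 or len(high_without_low) == 0:
    print("Not enough examples in one of the groups to compare")
else:
    mean_with = high_with_low['activation'].mean()
    mean_without = high_without_low['activation'].mean()

    print(f"High rung WITH low rung present: {mean_with:.4f} (n={len(high_with_low)})")
    print(f"High rung WITHOUT low rung: {mean_without:.4f} (n={len(high_without_low)})")
    print(f"Delta: {mean_without - mean_with:+.4f}")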

# If WITHOUT > WITH, feature fires when prerequisite is MISSING = error correction!

Signs of error-correction:

| Pattern | Interpretation | Label Style |
| --- | --- | --- |
| Higher activation when low rung MISSING | "Explains away" missing evidence | "Error-Correction: {FAMILY}" |
| Only fires on weird rung combos | OOD detector | "OOD Detector: {PATTERN}" |
| Negative interactions in 2D heatmaps | Within-family interference | "Interference Feature: {FAMILY}" |

Test for within-family interference (CRITICAL for single-family):

poetry run python -m splatnlp.mechinterp.cli.runner_cli family-sweep \
    --feature-id {FEATURE_ID} --family {FAMILY} --model {MODEL}
# Check for non-monotonic response patterns in the output

Test for interactions (2D heatmap):

poetry run python -m splatnlp.mechinterp.cli.runner_cli heatmap \
    --feature-id {FEATURE_ID} --family-x {FAMILY_A} --family-y {FAMILY_B} --model {MODEL}

Test for weapon specificity:

poetry run python -m splatnlp.mechinterp.cli.runner_cli weapon-sweep \
    --feature-id {FEATURE_ID} --model {MODEL} --top-k 20 --min-examples 10

CHECKPOINT: After weapon_sweep, check for dominant weapon pattern:

If weapon_sweep diagnostics show a "DOMINANT WEAPON" warning (one weapon has more than 2x the delta of the second):

  1. Run kit_sweep to analyze by sub weapon and special weapon:
poetry run python -m splatnlp.mechinterp.cli.runner_cli kit-sweep \
    --feature-id {FEATURE_ID} --model {MODEL} --top-k 10 --analyze-combinations
  2. Use splatoon3-meta skill to look up the dominant weapon's kit:

    • Read .claude/skills/splatoon3-meta/references/weapons.md
    • Find the weapon's sub weapon and special weapon
  3. Cross-reference other high-activation weapons:

    • Do they share the same sub weapon?
    • Do they share the same special weapon?
    • If yes, the feature may encode kit behavior, not weapon behavior
  4. Update hypothesis based on findings:

    • If shared sub: Feature may encode sub weapon playstyle
    • If shared special: Feature may encode special spam/farming
    • If no kit pattern: Feature is truly weapon-specific

Example: Feature 18712 shows Octobrush Nouveau dominant. Kit lookup reveals Squid Beakon + Ink Storm. Other high weapons (Rapid Blaster, Range Blaster) also have "special-dependent" characteristics per meta → Feature encodes "SCU for Ink Storm spam" not just "Octobrush".
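
The cross-reference step can be sketched as a count over the kits of the high-activation weapons. Everything here is illustrative: `shared_kit_component`, the 60% cutoff, and the placeholder weapon and kit names are assumptions, and real kits should come from weapons.md.

```python
from collections import Counter


def shared_kit_component(kits, min_share=0.6):
    """Find sub or special weapons shared by at least min_share of the weapons.

    kits maps weapon name -> (sub_weapon, special_weapon).
    """
    if not kits:
        return {}
    findings = {}
    for index, component in enumerate(("sub", "special")):
        counts = Counter(kit[index] for kit in kits.values())
        value, count = counts.most_common(1)[0]
        if count / len(kits) >= min_share:
            findings[component] = value
    return findings


# Placeholder kits for three high-activation weapons
kits = {
    "weapon_a": ("sub_x", "special_p"),
    "weapon_b": ("sub_y", "special_p"),
    "weapon_c": ("sub_z", "special_p"),
}
print(shared_kit_component(kits))
```

An empty result points toward a truly weapon-specific feature; a shared special or sub is a candidate driver that still needs the causal/spurious check.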

Test for threshold effects:

  • Compare low-rung vs high-rung responses
  • Look for non-linear jumps in activation
  • Check if certain rungs REDUCE activation (interference)
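
These three checks can be run directly on the output of a 1D sweep. The sketch below assumes the sweep has been reduced to parallel lists of rungs and mean activations; `sweep_diagnostics` and the sample values are hypothetical.

```python
def sweep_diagnostics(rungs, activations):
    """Return (drops, largest_jump) for a 1D sweep.

    Each step is (from_rung, to_rung, change in mean activation).
    """
    pairs = sorted(zip(rungs, activations))
    steps = [(a[0], b[0], b[1] - a[1]) for a, b in zip(pairs, pairs[1:])]
    drops = [step for step in steps if step[2] < 0]  # rungs that REDUCE activation
    largest_jump = max(steps, key=lambda step: step[2]) if steps else None
    return drops, largest_jump


# Hypothetical sweep
rungs = [0, 6, 12, 15, 21, 29]
activations = [0.00, 0.05, 0.03, 0.06, 0.05, 0.20]
drops, largest_jump = sweep_diagnostics(rungs, activations)
print("Drops:", [(lo, hi) for lo, hi, _ in drops])
print("Largest jump:", largest_jump[:2])
```

A largest jump far above the other steps suggests a threshold; any entry in `drops` is a prompt to run the within-family interference test.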

Phase 4: Synthesis

Combine findings into a coherent interpretation:

  1. What triggers activation? (tokens, combinations, weapons)
  2. Is there structure beyond simple detection? (interactions, thresholds)
  3. What gameplay concept does this represent?
  4. Why would the model learn this? (predictive value for recommendations)

Phase 5: Label Proposal

Propose a label at the appropriate level:

| Complexity | Label Type | Example |
| --- | --- | --- |
| Trivial | Token detector | "SCU Presence" (avoid if possible) |
| Simple | Threshold detector | "High SCU Investment (29+ AP)" |
| Moderate | Interaction | "SCU + Mobility Combo" |
| Tactical | Build archetype | "Special Spam Slayer Kit" |
| Strategic | Playstyle | "Aggressive Frontline Build" |

Label Specificity by Category

The label's specificity should match its concept level:

| Category | Specificity | Style | Examples |
| --- | --- | --- | --- |
| mechanical | Terse | Token-focused, technical | "SCU Threshold 29+", "ISM Stacker" |
| tactical | Mid-level | Ability combos, weapon synergies | "Zombie Slayer Dualies", "Beacon Support Kit" |
| strategic | High-concept | Playstyle, gameplay philosophy | "Positional Survival - Midrange", "Aggressive Reentry" |

Why this matters:

  • Mechanical features encode low-level patterns → label should be precise and technical
  • Tactical features encode build strategies → label should name the strategy
  • Strategic features encode gameplay philosophies → label should capture the "why"

Examples by level:

Feature encodes "SCU above 29 AP threshold"
→ Category: mechanical
→ Label: "SCU Threshold 29+" (terse, specific)

Feature encodes "QR + Comeback + Stealth Jump on dualies"
→ Category: tactical
→ Label: "Zombie Slayer Dualies" (names the combo + weapon)

Feature encodes "survive through positioning, not stealth or trading"
→ Category: strategic
→ Label: "Positional Survival - Midrange" (high-concept + role)

Strategic Label Quality Checklist

Before finalizing a label, verify:

  1. Concept over tokens: Does the label describe a GAMEPLAY CONCEPT, not just list abilities?

    • BAD: "SSU + ISM + SRU Kit", "Swim Efficiency Kit"
    • GOOD: "Positional Survival", "Aggressive Reentry"
  2. Positive framing: Does the label describe what the feature IS, not just what it avoids?

    • BAD: "Death-Averse Efficiency", "Anti-Stealth Build"
    • GOOD: "Positional Survival", "Visible Zone Control"
  3. The "why" test: Can you answer "why would a player build this?"

    • If answer is "to have SSU and ISM" → label is too mechanical
    • If answer is "to survive through positioning at midrange" → label captures concept
  4. Range/role qualifier: Have you verified weapon range (Phase 1.8) and added appropriate qualifier?

    • Backline (SNIPER/LONG + ANCHOR) → "- Anchor" or "- Backline"
    • Midrange (MID/LONG + SUPPORT/SKIRMISH) → "- Midrange"
    • Frontline (CLOSE/MID + SLAYER) → "- Slayer" or "- Frontline"

Strategic Label Format

Prefer: "[Concept] - [Qualifier]"

| Concept Examples | What it captures |
| --- | --- |
| Positional Survival | Stay alive through positioning, not stealth/trading |
| Aggressive Reentry | Pressure through fast respawn (zombie) |
| Stealth Approach | Win through concealment (NS builds) |
| Special Pressure | Win through special uptime |
| Lane Persistence | Hold lanes through sustain |

| Qualifier Examples | When to use |
| --- | --- |
| Midrange | MID-range weapons, SKIRMISH/SUPPORT jobs |
| Anchor | LONG/SNIPER range, ANCHOR job, chargers/splatlings |
| Slayer | CLOSE/MID range, SLAYER job, aggressive weapons |
| Support | SUPPORT job, team utility focus |
| (Weapon Class) | When specific to dualies, blasters, etc. |

Label Anti-Patterns to Avoid

| Anti-Pattern | Example | Why It's Bad | Better Label |
| --- | --- | --- | --- |
| Token listing | "SSU + ISM Kit" | Describes tokens, not purpose | "Positional Survival" |
| Negation-only | "Death-Averse" | Describes avoidance, not identity | "Positional Survival" |
| Wrong role | "Anchor" for Jr./Rapid | Anchor implies backline chargers | "- Midrange" |
| Too generic | "Utility Build" | Could mean anything | "Positional Survival - Midrange" |
| Flanderized | Based on top 100 only | Captures tail, not core concept | Check core region first |

Phase 6: Deeper Dive (For Thorny Features)

When to use: If the standard deep dive (Phases 1-5) didn't produce a clear interpretation:

  • All scaling effects weak (max_delta < 0.03)
  • No clear primary driver
  • Conflicting signals from different experiments
  • Feature seems important (high contribution to outputs) but unclear why

The Deeper Dive uses the hypothesis/state management system for systematic exploration:

Step 1: Initialize Research State

from splatnlp.mechinterp.state import ResearchState, Hypothesis

state = ResearchState(feature_id=FEATURE_ID, model_type="ultra")

# Add competing hypotheses based on what you've observed
state.add_hypothesis(Hypothesis(
    id="h1",
    description="Feature encodes weapon-specific pattern for Dapple Nouveau",
    status="pending"
))
state.add_hypothesis(Hypothesis(
    id="h2",
    description="Feature encodes binary ability package (Stealth + Comeback)",
    status="pending"
))
state.add_hypothesis(Hypothesis(
    id="h3",
    description="Feature has high decoder weights despite weak activation effects",
    status="pending"
))

Step 2: Check Decoder Weights

For "weak activation" features, check if they have high influence via decoder weights:

# Load SAE decoder weights
import torch
sae_path = '/mnt/e/dev_spillover/SplatNLP/sae_runs/run_20250704_191557/sae_model_final.pth'
sae_checkpoint = torch.load(sae_path, map_location='cpu', weights_only=True)
decoder_weight = sae_checkpoint['decoder.weight']  # [512, 24576]

# Get this feature's decoder weights to output space
feature_decoder = decoder_weight[:, FEATURE_ID]  # [512]

# Check magnitude
print(f"Decoder weight L2 norm: {torch.norm(feature_decoder):.4f}")
print(f"Max absolute weight: {torch.abs(feature_decoder).max():.4f}")

# Compare to other features
all_norms = torch.norm(decoder_weight, dim=0)
percentile = (all_norms < torch.norm(feature_decoder)).float().mean() * 100
print(f"Percentile among all features: {percentile:.1f}%")

If decoder weights are high (>75th percentile), the feature may be important despite weak activation effects.

Step 3: Decoder Output Analysis (CRITICAL for Diffuse Features)

When activation analysis doesn't yield a clean interpretation, analyze what the feature RECOMMENDS.

This technique asks: "What does this feature push the model to predict?" rather than "What activates this feature?"

Use the decoder CLI:

cd /root/dev/SplatNLP

# Quick output influence check
poetry run python -m splatnlp.mechinterp.cli.decoder_cli output-influence \
    --feature-id {FEATURE_ID} \
    --model ultra \
    --top-k 15

# Check decoder weight importance
poetry run python -m splatnlp.mechinterp.cli.decoder_cli weight-percentile \
    --feature-id {FEATURE_ID} \
    --model ultra

See mechinterp-decoder skill for full documentation.

Interpretation Guide:

| Output Pattern | Interpretation |
| --- | --- |
| Promotes low-AP tokens (_3, _6) | "Recommend light investment" |
| Promotes high-AP tokens (_51, _57) | "Recommend heavy stacking" |
| Suppresses high-AP tokens | "Anti-stacking / balanced build" |
| Promotes death-mitigation (QR, CB, SS) | "Recommend zombie/respawn optimization" |
| Suppresses death-mitigation | "Death-averse / stay alive" |

Example (Feature 13934):

PROMOTES: respawn_punisher (+0.23), comeback (+0.16), QSJ_6 (+0.15), IA_3 (+0.14), ISM_6 (+0.13)
SUPPRESSES: RSU_57 (-0.30), QR_57 (-0.25), RSU_51 (-0.24)

Interpretation: Feature recommends "balanced utility spread with low-AP investments"
               and DISCOURAGES heavy stacking of any single ability.
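
The low-AP versus high-AP reading of this example can be checked by bucketing the decoder contributions by rung suffix. This is a sketch, assuming tokens follow the `FAMILY_RUNG` naming shown above; `split_by_rung` and its cutoffs (6 and 51, taken from the interpretation guide) are illustrative.

```python
def split_by_rung(weights, low_max=6, high_min=51):
    """Sum signed decoder weights into low-AP, high-AP, and other buckets."""
    buckets = {"low": 0.0, "high": 0.0, "other": 0.0}
    for token, weight in weights.items():
        family, _, rung = token.rpartition("_")
        if family and rung.isdigit():
            ap = int(rung)
            key = "low" if ap <= low_max else "high" if ap >= high_min else "other"
        else:
            key = "other"  # binary abilities such as comeback carry no rung
        buckets[key] += weight
    return buckets


# Contributions listed for Feature 13934 above
feature_13934 = {
    "respawn_punisher": 0.23, "comeback": 0.16, "QSJ_6": 0.15, "IA_3": 0.14,
    "ISM_6": 0.13, "RSU_57": -0.30, "QR_57": -0.25, "RSU_51": -0.24,
}
for bucket, total in split_by_rung(feature_13934).items():
    print(f"{bucket}: {total:+.2f}")
```

A positive low-AP total alongside a negative high-AP total matches the "balanced utility spread" reading.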

When to use decoder output analysis:

  • Activation analysis shows multi-modal or diffuse patterns
  • No single signature covers >50% of core
  • Feature seems "confused" between different build types
  • You want to understand the feature's PURPOSE, not just what triggers it

Key Insight: A feature can activate on seemingly different builds because they share the same NEED. The output analysis reveals what the feature is recommending, which may unify apparently contradictory activation patterns.

Decoder Output Semantic Grouping (CRITICAL for Labels)

After running decoder output analysis, group promoted/suppressed tokens by MEANING, not just family:

| Semantic Group | Token Families | Gameplay Meaning |
| --- | --- | --- |
| Mobility | SSU, RSU | How you reposition |
| Survival | BRU, IRU, RES, QR, SS, RP | How you stay alive |
| Efficiency | ISM, ISS, IRU | How you sustain pressure |
| Lethality | IA, MPU, BPU (bomb damage) | How you get kills |
| Special-Focus | SCU, SS, SPU, Tenacity | How you use specials |
| Stealth | NS, (high SSU) | How you approach unseen |
| Death-Trading | QR, CB, SJ, SS | How you weaponize respawn |

Abbreviation Key:

  • SSU = Swim Speed Up, RSU = Run Speed Up
  • BRU = Bomb (Sub) Resistance Up, RES = Ink Resistance Up
  • IRU = Ink Recovery Up, ISM = Ink Saver Main, ISS = Ink Saver Sub
  • BPU = Bomb (Sub) Power Up, SPU = Special Power Up
  • SCU = Special Charge Up, SS = Special Saver
  • QR = Quick Respawn, CB = Comeback, SJ = Stealth Jump
  • IA = Intensify Action, MPU = Main Power Up, NS = Ninja Squid, RP = Respawn Punisher

Then ask: "What COMBINATION of groups defines this feature?"

| Promoted Groups | Suppressed Groups | Strategic Concept |
| --- | --- | --- |
| Mobility + Survival + Efficiency | Death-Trading, Stealth | Positional Survival |
| Death-Trading + Mobility | Survival | Zombie/Aggressive Reentry |
| Stealth + Mobility | - | Stealth Approach |
| Special-Focus + Efficiency | Mobility | Special Farming |
| Lethality + Mobility | Efficiency | Aggressive Slayer |

This semantic grouping directly informs the strategic label.
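
A minimal sketch of the grouping step, assuming decoder contributions have already been summed per family abbreviation. The `GROUPS` mapping mirrors the semantic group table (families that appear in several groups count toward each); `group_scores` and the sample contributions are hypothetical.

```python
GROUPS = {
    "Mobility": {"SSU", "RSU"},
    "Survival": {"BRU", "IRU", "RES", "QR", "SS", "RP"},
    "Efficiency": {"ISM", "ISS", "IRU"},
    "Lethality": {"IA", "MPU", "BPU"},
    "Special-Focus": {"SCU", "SS", "SPU", "Tenacity"},
    "Stealth": {"NS"},
    "Death-Trading": {"QR", "CB", "SJ", "SS"},
}


def group_scores(contributions):
    """Sum signed per-family contributions into semantic groups, skipping empty groups."""
    scores = {}
    for group, families in GROUPS.items():
        total = sum(w for family, w in contributions.items() if family in families)
        if total:
            scores[group] = total
    return scores


# Hypothetical per-family decoder contributions
scores = group_scores({"SSU": 0.2, "ISM": 0.1, "CB": -0.3})
promoted = sorted(g for g, s in scores.items() if s > 0)
suppressed = sorted(g for g, s in scores.items() if s < 0)
print("Promoted:", promoted)
print("Suppressed:", suppressed)
```

The promoted and suppressed lists can then be compared against the combinations table by hand; overlapping families such as SS and QR mean the match is a judgment call, not a lookup.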

Post-Decoder Sweep Rule

After decoder output analysis, verify the top promoted/suppressed families with causal 1D sweeps.

The decoder tells you what the feature RECOMMENDS, but not whether it's causally driven by those tokens. To validate:

  1. Identify top 2 promoted families from decoder output (highest positive contributions)
  2. Identify top 2 suppressed families from decoder output (most negative contributions)
  3. Run 1D sweeps for any not yet tested in Phase 3

| Decoder Shows | Test With | Expected If Valid |
| --- | --- | --- |
| BRU highly promoted | family_1d_sweep BRU | Positive delta with BRU levels |
| RSU suppressed | family_1d_sweep RSU | Negative delta or flat |

Example: Feature 10938 decoder showed BRU heavily promoted (+0.126, +0.120, +0.108 for different rungs), but initial sweeps only tested SSU/ISM. Should have run:

# Missing sweep that would validate decoder findings
poetry run python -m splatnlp.mechinterp.cli.runner_cli run-spec \
    --spec '{"type": "family_1d_sweep", "variables": {"family": "bomb_resistance_up"}}' \
    --feature-id 10938 --model ultra

Anti-pattern: Trusting decoder output without causal validation. Decoder weights show which output tokens the feature pushes up or down, not which input tokens cause the feature to activate.
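
The rule can be turned into a short checklist generator. This is a sketch: `families_to_sweep` is a hypothetical helper, and apart from the +0.126 BRU value quoted above, the contributions are made-up placeholders.

```python
def families_to_sweep(family_contributions, already_tested, top_n=2):
    """List the top promoted and suppressed families that still need a 1D sweep."""
    ranked = sorted(family_contributions.items(), key=lambda item: item[1])
    suppressed = [family for family, w in ranked[:top_n] if w < 0]
    promoted = [family for family, w in ranked[::-1][:top_n] if w > 0]
    return [f for f in promoted + suppressed if f not in already_tested]


# BRU value from the Feature 10938 example; the rest are placeholders
contributions = {"BRU": 0.126, "SSU": 0.05, "ISM": 0.03, "RSU": -0.08, "QR": -0.02}
print(families_to_sweep(contributions, already_tested={"SSU", "ISM"}))
```

Each returned family is a candidate for a `family_1d_sweep` spec like the one shown above.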

Step 4: Run Targeted Experiments

Based on hypotheses, run specific tests:

# Log experiments and findings to state
state.add_evidence(
    hypothesis_id="h1",
    experiment_type="weapon_sweep",
    finding="37% Dapple Nouveau, but also 10% .96 Gal Deco - not single-weapon",
    supports=False
)

state.add_evidence(
    hypothesis_id="h3",
    experiment_type="decoder_weight_check",
    finding="Decoder L2 norm: 0.89 (92nd percentile) - HIGH despite weak activation",
    supports=True
)

Step 5: Synthesize

# Review all evidence
state.summarize()

# Update hypothesis statuses
state.update_hypothesis("h1", status="rejected")
state.update_hypothesis("h3", status="supported")

# Propose final interpretation
state.set_conclusion(
    "Feature has weak activation effects but high decoder weights. "
    "It acts as a 'fine-tuning' feature that makes small but important "
    "adjustments to output probabilities."
)

When Deeper Dive is Complete

The state object provides an audit trail of:

  • What hypotheses were considered
  • What experiments were run
  • What evidence was found
  • Why the final interpretation was chosen

This is useful for:

  • Revisiting the feature later
  • Explaining the interpretation to others
  • Identifying if new evidence should change the interpretation

Decision Trees

Single-Family Dominated Feature

1. Run within_family_interference to check for error correction
   └─ If interference found → "Error-Correcting {FAMILY} Detector"
   └─ If enhancement patterns → "{FAMILY} Stacker (synergistic)"
   └─ If neutral → continue

2. Check for non-monotonic 1D response
   └─ If drops at certain rungs → investigate interference
   └─ If monotonic with threshold → "High {FAMILY} Investment"
   └─ If monotonic with no threshold → probably trivial

3. Run weapon_sweep to check weapon specificity
   └─ If weapon-concentrated → run weapon_group_analysis
   └─ If weapon-specific patterns → "{WEAPON_CLASS} + {FAMILY}"

4. Run 2D sweep with second-ranked family
   └─ If interaction effect → "{FAMILY_A} + {FAMILY_B} Combo"
   └─ If no interaction → try third family

5. If all trivial → label as "{FAMILY} Stacker" with note "simple detector"

Multi-Family Feature

1. Check if families are related
   └─ All mobility (SSU, RSU, QSJ) → "Mobility Kit"
   └─ All ink efficiency (ISM, ISS, IRU) → "Efficiency Kit"
   └─ Mixed → continue

2. Run pairwise interaction analysis
   └─ Positive synergy → "Synergistic Build"
   └─ Redundancy → "Alternative Paths"

3. Check weapon breakdown
   └─ Weapon class pattern → "{CLASS} Optimal Build"

4. Consider strategic meaning
   └─ What playstyle does this combination enable?

Example Investigation

Feature 18712 (Deep Analysis):

  1. Overview: SCU 31%, SSU 11%, ISS 10% → Single-family dominated
  2. Hypothesis: Could be SCU + something, or just trivial SCU detector
  3. 2D Heatmap (SCU × SSU): Peak at SCU=57, SSU=0. Non-monotonic drops visible!
    • SCU 6→12: DROP of 0.02 (unexpected)
    • SCU 15→21: DROP of 0.01
  4. Interference Analysis:
    • SCU_12 REDUCES SCU_51 signal by 0.10 (interference!)
    • SCU_15 ENHANCES SCU_51 signal by 0.12 (synergy!)
  5. Weapon Analysis: Effect varies by weapon
    • weapon_id_50: SCU_3 reduces SCU_15 (-0.08)
    • weapon_id_7020: SCU_3 enhances SCU_15 (+0.03)
  6. Interpretation: Feature detects "clean" high-SCU builds.
    • Low rungs (SCU_3, SCU_12) can contaminate the signal
    • Effect is weapon-dependent
  7. Label: "SCU Purity Detector (weapon-conditional)" - NOT trivial!

Key Insight: What looked like a simple "SCU detector" actually encodes complex error-correction behavior. Always check for interference!

Commands Summary

# Phase 1: Overview (with extended analyses)
poetry run python -m splatnlp.mechinterp.cli.overview_cli \
    --feature-id {ID} --model ultra --top-k 20

# Phase 1 with extended analyses (enrichment, regions, binary, kit)
poetry run python -m splatnlp.mechinterp.cli.overview_cli \
    --feature-id {ID} --model ultra --all

# Phase 3a: 1D sweep for dominant family (direct subcommand)
poetry run python -m splatnlp.mechinterp.cli.runner_cli family-sweep \
    --feature-id {ID} --family {FAMILY} --model ultra

# Phase 3b: 2D heatmap for interactions (direct subcommand)
poetry run python -m splatnlp.mechinterp.cli.runner_cli heatmap \
    --feature-id {ID} --family-x {FAMILY_A} --family-y {FAMILY_B} --model ultra

# Phase 3c: Weapon sweep (direct subcommand)
poetry run python -m splatnlp.mechinterp.cli.runner_cli weapon-sweep \
    --feature-id {ID} --model ultra --top-k 20

# Phase 3d: Kit sweep (if dominant weapon detected)
poetry run python -m splatnlp.mechinterp.cli.runner_cli kit-sweep \
    --feature-id {ID} --model ultra --analyze-combinations

# Phase 3e: Binary ability analysis
poetry run python -m splatnlp.mechinterp.cli.runner_cli binary \
    --feature-id {ID} --model ultra

# Phase 3f: Core coverage analysis
poetry run python -m splatnlp.mechinterp.cli.runner_cli coverage \
    --feature-id {ID} --tokens {TOKEN1},{TOKEN2}

# Phase 1.7.5: Kit Component Analysis (see skill for full code)
# After weapon sweep, check for patterns in: sub weapons, specials, or weapon class
# For any concentrated pattern, determine if CAUSAL (explains build) or SPURIOUS (incidental)

# Phase 5: Set label
poetry run python -m splatnlp.mechinterp.cli.labeler_cli label \
    --feature-id {ID} --name "{LABEL}" --category {tactical|strategic|mechanical}

Labeling Categories

  • mechanical: Low-level patterns (token presence, simple combinations)
  • tactical: Mid-level patterns (build synergies, weapon kits)
  • strategic: High-level patterns (playstyles, meta concepts)

See Also

  • mechinterp-overview: Initial feature assessment (now includes bottom tokens)
  • mechinterp-runner: Execute experiments (includes core_coverage_analysis and decoder_output_analysis)
  • mechinterp-decoder: Decoder weight analysis - what features recommend (USE for diffuse/heterogeneous features)
  • mechinterp-next-step-planner: Generate experiment specs
  • mechinterp-labeler: Save labels
  • mechinterp-glossary-and-constraints: Domain reference
  • mechinterp-ability-semantics: Ability semantic groupings (check AFTER hypotheses)
  • splatoon3-meta: Weapon archetypes, kit lookups, meta knowledge (USE for weapon pattern interpretation)