Claude-skill-registry-data mechinterp-decoder
Analyze SAE decoder weights - output influence, feature importance, and decoder similarity

    git clone https://github.com/majiayu000/claude-skill-registry-data

    T=$(mktemp -d) && \
      git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && \
      mkdir -p ~/.claude/skills && \
      cp -r "$T/data/mechinterp-decoder" ~/.claude/skills/majiayu000-claude-skill-registry-data-mechinterp-decoder && \
      rm -rf "$T"

MechInterp Decoder
Analyze SAE features through their decoder weights. This skill answers: "What does this feature RECOMMEND?" rather than "What activates this feature?"
Purpose
Decoder analysis provides a complementary perspective to activation analysis:
| Analysis Type | Question Answered |
|---|---|
| Activation (overview, sweeps) | "What inputs activate this feature?" |
| Decoder (this skill) | "What outputs does this feature promote?" |
For diffuse or heterogeneous features where activation analysis shows multiple modes, decoder analysis often reveals the unifying concept.
When to Use
Use this skill when:
- Activation analysis is inconclusive: it shows multiple modes or no clear pattern.
- The feature appears heterogeneous: different builds activate it for different reasons.
- You want to know what the feature recommends: shift the focus from inputs to outputs.
- You are checking AP-level preferences: does the feature prefer low-AP (_3, _6) or high-AP (_57) tokens?
- You are looking for similar features: cluster features by decoder similarity.
Commands
Output Influence
Show what tokens a feature promotes (positive contribution) or suppresses (negative contribution):

    cd /root/dev/SplatNLP

    # Basic output influence
    poetry run python -m splatnlp.mechinterp.cli.decoder_cli output-influence \
        --feature-id 13934 \
        --model ultra

    # JSON output
    poetry run python -m splatnlp.mechinterp.cli.decoder_cli output-influence \
        --feature-id 13934 \
        --model ultra \
        --format json

    # More tokens
    poetry run python -m splatnlp.mechinterp.cli.decoder_cli output-influence \
        --feature-id 13934 \
        --model ultra \
        --top-k 25

Sample Output:

    ## Feature 13934 Output Influence (ultra)

    ### Tokens This Feature PROMOTES
    | Token | Contribution | Family | AP Level |
    |-------|--------------|--------|----------|
    | respawn_punisher | +0.232 | respawn_punisher | binary |
    | comeback | +0.159 | comeback | binary |
    | quick_super_jump_6 | +0.155 | quick_super_jump | 6 |
    | intensify_action_3 | +0.140 | intensify_action | 3 |
    | ink_saver_main_6 | +0.128 | ink_saver_main | 6 |

    ### Tokens This Feature SUPPRESSES
    | Token | Contribution | Family | AP Level |
    |-------|--------------|--------|----------|
    | run_speed_up_57 | -0.301 | run_speed_up | 57 |
    | quick_respawn_57 | -0.247 | quick_respawn | 57 |
    | swim_speed_up_57 | -0.209 | swim_speed_up | 57 |

    ### Interpretation
    - **Top promoted**: respawn_punisher (+0.232)
    - **Top suppressed**: run_speed_up_57 (-0.301)
    - **Pattern**: Promotes low-AP tokens, suppresses high-AP stacking

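
As a rough illustration of where a contribution table like this could come from, the sketch below assumes that a feature's decoder direction is projected through a linear output head to give one score per token. The names `decoder_row`, `output_head`, and `vocab`, the shapes, and the toy numbers are all hypothetical; the actual `decoder_cli` implementation may compute contributions differently.

```python
def output_influence(decoder_row, output_head, vocab, top_k=3):
    """Rank tokens by the score one feature adds to (or removes from) each of them.

    decoder_row: list of d floats (hypothetical decoder direction of the feature)
    output_head: d rows of len(vocab) floats (hypothetical linear output head)
    """
    contributions = []
    for j, token in enumerate(vocab):
        score = sum(decoder_row[i] * output_head[i][j] for i in range(len(decoder_row)))
        contributions.append((token, score))

    ranked = sorted(contributions, key=lambda pair: pair[1], reverse=True)
    promoted = [(t, s) for t, s in ranked if s > 0][:top_k]
    # Walk the ranking from the bottom so the most negative token comes first.
    suppressed = [(t, s) for t, s in reversed(ranked) if s < 0][:top_k]
    return promoted, suppressed


# Toy data (assumed, not taken from the real model).
vocab = ["respawn_punisher", "comeback", "run_speed_up_57"]
decoder_row = [1.0, -0.5]
output_head = [
    [0.2, 0.1, -0.2],
    [0.0, -0.1, 0.2],
]
promoted, suppressed = output_influence(decoder_row, output_head, vocab)
```

With this toy data, `promoted` lists `respawn_punisher` ahead of `comeback`, and `suppressed` contains only `run_speed_up_57`.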
Weight Percentile
Check how important a feature is by its decoder weight magnitude:

    poetry run python -m splatnlp.mechinterp.cli.decoder_cli weight-percentile \
        --feature-id 13934 \
        --model ultra

Sample Output:

    ## Feature 13934 Decoder Weight (ultra)

    - **Magnitude**: 2.3456
    - **Percentile**: 78.5%
    - **Total features**: 24576

Interpretation:
- High percentile (>90%): Feature has strong output influence
- Low percentile (<10%): Feature has weak output influence
- Note: Low-magnitude features may still be important for specific tokens
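
The magnitude and percentile reported above can be pictured with a small sketch. It assumes that "magnitude" means the L2 norm of a feature's decoder row and that "percentile" is the share of features with a strictly smaller norm; both definitions, the function name `weight_percentile`, and the toy rows are assumptions, not the CLI's documented behavior.

```python
import math


def weight_percentile(decoder_rows, feature_id):
    """Return (L2 norm of the feature's decoder row, percentile of that norm)."""
    norms = [math.sqrt(sum(w * w for w in row)) for row in decoder_rows]
    target = norms[feature_id]
    # Percentile here = percentage of features whose norm is strictly smaller.
    below = sum(1 for n in norms if n < target)
    return target, 100.0 * below / len(norms)


# Four toy decoder rows (assumed); their norms are 5, 1, 2, and 10.
decoder_rows = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0], [6.0, 8.0]]
magnitude, percentile = weight_percentile(decoder_rows, feature_id=0)
```

Because the norm summarizes the whole row, a feature with a low percentile can still have one or two large entries, which is why low-magnitude features may matter for specific tokens.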
Similar Features (by Decoder)
Find features with similar decoder patterns (what they recommend):

    poetry run python -m splatnlp.mechinterp.cli.decoder_cli similar \
        --feature-id 13934 \
        --model ultra \
        --top-k 10

Sample Output:

    ## Features Similar to 13934 (ultra)

    | Feature ID | Cosine Similarity |
    |------------|-------------------|
    | 13892 | 0.9234 |
    | 14501 | 0.8876 |
    | 12044 | 0.8521 |

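
For readers unfamiliar with the metric, here is a minimal sketch of ranking features by the cosine similarity of their decoder rows. The helper names `cosine` and `most_similar` and the toy rows are hypothetical; only the use of cosine similarity comes from the sample output above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(decoder_rows, feature_id, top_k=10):
    """Return the top_k (feature_id, similarity) pairs, excluding the query feature."""
    target = decoder_rows[feature_id]
    scored = [
        (other_id, cosine(target, row))
        for other_id, row in enumerate(decoder_rows)
        if other_id != feature_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy decoder rows (assumed): feature 1 points the same way as feature 0,
# feature 2 is orthogonal to it, and feature 3 points the opposite way.
decoder_rows = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
neighbors = most_similar(decoder_rows, feature_id=0, top_k=2)
```

Cosine similarity ignores row magnitude, so two features can recommend the same tokens while differing widely in weight percentile.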
Experiment Runner
For programmatic use or integration with runner_cli:

    # Create spec file
    cat > decoder_spec.json << 'EOF'
    {
      "type": "decoder_output_analysis",
      "feature_id": 13934,
      "model_type": "ultra",
      "variables": {
        "top_k_promoted": 15,
        "top_k_suppressed": 15,
        "group_by_family": true,
        "include_ap_level": true
      }
    }
    EOF

    # Run via runner CLI
    poetry run python -m splatnlp.mechinterp.cli.runner_cli \
        --spec-path decoder_spec.json

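
When specs are generated for many features, building the file from Python may be more convenient than a heredoc. The sketch below writes the same spec fields shown above and assembles the runner command without executing it. The helper names `write_decoder_spec` and `runner_command` are hypothetical and not part of SplatNLP.

```python
import json
import tempfile
from pathlib import Path


def write_decoder_spec(path, feature_id, model_type="ultra", top_k=15):
    """Write a decoder_output_analysis spec with the fields shown in this skill."""
    spec = {
        "type": "decoder_output_analysis",
        "feature_id": feature_id,
        "model_type": model_type,
        "variables": {
            "top_k_promoted": top_k,
            "top_k_suppressed": top_k,
            "group_by_family": True,
            "include_ap_level": True,
        },
    }
    Path(path).write_text(json.dumps(spec, indent=2))
    return spec


def runner_command(spec_path):
    """Build (but do not run) the runner CLI invocation as an argument list."""
    return [
        "poetry", "run", "python", "-m",
        "splatnlp.mechinterp.cli.runner_cli",
        "--spec-path", str(spec_path),
    ]


spec_path = Path(tempfile.mkdtemp()) / "decoder_spec.json"
spec = write_decoder_spec(spec_path, feature_id=13934)
command = runner_command(spec_path)
```

To execute the experiment, `command` can be passed to `subprocess.run(command, check=True)` from the repository root.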
Interpretation Guide
AP Level Patterns
| Pattern | Meaning |
|---|---|
| Promotes _3, _6; Suppresses _51, _57 | "Use balanced spread, not stacking" |
| Promotes _57; Suppresses low AP | "Heavy stacking is the goal" |
| Promotes binary (RP, CB, OG) | "These specific abilities are key" |
| Mixed AP levels promoted | "Ability presence matters, not amount" |
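
The patterns in this table can be checked mechanically once tokens are split into family and AP level. The sketch below assumes the naming convention visible in the sample output (a trailing `_<number>` is the AP level, and tokens without one are binary abilities). The cut-offs `low_max=6` and `high_min=51` are illustrative assumptions drawn from the _3/_6 and _51/_57 examples, and the helper names are hypothetical.

```python
def parse_token(token):
    """Split 'run_speed_up_57' into ('run_speed_up', 57); binary tokens get None."""
    family, _, suffix = token.rpartition("_")
    if family and suffix.isdigit():
        return family, int(suffix)
    return token, None  # no numeric suffix: treat as a binary ability


def ap_profile(tokens, low_max=6, high_min=51):
    """Count tokens per AP bucket; thresholds are assumed, not defined by the CLI."""
    counts = {"low": 0, "mid": 0, "high": 0, "binary": 0}
    for token in tokens:
        _, ap = parse_token(token)
        if ap is None:
            counts["binary"] += 1
        elif ap <= low_max:
            counts["low"] += 1
        elif ap >= high_min:
            counts["high"] += 1
        else:
            counts["mid"] += 1
    return counts


# Tokens copied from the Feature 13934 sample output.
promoted = ["respawn_punisher", "comeback", "quick_super_jump_6",
            "intensify_action_3", "ink_saver_main_6"]
suppressed = ["run_speed_up_57", "quick_respawn_57", "swim_speed_up_57"]
promoted_profile = ap_profile(promoted)
suppressed_profile = ap_profile(suppressed)
```

For Feature 13934, the promoted tokens fall only into the low and binary buckets while every suppressed token is high-AP, which matches the first row of the table.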
Common Feature Types
| Output Pattern | Feature Type |
|---|---|
| Single family promoted | Family detector (e.g., SCU detector) |
| Low-AP promoted, high-AP suppressed | "Balanced utility recommendation" |
| Binary abilities promoted | "Build style marker" (aggressive, defensive) |
| Death perks promoted (QR, SS, CB) | "Death-tolerant" archetype |
| Death perks suppressed | "Death-averse" archetype |
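
One way to test for the "single family promoted" pattern is to measure how much of the total positive contribution each family holds. The sketch below takes rows shaped like the output-influence table (token, family, contribution). The function name `family_shares` and the toy rows are hypothetical, and what share counts as a family detector is left to the investigator.

```python
from collections import defaultdict


def family_shares(rows):
    """Map each family to its share of the total positive contribution.

    rows: iterable of (token, family, contribution) tuples.
    """
    totals = defaultdict(float)
    for _token, family, contribution in rows:
        if contribution > 0:  # only promoted tokens count toward the share
            totals[family] += contribution
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {family: value / grand_total for family, value in totals.items()}


# Toy rows (assumed values, not real model output).
rows = [
    ("special_charge_up_3", "special_charge_up", 0.30),
    ("special_charge_up_6", "special_charge_up", 0.25),
    ("special_charge_up_12", "special_charge_up", 0.25),
    ("comeback", "comeback", 0.20),
    ("run_speed_up_57", "run_speed_up", -0.30),
]
shares = family_shares(rows)
top_family = max(shares, key=shares.get)
```

A share close to 1.0 for one family points toward a family detector, while an even spread across families points toward one of the other feature types in the table.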
Integration with Investigation Workflow
Decoder analysis fits into the investigation workflow as follows:

    1. Overview (mechinterp-overview)
           ↓
    2. Hypothesis formation
           ↓
    3. 1D Sweeps (mechinterp-runner)
           ↓
    4. Core Coverage Check  ← NEW: Catch tail markers
           ↓
    5. If diffuse/heterogeneous:
       → Decoder Output Analysis  ← THIS SKILL
           ↓
    6. Label formulation

Example: Feature 13934 (from investigation log)
Problem: Activation analysis showed two opposite modes (RP anchor vs Zombie builds).
Solution: Decoder analysis revealed unifying pattern:

    PROMOTES: low-AP utility (_3, _6 tokens)
    SUPPRESSES: heavy stacking (_51, _57 tokens)
    → Feature recommends "balanced utility spread" regardless of death strategy

Key Insight: Different builds (RP vs Zombie) activate the feature because they share a NEED (balanced utility), not a BUILD pattern.
See Also
- mechinterp-overview: Initial feature assessment
- mechinterp-runner: Run experiments (including core_coverage_analysis, decoder_output_analysis)
- mechinterp-investigator: Full investigation workflow
- mechinterp-labeler: Save labels after investigation