MechInterp Labeler
Manage the feature labeling workflow. This skill provides tools for:
- Priority queue management
- Setting and syncing labels
- Finding similar features
- Tracking labeling progress
Purpose
The labeler skill enables interactive feature labeling sessions:
- Get the next feature to label from a priority queue
- Use overview and experiments to understand the feature
- Save labels with categories and notes
- Find similar features to label next
- Track overall progress
Commands
Get Next Feature
```bash
cd /root/dev/SplatNLP

# Get next feature from queue
poetry run python -m splatnlp.mechinterp.cli.labeler_cli next --model ultra

# Don't auto-build queue if empty
poetry run python -m splatnlp.mechinterp.cli.labeler_cli next --model ultra --no-build
```
Set a Label
IMPORTANT: Always use `--source` to track label provenance.
Source Options:
- `claude code`: Label created through Claude Code CLI investigation
- `codex`: Label created through the Codex (OpenAI) agent
- `codex/claude`: Label created through Codex orchestrating Claude
- `manual`: Label created manually by a human
- `dashboard`: Label created through the dashboard UI (default)
```bash
# Label from Claude Code investigation
poetry run python -m splatnlp.mechinterp.cli.labeler_cli label \
  --feature-id 18712 \
  --name "Special Charge Stacker" \
  --model ultra \
  --source "claude code"

# With category and notes
poetry run python -m splatnlp.mechinterp.cli.labeler_cli label \
  --feature-id 18712 \
  --name "SCU Detector" \
  --category tactical \
  --notes "Responds to Special Charge Up presence, stronger at high AP" \
  --source "claude code"

# Manual labeling by human
poetry run python -m splatnlp.mechinterp.cli.labeler_cli label \
  --feature-id 18712 \
  --name "My Label" \
  --source "manual"
```
Categories:
- `mechanical`: Low-level patterns (token presence, combinations)
- `tactical`: Mid-level patterns (build strategies, weapon synergies)
- `strategic`: High-level patterns (playstyle, meta concepts)
- `none`: Uncategorized
Required Label Fields
Every label in `consolidated_ultra.json` MUST include every field marked ✓ below:
| Field | Required | Description |
|---|---|---|
| `feature_id` | ✓ | Integer feature ID |
| `model_type` | ✓ | "ultra" or "full" |
| `dashboard_name` | ✓ | The label displayed in the dashboard |
| `dashboard_category` | ✓ | mechanical, tactical, strategic, or none |
| `dashboard_notes` | ✓ | Investigation notes with evidence |
| `display_name` | ✓ | Same as dashboard_name (for compatibility) |
| `last_updated` | ✓ | ISO timestamp of last update |
| `source` | ✓ | Who created it (e.g., "claude code (full investigation)") |
| `hypothesis_confidence` | ✓ | 0.0-1.0 confidence score (DEPRECATED - use interpretability_confidence) |
| `importance_percentile` | ✓ | Decoder weight percentile (0-100, objective measure of model importance) |
| `interpretability_confidence` | ✓ | How confident we are in the interpretation (0.0-1.0, subjective) |
| `stability_score` | Optional | Split-half stability if validation was run (0.0-1.0) |
| `research_label` | Optional | Alternative label for research context |
| `research_state_path` | Optional | Path to research state JSON |
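Before writing a label into the consolidated file, it can help to check it mechanically. The sketch below is a hypothetical helper, not part of splatnlp: the field names are taken from the example label later in this section, and the value checks (allowed categories, model types, 0-1 and 0-100 ranges, ISO timestamp) are assumptions read off the field descriptions.

```python
from datetime import datetime
from typing import Any, Dict, List

# Hypothetical validator; field names come from the example label in this section.
REQUIRED_FIELDS = (
    "feature_id", "model_type", "dashboard_name", "dashboard_category",
    "dashboard_notes", "display_name", "last_updated", "source",
    "hypothesis_confidence", "importance_percentile",
    "interpretability_confidence",
)
CATEGORIES = {"mechanical", "tactical", "strategic", "none"}
MODEL_TYPES = {"ultra", "full"}


def _in_range(value: Any, lo: float, hi: float) -> bool:
    """True if value is a real number (not a bool) within [lo, hi]."""
    return isinstance(value, (int, float)) and not isinstance(value, bool) and lo <= value <= hi


def validate_label(label: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the label looks complete."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in label]
    if problems:
        return problems  # skip value checks until every required field exists
    if not isinstance(label["feature_id"], int) or isinstance(label["feature_id"], bool):
        problems.append("feature_id must be an integer")
    if label["model_type"] not in MODEL_TYPES:
        problems.append(f"unknown model_type: {label['model_type']!r}")
    if label["dashboard_category"] not in CATEGORIES:
        problems.append(f"unknown dashboard_category: {label['dashboard_category']!r}")
    if label["display_name"] != label["dashboard_name"]:
        problems.append("display_name should match dashboard_name")
    try:
        datetime.fromisoformat(label["last_updated"])
    except (TypeError, ValueError):
        problems.append("last_updated must be an ISO timestamp")
    if not _in_range(label["importance_percentile"], 0, 100):
        problems.append("importance_percentile must be in [0, 100]")
    for key in ("hypothesis_confidence", "interpretability_confidence"):
        if not _in_range(label[key], 0.0, 1.0):
            problems.append(f"{key} must be in [0.0, 1.0]")
    # Optional field: may be absent or null; otherwise it must be a 0-1 score.
    stability = label.get("stability_score")
    if stability is not None and not _in_range(stability, 0.0, 1.0):
        problems.append("stability_score must be null or in [0.0, 1.0]")
    return problems
```

Returning a list of problems rather than raising on the first one lets you fix every issue in a single pass before saving.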
Separating Importance from Interpretability
These three fields capture distinct dimensions:
| Field | Question Answered | Source |
|---|---|---|
| `importance_percentile` | "Is this feature important to the model?" | Decoder weight magnitude (objective) |
| `interpretability_confidence` | "Do we understand what this feature does?" | Investigation quality (subjective) |
| `stability_score` | "Does this feature behave consistently?" | Split-half validation (objective) |
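The source describes the importance score only as a decoder weight percentile, not how it is computed. A minimal sketch, assuming the score is the percentile rank of each feature's decoder-row L2 norm among all features; the row layout, the choice of L2 norm, and the function name are all assumptions.

```python
import numpy as np


def importance_percentiles(decoder: np.ndarray) -> np.ndarray:
    """Percentile rank (0-100) of each feature's decoder-row L2 norm.

    Assumes `decoder` has shape (n_features, d_model), one row per feature.
    A feature's score is the share of other features with a strictly smaller
    norm, so (without ties) the weakest feature scores 0 and the strongest 100.
    """
    norms = np.linalg.norm(decoder, axis=1)
    # Number of features whose norm is strictly smaller than each feature's norm.
    n_smaller = np.searchsorted(np.sort(norms), norms, side="left")
    return 100.0 * n_smaller / max(len(norms) - 1, 1)
```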
Common combinations:
| Importance (percentile) | Interpretability (confidence) | Meaning |
|---|---|---|
| High (>80) | High (>0.8) | Strong, well-understood feature |
| High (>80) | Low (<0.5) | Important but mysterious - needs more investigation |
| Low (<20) | High (>0.8) | Understood but weak - may be noise or redundant |
| Low (<20) | Low (<0.5) | Skip - not worth investigating |
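The combination table can be encoded directly, which makes triage consistent across sessions. This is an illustrative helper, not a splatnlp function; the thresholds are the table's, and treating any other combination as "intermediate" is an assumption since the table leaves those cases open.

```python
def interpret(importance_percentile: float, interpretability_confidence: float) -> str:
    """Map the two scores onto the 'common combinations' table above."""
    high_imp = importance_percentile > 80
    low_imp = importance_percentile < 20
    high_int = interpretability_confidence > 0.8
    low_int = interpretability_confidence < 0.5
    if high_imp and high_int:
        return "Strong, well-understood feature"
    if high_imp and low_int:
        return "Important but mysterious - needs more investigation"
    if low_imp and high_int:
        return "Understood but weak - may be noise or redundant"
    if low_imp and low_int:
        return "Skip - not worth investigating"
    # Combinations the table does not cover (e.g. mid-range importance).
    return "Intermediate - not covered by the table"
```

For instance, the example label for feature 10938 (importance 9.3, interpretability 0.85) falls in the "understood but weak" row, matching the rule of thumb that follows.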
Rule of thumb: Don't conflate these. A feature with 9th percentile importance but 0.85 interpretability confidence is "weak but understood" - useful for pattern recognition but not a major model component.
Example complete label:
```json
{
  "feature_id": 10938,
  "model_type": "ultra",
  "dashboard_name": "Positional Survival - Midrange",
  "dashboard_category": "strategic",
  "dashboard_notes": "Survival through positioning, not stealth/trading. Decoder promotes: SSU, BRU (all levels), ISS, IA, IRU. Suppresses: BPU, RSU, QR, SS. Weapons: Midrange with NO/BAD NS fit, LOW death tolerance. NS 0.84x depleted, QR 0.66x suppressed.",
  "display_name": "Positional Survival - Midrange",
  "last_updated": "2025-12-14T01:30:00.000000",
  "source": "claude code (full investigation)",
  "hypothesis_confidence": 0.85,
  "importance_percentile": 9.3,
  "interpretability_confidence": 0.85,
  "stability_score": null,
  "research_label": "Positional Survival - Midrange",
  "research_state_path": "/mnt/e/mechinterp_runs/state/feature_10938_ultra.json"
}
```
⚠️ Super-Stimuli Warning
High activations may be "flanderized" versions of the true concept!
When labeling features, don't only examine extreme activations. High activation builds can be:
- Super-stimuli: Extreme, exaggerated versions of the core concept
- Weapon-gated: Only achievable on specific niche weapons
- Unrepresentative: Missing the general pattern that applies across weapons
How to Detect Super-Stimuli
1. Examine activation regions (as a percentage of the effective max, i.e. the 99.5th percentile of nonzero activations):
   - Floor (≤1%), Low (1-10%), Below Core (10-25%)
   - Core (25-75%), High (75-90%), Flanderization Zone (90%+)
   - Use the effective max rather than the raw max so that outliers don't distort the region boundaries
2. Look for weapons that span ALL levels continuously:
   - If Splattershot appears in every region → the feature encodes a general concept
   - If only niche weapons reach 90%+ → those weapons are "super-stimuli"
3. Compare the core (25-75%) with the flanderization zone (90%+):
   - Core region: diverse weapons, general builds = TRUE CONCEPT
   - Flanderization zone: concentrated on 3-4 special-dependent weapons = SUPER-STIMULI
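The region scheme and the core-vs-flanderization comparison can be sketched in NumPy. This is a minimal illustration, not a splatnlp helper: it assumes you already have a 1-D array of activations for one feature and a parallel list of weapon names (the source does not name a weapon column, so obtaining it is up to you). Regions are treated as upper-inclusive intervals to match the core filter used in the coverage check.

```python
from collections import Counter

import numpy as np

REGION_NAMES = ["Floor", "Low", "Below Core", "Core", "High", "Flanderization Zone"]
# Upper bound of each region as a fraction of the effective max. Each region is
# (previous bound, bound], matching the core filter (> 0.25, <= 0.75); the
# Flanderization Zone is everything above 0.90.
REGION_EDGES = np.array([0.01, 0.10, 0.25, 0.75, 0.90])


def assign_regions(acts) -> np.ndarray:
    """Region index (0-5) for each activation, relative to the effective max."""
    acts = np.asarray(acts, dtype=float)
    nonzero = acts[acts > 0]  # assumes at least one nonzero activation
    effective_max = np.percentile(nonzero, 99.5)
    return np.searchsorted(REGION_EDGES, acts / effective_max, side="left")


def weapon_concentration(acts, weapons, top_n: int = 3) -> dict:
    """Per region: example count, distinct weapons, and share held by the top-N weapons."""
    regions = assign_regions(acts)
    weapons = np.asarray(weapons)
    summary = {}
    for idx, name in enumerate(REGION_NAMES):
        in_region = weapons[regions == idx].tolist()
        if not in_region:
            continue  # empty regions are omitted from the summary
        counts = Counter(in_region)
        top = sum(c for _, c in counts.most_common(top_n))
        summary[name] = {
            "n": len(in_region),
            "distinct_weapons": len(counts),
            "top_share": top / len(in_region),
        }
    return summary
```

A high `top_share` in the Flanderization Zone alongside many `distinct_weapons` in the Core is the super-stimulus pattern described above.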
Example: Feature 9971
Initial label (wrong): "Death-Averse SCU Stacker"
- Only looked at 90%+ activations (SCU_57 + special-dependent weapons)

Better label: "Offensive Intensity (Death-Averse)"
- Core region (25-75%) showed diverse weapons (Splattershot family, Sploosh, Hydra)
- Feature tracks general offensive investment, not specifically SCU
- Flanderization-zone (90%+) builds on Bloblobber and Glooga are "super-stimuli", not the core concept
Key insight: The core region (25-75% of effective max) reveals the TRUE feature concept. High activations (90%+ of effective max) show what happens when that concept is pushed to flanderized extremes.
Core Coverage Validation (BEFORE LABELING)
Before finalizing any label, verify core coverage of the proposed signature.
A label based on a token/ability that only appears in <30% of core examples is labeling the TAIL, not the concept.
```python
import numpy as np
import polars as pl

from splatnlp.mechinterp.skill_helpers import load_context

FEATURE_ID = 13934                # replace with the feature under investigation
DRIVER_TOKEN = "YOUR_TOKEN_HERE"  # proposed label driver, e.g. 'respawn_punisher'

ctx = load_context("ultra")
df = ctx.db.get_all_feature_activations_for_pagerank(FEATURE_ID)

# Define the core region relative to the effective max
# (99.5th percentile of nonzero activations)
acts = df["activation"].to_numpy()
nonzero_acts = acts[acts > 0]
effective_max = float(np.percentile(nonzero_acts, 99.5))
core_df = df.filter(
    (pl.col("activation") > 0.25 * effective_max)
    & (pl.col("activation") <= 0.75 * effective_max)
)
if len(core_df) == 0:
    raise ValueError("Core region is empty; check the activation data")

# Share of core examples whose ability tokens include the driver token
driver_id = ctx.vocab[DRIVER_TOKEN]
core_with_driver = core_df.filter(pl.col("ability_input_tokens").list.contains(driver_id))
coverage = len(core_with_driver) / len(core_df) * 100
print(f"Core coverage: {coverage:.1f}%")
```
| Core Coverage | Label Guidance |
|---|---|
| >50% | Safe to headline this token/ability |
| 30-50% | Mention in notes, but not as headline |
| <30% | WRONG LABEL - this is a tail marker, not the concept |
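The guidance table can be applied to the printed coverage with a small lookup. This is an illustrative helper, not part of splatnlp; because the table's ranges meet at 30% and 50%, treating exactly 50% as "mention" and exactly 30% as "mention" is an assumption.

```python
def coverage_guidance(coverage_pct: float) -> str:
    """Map core coverage (% of core examples containing the driver) to the table's guidance."""
    if coverage_pct > 50:
        return "Safe to headline this token/ability"
    if coverage_pct >= 30:
        return "Mention in notes, but not as headline"
    return "WRONG LABEL - this is a tail marker, not the concept"
```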
Red flags that indicate wrong labeling:
- Binary ability with >5x tail enrichment but <20% core presence → tail marker
- Weapon with >40% in top-100 but <15% in core → flanderized
- Proposed signature covers <30% of core examples → incomplete interpretation
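The red flags above are simple threshold checks, so they can be bundled into one function that takes whatever statistics you have already computed. A sketch with hypothetical parameter names; "tail enrichment" is taken as a given number because the source does not define how it is computed.

```python
from typing import List, Optional


def red_flags(
    tail_enrichment: Optional[float] = None,     # binary ability: enrichment in the tail (x-fold)
    ability_core_pct: Optional[float] = None,    # % of core examples containing that ability
    weapon_top100_pct: Optional[float] = None,   # % of top-100 examples on one weapon
    weapon_core_pct: Optional[float] = None,     # % of core examples on that weapon
    signature_core_pct: Optional[float] = None,  # % of core examples matching the proposed signature
) -> List[str]:
    """Return the names of any red flags triggered; checks with missing inputs are skipped."""
    flags = []
    if tail_enrichment is not None and ability_core_pct is not None:
        if tail_enrichment > 5 and ability_core_pct < 20:
            flags.append("tail marker")
    if weapon_top100_pct is not None and weapon_core_pct is not None:
        if weapon_top100_pct > 40 and weapon_core_pct < 15:
            flags.append("flanderized weapon")
    if signature_core_pct is not None and signature_core_pct < 30:
        flags.append("incomplete interpretation")
    return flags
```

With the Feature 13934 numbers below (8.57x enrichment, 12% core presence), `red_flags(tail_enrichment=8.57, ability_core_pct=12)` flags RP as a tail marker.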
Example (Feature 13934):
- Wrong approach: See RP with 8.57x enrichment → label as "RP Backline Anchor"
- Reality: RP appears in only 12% of core examples → RP is a super-stimulus, not the concept
- Right approach: Check core coverage FIRST → RP at 12% means it's a tail marker → split by RP presence to find the true concept → label the commonality across modes
Label Quality Examples
Evolution from Mechanical to Strategic
| Investigation Stage | Label | Problem |
|---|---|---|
| After 1D sweeps | "SSU + ISM + IRU Kit" | Just lists tokens |
| After binary analysis | "Swim Efficiency Kit (Death-Averse)" | Mechanical + negation |
| After decoder grouping | "Swim Utility Sustain" | Better but still mechanical |
| After weapon role check | "Positional Survival - Midrange" | Strategic concept + role |
Good vs Bad Labels
| Bad Label | Why | Good Label | Why |
|---|---|---|---|
| "SCU Detector" | Token presence only | "Special Pressure Build" | Gameplay purpose |
| "Death-Averse Efficiency" | Negation + mechanical | "Positional Survival" | Positive concept |
| "High SSU Anchor" | Wrong role (Jr. isn't anchor) | "- Midrange" | Correct role |
| "Zombie + RP Mixed" | Describes modes, not concept | "Utility Axis (Multi-Modal)" | Names the pattern |
| "ISM Build" | Single token | "Ink Sustain - Backline" | Concept + role |
The Strategic Label Test
Before saving a label, ask:
1. "Would a competitive Splatoon player recognize this playstyle?"
   - If no → too mechanical or wrong terminology
2. "Does this explain WHY the model learned this pattern?"
   - If no → you're describing correlation, not causation
3. "Could I explain this to someone who doesn't know the tokens?"
   - If no → the label is too technical
Mandatory Label Components
Every strategic/tactical label should have:
- Core concept - The gameplay behavior (e.g., "Positional Survival")
- Role qualifier - Where/how it's played (e.g., "- Midrange")
- Notes with evidence - Decoder groups, weapon classification, key enrichments
Label Specificity by Category
Match label specificity to concept level:
| Category | Specificity | Example |
|---|---|---|
| mechanical | Terse, technical | "SCU Threshold 29+", "ISM Stacker" |
| tactical | Mid-level, names the combo | "Zombie Slayer Dualies", "Beacon Support Kit" |
| strategic | High-concept, captures the "why" | "Positional Survival - Midrange" |
- Mechanical = low-level pattern → precise, token-focused
- Tactical = build strategy → names the combo + weapon/class
- Strategic = gameplay philosophy → high-concept + role qualifier
Skip a Feature
```bash
# Skip the next feature
poetry run python -m splatnlp.mechinterp.cli.labeler_cli skip --model ultra

# Skip specific feature with reason
poetry run python -m splatnlp.mechinterp.cli.labeler_cli skip \
  --feature-id 18712 \
  --reason "ReLU floor too high, hard to interpret"
```
Add Features to Queue
```bash
# Add single feature
poetry run python -m splatnlp.mechinterp.cli.labeler_cli add 18712 --model ultra

# Add multiple with priority
poetry run python -m splatnlp.mechinterp.cli.labeler_cli add 18712,18890,19042 \
  --priority 0.8 \
  --reason "SCU-related cluster"
```
Find Similar Features
```bash
poetry run python -m splatnlp.mechinterp.cli.labeler_cli similar \
  --feature-id 18712 \
  --top-k 5 \
  --model ultra
```
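The source does not say how similarity is scored; the name `SimilarFinder.find_by_top_tokens` only suggests that features are compared by their top tokens. One plausible approach, assumed here and not confirmed by the source, is the Jaccard overlap of each feature's top-token set. The function names and token sets below are illustrative.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two token sets: |A ∩ B| / |A ∪ B| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_similar(
    seed_id: int, top_tokens: Dict[int, Set[str]], top_k: int = 5
) -> List[Tuple[int, float]]:
    """Rank other features by Jaccard overlap of their top-token sets with the seed's."""
    seed = top_tokens[seed_id]
    scored = [(fid, jaccard(seed, toks)) for fid, toks in top_tokens.items() if fid != seed_id]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # highest similarity first, then by ID
    return scored[:top_k]
```

Jaccard is used here only because it needs nothing beyond the token sets; a cosine similarity over decoder vectors or activation profiles would slot into the same ranking loop.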
Check Status
```bash
poetry run python -m splatnlp.mechinterp.cli.labeler_cli status --model ultra
```
Output example:
```markdown
## Labeling Status (ultra)

### Labels
- Total labeled: 45
- From dashboard: 30
- From research: 10
- Merged: 5

### Categories
- tactical: 20
- mechanical: 15
- strategic: 5
- uncategorized: 5

### Queue
- Pending: 25
- Completed: 40
- Skipped: 5
```
Sync Labels
Pull labels from all sources (dashboard, research states):
```bash
poetry run python -m splatnlp.mechinterp.cli.labeler_cli sync --model ultra
```
Export Labels
```bash
poetry run python -m splatnlp.mechinterp.cli.labeler_cli export \
  --model ultra \
  --output /mnt/e/mechinterp_runs/labels/export.csv
```
Build Priority Queue
```bash
# By activation count (features with most data)
poetry run python -m splatnlp.mechinterp.cli.labeler_cli build-queue \
  --model ultra \
  --method activation_count \
  --top-k 50

# From cluster (similar to a seed feature)
poetry run python -m splatnlp.mechinterp.cli.labeler_cli build-queue \
  --model ultra \
  --method cluster \
  --seed 18712 \
  --top-k 10
```
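The `activation_count` method is described only as picking "features with most data". A minimal sketch of that idea, assuming you already have a mapping from feature ID to its number of nonzero activations; the function name and the priority normalization are hypothetical, not `QueueBuilder`'s actual logic.

```python
from typing import Dict, Iterable, List, Tuple


def build_queue_by_activation_count(
    counts: Dict[int, int], exclude: Iterable[int] = (), top_k: int = 50
) -> List[Tuple[int, float]]:
    """Pick the top_k features with the most nonzero activations.

    Priorities are counts scaled by the largest selected count (an assumed
    normalization), so the first entry always gets priority 1.0.
    """
    skip = set(exclude)  # e.g. already-labeled or skipped feature IDs
    candidates = [(fid, n) for fid, n in counts.items() if n > 0 and fid not in skip]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))  # most data first, ties by ID
    chosen = candidates[:top_k]
    if not chosen:
        return []
    max_count = chosen[0][1]
    return [(fid, n / max_count) for fid, n in chosen]
```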
Typical Labeling Session
```text
User: Let's label some features

Claude: [runs: labeler_cli next --model ultra]
Next feature: 18712 (priority: 0.85)

[runs: overview_cli --feature-id 18712]
## Feature 18712 Overview
- Top token: special_charge_up (27%)
- Family: SCU 31%
...

Based on the overview, this feature appears to detect Special Charge Up stacking.
Want me to run a sweep?

User: Yes, confirm with an SCU sweep

Claude: [runs: runner_cli with family_1d_sweep]
Results confirm monotonic increase with SCU.
Proposed label: "SCU Detector - High AP"

User: Call it "Special Charge Stacker"

Claude: [runs: labeler_cli label --feature-id 18712 --name "Special Charge Stacker" --category tactical --source "claude code"]
Label saved (source: claude code).
Finding similar features...

[runs: labeler_cli similar --feature-id 18712]
Similar features:
- 19042 (sim=0.82)
- 18890 (sim=0.75)

Want to add these to the queue?
```
Label Storage
Labels are stored in three places (kept in sync):
- Dashboard: `src/splatnlp/dashboard/feature_labels_{model}.json`
- Research state: `/mnt/e/mechinterp_runs/state/{model}/f{id}.json`
- Consolidated: `/mnt/e/mechinterp_runs/labels/consolidated_{model}.json`
The consolidator merges all sources and resolves conflicts.
Queue Storage
Queue state is persisted at `/mnt/e/mechinterp_runs/labels/queue_{model}.json`.
Contains:
- Pending entries with priorities
- Completed feature IDs
- Skipped feature IDs
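The three parts of the queue file map naturally onto a max-priority heap plus two ID lists. The sketch below is a hypothetical stand-in written with `heapq` to show the data structure; it is not the real `LabelingQueue`, and its JSON layout is an assumption rather than the actual format of `queue_{model}.json`.

```python
import heapq
import json
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SketchQueue:
    """Hypothetical stand-in for the persisted queue (not the real LabelingQueue)."""

    pending: List[Tuple[float, int, str]] = field(default_factory=list)  # heap of (-priority, id, reason)
    completed: List[int] = field(default_factory=list)
    skipped: List[int] = field(default_factory=list)

    def add(self, feature_id: int, priority: float, reason: str = "") -> None:
        # heapq is a min-heap, so store -priority to pop the highest priority first.
        heapq.heappush(self.pending, (-priority, feature_id, reason))

    def _done(self) -> set:
        return set(self.completed) | set(self.skipped)

    def peek_next(self) -> Optional[int]:
        """Highest-priority pending feature that is not completed or skipped."""
        done = self._done()
        while self.pending and self.pending[0][1] in done:
            heapq.heappop(self.pending)  # lazily drop entries that were already handled
        return self.pending[0][1] if self.pending else None

    def complete(self, feature_id: int) -> None:
        self.completed.append(feature_id)

    def skip(self, feature_id: int) -> None:
        self.skipped.append(feature_id)

    def dumps(self) -> str:
        """Serialize to JSON, leaving out pending entries that were already handled."""
        done = self._done()
        pending = [
            {"feature_id": fid, "priority": -neg, "reason": reason}
            for neg, fid, reason in sorted(self.pending)
            if fid not in done
        ]
        return json.dumps({"pending": pending, "completed": self.completed, "skipped": self.skipped})

    @classmethod
    def loads(cls, text: str) -> "SketchQueue":
        data = json.loads(text)
        queue = cls(completed=list(data["completed"]), skipped=list(data["skipped"]))
        for entry in data["pending"]:
            queue.add(entry["feature_id"], entry["priority"], entry.get("reason", ""))
        return queue
```

Lazy deletion keeps `complete` and `skip` cheap at the cost of stale heap entries, which `peek_next` discards only when they reach the top.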
Programmatic Usage
```python
from splatnlp.mechinterp.labeling import (
    LabelConsolidator,
    LabelingQueue,
    QueueBuilder,
    SimilarFinder,
)

# Queue management
queue = LabelingQueue.load("ultra")
entry = queue.get_next()
queue.mark_complete(entry.feature_id, "My Label")

# Set labels
consolidator = LabelConsolidator("ultra")
consolidator.set_label(
    feature_id=18712,
    name="SCU Detector",
    category="tactical",
    notes="Responds to SCU presence",
)

# Find similar
finder = SimilarFinder("ultra")
similar = finder.find_by_top_tokens(18712, top_k=5)

# Build queue
builder = QueueBuilder("ultra")
queue = builder.build_by_activation_count(top_k=50)
```
See Also
- mechinterp-overview: Quick feature overview before labeling
- mechinterp-runner: Run experiments to validate hypotheses
- mechinterp-state: Track detailed research progress
- mechinterp-summarizer: Generate notes from experiments