Claude-skill-registry-data mechinterp-crossmodel-matcher
Match SAE features between Ultra (24K) and Full (2K) models based on activation patterns and token overlap
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry-data
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/mechinterp-crossmodel-matcher" ~/.claude/skills/majiayu000-claude-skill-registry-data-mechinterp-crossmodel-matcher && rm -rf "$T"
manifest: data/mechinterp-crossmodel-matcher/SKILL.md
MechInterp Cross-Model Matcher
Match features between the Ultra (24K features) and Full (2K features) SAE models to understand feature correspondence and discover monosemantic representations.
Purpose
The cross-model matcher skill:
- Finds corresponding features across models
- Computes similarity based on top token overlap
- Identifies features unique to each model
- Helps validate interpretations across model scales
When to Use
Use this skill when you:
- Have interpreted a feature in one model and want to find its counterpart
- Want to validate that a pattern exists across model scales
- Need to understand which patterns the Ultra model splits into separate features that the Full model keeps merged
Usage
Programmatic
    from splatnlp.mechinterp.analysis import FeatureMatcher
    from splatnlp.mechinterp.skill_helpers import load_context

    # Load source context (the model with your known feature)
    source_ctx = load_context("ultra")

    # Initialize matcher (automatically loads target model)
    matcher = FeatureMatcher(source_ctx)

    # Find matches for an Ultra feature in the Full model
    report = matcher.find_matches(
        source_feature=18712,
        n_candidates=500,  # How many Full features to check
        n_top_matches=10,  # How many matches to return
    )

    # View results
    print(f"Searched {report.n_candidates_tested} candidates")
    print(f"Best correlation: {report.best_correlation:.3f}")

    for match in report.matches:
        print(f"\nFull feature {match.target_feature}:")
        print(f"  Token overlap: {match.top_token_overlap:.3f}")
        print(f"  Shared tokens: {match.shared_top_tokens[:5]}")
        print(f"  Notes: {match.notes}")
Detailed Comparison
    # Compare two specific features in detail
    comparison = matcher.compare_features(
        source_fid=18712,  # Ultra feature
        target_fid=1024,  # Full feature
    )

    print(f"Jaccard similarity: {comparison['jaccard_similarity']:.3f}")
    print(f"Shared tokens: {comparison['shared_tokens'][:10]}")
    print(f"Ultra-only tokens: {comparison['source_only_tokens'][:10]}")
    print(f"Full-only tokens: {comparison['target_only_tokens'][:10]}")
Matching Metrics
Token Overlap (Jaccard Similarity)
Compares top tokens between features:
overlap = |source_top ∩ target_top| / |source_top ∪ target_top|
- > 0.3: Strong match - likely same underlying concept
- 0.1 - 0.3: Moderate match - related but not identical
- < 0.1: Weak match - probably different concepts
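The formula and bands above can be sketched in plain Python. This is a minimal illustration, not the `FeatureMatcher` implementation: `jaccard` and `classify_overlap` are hypothetical helper names, the token strings are made up, and placing scores of exactly 0.1 or 0.3 in the moderate band is an assumption.

```python
def jaccard(source_top, target_top):
    """Jaccard similarity between two collections of top tokens."""
    source, target = set(source_top), set(target_top)
    union = source | target
    if not union:
        return 0.0  # assumption: two empty token sets count as no overlap
    return len(source & target) / len(union)


def classify_overlap(overlap):
    """Map an overlap score onto the strong / moderate / weak bands."""
    if overlap > 0.3:
        return "strong"
    if overlap >= 0.1:  # assumption: boundary values fall in the moderate band
        return "moderate"
    return "weak"


# Made-up token lists, purely for illustration
ultra_tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
full_tokens = ["tok_b", "tok_c", "tok_d", "tok_e", "tok_f"]

score = jaccard(ultra_tokens, full_tokens)  # 3 shared tokens out of 6 distinct
print(f"{score:.3f} -> {classify_overlap(score)}")
```

Here three of the six distinct tokens are shared, so the score is 0.5 and the pair lands in the strong band.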
Interpretation
High overlap suggests:
- Features detect the same pattern
- Ultra feature may be a "refinement" of Full feature
- Good candidate for cross-model validation
Low overlap with similar activation patterns suggests:
- Ultra model has decomposed the Full feature
- Multiple Ultra features may combine to match one Full feature
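One way to probe the decomposition case is to ask how much of a Full feature's top-token set is covered by the union of several Ultra features' top tokens. The sketch below assumes top tokens are already available as sets. `coverage` is a hypothetical helper, the feature IDs and tokens are invented, and the skill itself may use a different criterion.

```python
def coverage(full_tokens, ultra_token_sets):
    """Fraction of a Full feature's top tokens found in any of the Ultra sets."""
    full = set(full_tokens)
    if not full:
        return 0.0
    combined = set().union(*ultra_token_sets)  # union of all Ultra token sets
    return len(full & combined) / len(full)


# Invented data: one Full feature and three candidate Ultra features
full_feature = {"t1", "t2", "t3", "t4", "t5", "t6"}
ultra_features = {
    101: {"t1", "t2", "x1"},
    202: {"t3", "t4", "x2"},
    303: {"t5", "x3", "x4"},
}

for fid, tokens in ultra_features.items():
    print(f"Ultra {fid} alone: {coverage(full_feature, [tokens]):.2f}")

joint = coverage(full_feature, list(ultra_features.values()))
print(f"All three together: {joint:.2f}")
```

In this toy data each Ultra feature alone covers at most a third of the Full feature's tokens, while the three together cover five of the six. That pattern is what a decomposed Full feature would be expected to look like.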
Example: Finding Ultra Decomposition
    from splatnlp.mechinterp.analysis import FeatureMatcher
    from splatnlp.mechinterp.skill_helpers import load_context

    # Example: A Full model feature that might be polysemantic
    full_ctx = load_context("full")
    matcher = FeatureMatcher(full_ctx)  # Source = Full

    # Find what Ultra features correspond to Full feature 512
    report = matcher.find_matches(source_feature=512)

    # If multiple Ultra features match, the Full feature may be polysemantic
    matching = [m for m in report.matches if m.combined_score > 0.1]
    if len(matching) > 3:
        print("Full feature 512 appears to be polysemantic")
        print("Ultra decomposition:")
        for m in report.matches[:5]:
            print(f"  Ultra {m.target_feature}: {m.shared_top_tokens[:3]}")
Workflow Integration
- Start with interpreted feature: Begin with a feature you understand
- Find matches: Use this skill to find counterparts
- Validate interpretation: Check if matches have similar behavior
- Document correspondence: Update research state with cross-model links
- Investigate decomposition: If Ultra splits a Full feature, analyze each part
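For the documentation step, a cross-model link can be captured as a small serializable record. This sketch uses only the standard library and is not the mechinterp-state API: `CrossModelLink` and `links_to_json` are hypothetical names, and the overlap value is a placeholder, not a measured result. The feature IDs reuse the ones from the comparison example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CrossModelLink:
    """One documented correspondence between features in two models."""
    source_model: str
    source_feature: int
    target_model: str
    target_feature: int
    token_overlap: float
    note: str = ""


def links_to_json(links):
    """Serialize a list of links so they can be stored with research notes."""
    return json.dumps([asdict(link) for link in links], indent=2)


links = [
    CrossModelLink(
        source_model="ultra",
        source_feature=18712,
        target_model="full",
        target_feature=1024,
        token_overlap=0.35,  # placeholder value for illustration only
        note="candidate counterpart; validate with experiments",
    )
]
print(links_to_json(links))
```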
Limitations
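- Token overlap is a proxy; true matching would require activations from both models on the same inputs
- The two SAEs use different expansion factors (24K vs 2K features), so their features differ in granularity and one-to-one matches are not guaranteed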
- Some features may not have clear counterparts
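Features without a clear counterpart can be flagged by thresholding each source feature's best overlap score. The sketch below assumes those best scores have already been collected into a dict. `features_without_counterpart` is a hypothetical helper, the 0.1 cutoff mirrors the weak-match band, and the scores and most IDs are invented.

```python
def features_without_counterpart(best_overlap, threshold=0.1):
    """Return source feature IDs whose best match falls below the threshold."""
    return sorted(fid for fid, score in best_overlap.items() if score < threshold)


# Invented best-overlap scores keyed by source feature ID
best_overlap = {18712: 0.42, 3301: 0.04, 907: 0.18, 15000: 0.0}

unmatched = features_without_counterpart(best_overlap)
print(unmatched)
```

With this toy data the function returns features 3301 and 15000, whose best overlap falls in the weak band.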
See Also
- mechinterp-cluster-mapper: Analyze groups of related features
- mechinterp-state: Track cross-model research
- mechinterp-runner: Validate matches with experiments