NeuroTrace

neurotrace — Agent Skill

install
source · Clone the upstream repo
git clone https://github.com/cristianleoo/NeuroTrace
manifest: skill.md
safety · automated scan (medium risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
  • pip install
  • references API keys
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content

neurotrace — Agent Skill

You are an AI agent that can run neurotrace, an activation-patching framework for encoder-only transformer models. Use this skill when a user wants to understand where factual knowledge is stored inside a HuggingFace MLM, or to compare how two models encode domain-specific facts.

When to use this skill

  • User asks "does [model] actually know [domain] facts?"
  • User wants to compare a fine-tuned model against its base model
  • User wants to locate knowledge neurons or test factual recall
  • User asks about activation patching, causal tracing, or knowledge localisation

Prerequisites

The repo must be installed:

cd /path/to/neurotrace
uv venv && uv pip install -e .
source .venv/bin/activate
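
To confirm the environment is ready, a quick import check can help (optional; it only verifies that the package and its top-level CausalTracer, used later in this skill, import cleanly):

python -c "from neurotrace import CausalTracer; print('neurotrace OK')"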

Option A: Use the built-in Strands orchestrator agent

If you have a Strands-compatible LLM provider configured, you can delegate to the built-in orchestrator:

from neurotrace.providers import get_model
from neurotrace.agents import create_orchestrator

model = get_model("anthropic", client_args={"api_key": "sk-..."})
# Also available: "openai", "bedrock", "gemini"

agent = create_orchestrator(model=model)
agent("Compare SecureBERT vs ModernBERT on cybersecurity knowledge")

The orchestrator has tools for dataset generation, tracing, charting, and interpretation. It will drive the full pipeline autonomously.

Option B: Drive the pipeline yourself (step-by-step below)

Step-by-step protocol

Step 1: Gather information from the user

Ask the following questions (skip any the user has already answered):

  1. Domain: What knowledge domain? (e.g. cybersecurity, medicine, law, finance, programming)
  2. Goal: What do you want to find out? (e.g. "Does BioBERT store drug-disease associations differently from BERT?")
  3. Models: Which models to compare? Need display name + HuggingFace ID for each. At least one model, ideally two for comparison.
  4. Facts: What factual associations to test? For each fact, you need:
    • A sentence template with {} where the subject goes (e.g. "The attack that exploits {} injection")
    • The correct subject and an incorrect alternative
    • The target token the model should predict
    • A context word that anchors the fact, and a corruption for it
  5. Variants (optional): Custom prompt phrasing variants, or use defaults.

If the user provides a domain but no specific facts, generate 8–12 representative facts yourself covering diverse concepts in that domain. Ensure each fact has an unambiguous correct answer and a plausible corruption.
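
For example, if the user asks about medicine but supplies no specific facts, a generated fact might look like the following (illustrative values chosen for this sketch, not taken from the repo):

from neurotrace.agent_dataset import FactSpec

# Hypothetical medicine-domain fact: "insulin" is the unambiguous answer,
# "glucagon" is a plausible but wrong alternative from the same category,
# and swapping "glucose" for "cholesterol" corrupts the anchoring context.
example_fact = FactSpec(
    clean_subject="insulin",
    corrupt_subject="glucagon",
    template="The hormone that lowers blood glucose levels is called {}",
    target="insulin",
    context_from="glucose",
    context_to="cholesterol",
)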

Step 2: Build the DatasetSpec

Create a JSON spec and save it:

from neurotrace.agent_dataset import DatasetSpec, FactSpec, ModelSpec, generate_from_spec

spec = DatasetSpec(
    domain="cybersecurity",
    description="Test whether SecureBERT stores cybersecurity facts differently from ModernBERT",
    models=[
        ModelSpec(name="ModernBERT", huggingface_id="answerdotai/ModernBERT-base"),
        ModelSpec(name="SecureBERT", huggingface_id="cisco-ai/SecureBERT2.0-base"),
    ],
    facts=[
        FactSpec(
            clean_subject="SQL",
            corrupt_subject="HTML",
            template="The attack that exploits input sanitization failures in a database is known as {} injection",
            target="SQL",
            context_from="database",
            context_to="email",
        ),
        # ... more facts ...
    ],
    output_dir="output",
)

dataset, config = generate_from_spec(spec, save_path="output/dataset.json")

Step 3: Run the trace

from neurotrace import CausalTracer
from neurotrace.tracer import save_results

results = []
results_dict = {}

for model_spec in spec.models:
    # Load the model and run the causal trace over the generated dataset
    tracer = CausalTracer(model_spec.name, model_spec.huggingface_id)
    result = tracer.trace(dataset)
    results.append(result)
    results_dict[model_spec.name] = result.to_dict()
    # Release the loaded model before moving on to the next one
    tracer.release()

save_results(results, f"{spec.output_dir}/trace_results.json")

Step 4: Generate charts

from neurotrace import plot_comparison, plot_patching_flow, plot_heatmap, plot_peak_bars

out = spec.output_dir

# Line chart — mask-patch restoration per layer
plot_comparison(results_dict, curve="mask_patch",
    title="Mask-Patch: Probability Restoration by Layer",
    save_path=f"{out}/fig1_mask_patch.png")

# Line chart — context-patch restoration per layer
plot_comparison(results_dict, curve="context_patch",
    title="Context-Patch: Probability Restoration by Layer",
    save_path=f"{out}/fig2_context_patch.png")

# Patching flow diagram
peak = results[0].peak_layer()[0]
plot_patching_flow(num_layers=results[0].num_layers,
    highlighted_layer=peak, save_path=f"{out}/fig3_flow.png")

# Heatmap
plot_heatmap(results_dict, save_path=f"{out}/fig4_heatmap.png")

# Peak comparison bars
plot_peak_bars(results_dict, save_path=f"{out}/fig5_peaks.png")

Step 5: Interpret the results

Report to the user:

  1. Peak layer: Which layer showed the highest restoration score for mask-patching? This is where the model stores the domain knowledge (a short sketch for extracting these numbers follows this list).
  2. Peak magnitude: How strong was the signal? A larger peak means the model relies more on context to retrieve the fact.
  3. Context-patch vs mask-patch: If context-patch is flat (near zero), the model stores facts directly at the prediction position, not via context tokens.
  4. Clean-minus-corrupt baseline (clean_minus_corrupt in the results): If positive, clean context helps. If negative, the model's knowledge is robust enough that corruption doesn't hurt — a sign of strong factual entrenchment.
  5. Cross-model comparison: If two models peak at the same layer but with different magnitudes, the one with the smaller peak may actually have stronger knowledge (less to restore because corruption matters less).
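
A minimal sketch for pulling these headline numbers out of the trace results. The Step 4 code only shows that peak_layer()[0] is the layer index, so the assumption that it returns a (layer, score) pair should be checked against the actual return value:

# Summarise each model's peak restoration layer for the report.
for model_spec, result in zip(spec.models, results):
    layer, score = result.peak_layer()  # assumed (layer_index, restoration_score) for the mask-patch curve
    print(f"{model_spec.name}: peak mask-patch restoration at layer "
          f"{layer} of {result.num_layers} (score {score:.3f})")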

DatasetSpec JSON schema

{
  "domain": "string (required)",
  "description": "string (required)",
  "models": [
    {
      "name": "string",
      "huggingface_id": "string"
    }
  ],
  "facts": [
    {
      "clean_subject": "string",
      "corrupt_subject": "string",
      "template": "string with {} placeholder",
      "target": "string",
      "context_from": "string",
      "context_to": "string"
    }
  ],
  "variants": [["prefix", "suffix"], ...],
  "output_dir": "string (default: 'output')"
}
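
A minimal spec instance in this format, reusing the Step 2 example. The variants entry is shown only to illustrate the [prefix, suffix] shape and can be omitted to use the defaults:

{
  "domain": "cybersecurity",
  "description": "Test whether SecureBERT stores cybersecurity facts differently from ModernBERT",
  "models": [
    {"name": "ModernBERT", "huggingface_id": "answerdotai/ModernBERT-base"},
    {"name": "SecureBERT", "huggingface_id": "cisco-ai/SecureBERT2.0-base"}
  ],
  "facts": [
    {
      "clean_subject": "SQL",
      "corrupt_subject": "HTML",
      "template": "The attack that exploits input sanitization failures in a database is known as {} injection",
      "target": "SQL",
      "context_from": "database",
      "context_to": "email"
    }
  ],
  "variants": [["According to security analysts, ", ""]],
  "output_dir": "output"
}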

Fact design guidelines

When generating facts for the user:

  • The template MUST contain {} where the subject goes
  • clean_subject should be the unambiguously correct answer
  • corrupt_subject should be a plausible but wrong alternative from the same category
  • target is the token the model predicts (usually the same as clean_subject)
  • context_from is a word in the template that supports the correct answer
  • context_to replaces it with something that shifts meaning but keeps the sentence grammatical
  • Aim for 8–12 diverse facts per domain, covering different sub-topics
  • Each fact should be testable as a fill-in-the-blank: "...is known as [MASK]..."

Output files

After a run, the output directory will contain:

File                       Contents
dataset.json               The generated prompt pairs
run_config.json            Models, domain, metadata
trace_results.json         Per-layer restoration scores
fig1_mask_patch.png        Mask-patch line chart
fig2_context_patch.png     Context-patch line chart
fig3_flow.png              Patching flow diagram
fig4_heatmap.png           Layer × model heatmap
fig5_peaks.png             Peak layer bar chart

Error handling

  • If a model fails to load, CausalTracer.__init__ will raise an exception. Catch it and report the HuggingFace ID to the user — they may need to pip install a specific tokenizer or use trust_remote_code=True (see the sketch after this list).
  • If the [MASK] token is not found, the tracer falls back to finding the subject token position. This works but is less precise for MLMs.
  • Running on CPU is fine for base-size models (~120M params). For large models, suggest the user use a GPU.
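
A minimal sketch of guarding the Step 3 loop against load failures. The exact exception types are not documented here, so this catches broadly and reports the model ID as suggested above:

from neurotrace import CausalTracer
from neurotrace.tracer import save_results

results = []
results_dict = {}

for model_spec in spec.models:
    try:
        tracer = CausalTracer(model_spec.name, model_spec.huggingface_id)
    except Exception as exc:  # exact exception type depends on the underlying HF loader
        print(f"Could not load {model_spec.name} ({model_spec.huggingface_id}): {exc}")
        print("Check the HuggingFace ID, install any model-specific tokenizer, "
              "or retry with trust_remote_code=True if the model requires it.")
        continue
    try:
        result = tracer.trace(dataset)
        results.append(result)
        results_dict[model_spec.name] = result.to_dict()
    finally:
        # Release the model even if tracing fails, so the next one can load
        tracer.release()

save_results(results, f"{spec.output_dir}/trace_results.json")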