SciAgent-Skills spikeinterface-electrophysiology

Unified Python framework for extracellular electrophysiology. Load recordings from 20+ formats (SpikeGLX, OpenEphys, NWB, Intan, Maxwell, Blackrock), preprocess signals, run 10+ spike sorters (Kilosort4, SpykingCircus2, Tridesclous, MountainSort5) with a single API, compute quality metrics (SNR, ISI violations, firing rate, amplitude cutoff), compare sorter outputs, and export to NWB or Phy. Use for format-agnostic and multi-sorter workflows. For a Neuropixels-specific Kilosort4 pipeline with PSTH and population decoding, use neuropixels-analysis instead.

install
source · Clone the upstream repo
git clone https://github.com/jaechang-hits/SciAgent-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/scientific-computing/spikeinterface-electrophysiology" ~/.claude/skills/jaechang-hits-sciagent-skills-spikeinterface-electrophysiology && rm -rf "$T"
manifest: skills/scientific-computing/spikeinterface-electrophysiology/SKILL.md
source content

SpikeInterface — Unified Extracellular Electrophysiology Framework

Overview

SpikeInterface provides a common Python API to read extracellular recordings from 20+ file formats, preprocess raw voltage traces, run 10+ spike sorters, postprocess and quality-control sorted units, and export results — all without format-specific code. Its modular design lets users swap sorters, formats, and preprocessing steps without rewriting pipelines. SpikeInterface is built around lazy, chainable objects: a Recording holds raw data, a Sorting holds spike times, and a SortingAnalyzer ties them together for waveform and metric computation.
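
A minimal sketch of how the three objects relate, using the built-in toy-data generator (also used in Workflow 2 below) so it runs without any data files:

import spikeinterface.full as si

# Toy Recording + ground-truth Sorting pair; no data files needed
recording, sorting = si.generate_ground_truth_recording(
    durations=[10.0], num_channels=4, num_units=3, seed=0
)
analyzer = si.create_sorting_analyzer(sorting, recording, format="memory")
print(recording)   # lazy Recording: channels, sampling rate, duration
print(sorting)     # Sorting: unit ids and spike trains
print(analyzer)    # SortingAnalyzer: ties the two together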

When to Use

  • Loading recordings from multiple acquisition systems (SpikeGLX, OpenEphys, Intan, NWB, Maxwell MEA, Blackrock) with a unified API rather than format-specific parsers
  • Running the same preprocessing and sorting pipeline across experiments recorded on different hardware
  • Comparing two or more spike sorters on the same recording to assess agreement and choose the best output
  • Running containerized sorters (Kilosort, IronClust, MountainSort5) via Docker or Singularity without local installation
  • Computing standard quality metrics (SNR, ISI violations, firing rate, presence ratio, amplitude cutoff) and applying threshold-based curation
  • Validating spike-sorting accuracy against synthetic or hybrid ground-truth recordings
  • Exporting sorted results to NWB for data sharing or to Phy for manual curation
  • Use neuropixels-analysis instead for a complete Neuropixels-specific Kilosort4 workflow including PSTH computation, tuning curves, and population decoding
  • For EEG, ECG, or other biosignal processing (not spike sorting), use neurokit2 instead

Prerequisites

  • Python packages: spikeinterface[full]>=0.101, probeinterface, numpy, matplotlib
  • Optional sorter deps: kilosort (pip), or Docker/Singularity for containerized sorters
  • Data requirements: raw binary recording files plus probe geometry (.prb, .json, or auto-detected from format)
  • Hardware: GPU required for Kilosort4; all other sorters run on CPU
pip install "spikeinterface[full]>=0.101" probeinterface
# Optional: Kilosort4 Python package
pip install kilosort
# Optional: Phy for manual curation
pip install phy

Quick Start

import spikeinterface.full as si
import spikeinterface.preprocessing as spre
import spikeinterface.sorters as ss
import spikeinterface.qualitymetrics as sqm

# Load, preprocess, sort, and inspect quality metrics in 10 lines
recording = si.read_openephys("/data/session_001", stream_name="Signals CH")
recording_pp = spre.bandpass_filter(
    spre.common_reference(recording, reference="global", operator="median"),
    freq_min=300, freq_max=6000,
)
sorting = ss.run_sorter("spykingcircus2", recording_pp, output_folder="./sc2_out")
analyzer = si.create_sorting_analyzer(sorting, recording_pp, folder="./analyzer")
analyzer.compute(["random_spikes", "waveforms", "templates", "noise_levels"])
metrics = sqm.compute_quality_metrics(analyzer, metric_names=["snr", "firing_rate", "isi_violation"])
print(metrics.describe())

Core API

Module 1: Recording I/O

SpikeInterface wraps every acquisition format behind a common BaseRecording interface. Once loaded, all objects expose the same methods regardless of origin format.

import spikeinterface.full as si

# SpikeGLX (.bin + .meta)
recording_sglx = si.read_spikeglx("/data/session_001", stream_name="imec0.ap")

# OpenEphys (binary or classic)
recording_oe = si.read_openephys("/data/oe_session", stream_name="Signals CH")

# NWB file
recording_nwb = si.read_nwb_recording("/data/recording.nwb",
                                       electrical_series_name="ElectricalSeries")

# Intan RHD/RHS
recording_intan = si.read_intan("/data/session.rhd", stream_id="0")  # "0" = amplifier stream

# Inspect any recording with the same API
print(f"Format:       {type(recording_sglx).__name__}")
print(f"Channels:     {recording_sglx.get_num_channels()}")
print(f"Sampling rate:{recording_sglx.get_sampling_frequency()} Hz")
print(f"Duration:     {recording_sglx.get_total_duration():.1f} s")
print(f"Probe:        {recording_sglx.get_probe().name}")
# List available streams before loading (useful when a file has multiple streams)
stream_names, stream_ids = si.get_neo_streams("spikeglx", "/data/session_001")
print("Available streams:", stream_names)
# e.g. ['imec0.ap', 'imec0.lf', 'nidq']

# Select a time slice (lazy, no data loaded until get_traces() is called)
recording_slice = recording_sglx.frame_slice(
    start_frame=0,
    end_frame=int(60 * recording_sglx.get_sampling_frequency()),  # first 60 s
)
print(f"Sliced duration: {recording_slice.get_total_duration():.1f} s")

Module 2: Preprocessing

Preprocessing functions return new Recording objects wrapping the original; the chain is applied lazily when data is read. This keeps memory usage low even for multi-hour recordings.

import spikeinterface.preprocessing as spre

# 1. Common median reference — removes shared noise across all channels
recording_cmr = spre.common_reference(recording_sglx,
                                       reference="global",
                                       operator="median")

# 2. Bandpass filter for action potentials (300–6000 Hz typical)
recording_filt = spre.bandpass_filter(recording_cmr,
                                       freq_min=300,
                                       freq_max=6000)

# 3. Detect and remove bad channels (coherence-based detection)
bad_ids, bad_labels = spre.detect_bad_channels(recording_filt,
                                               method="coherence+psd")
recording_clean = recording_filt.remove_channels(bad_ids)
print(f"Removed {len(bad_ids)} bad channels: {bad_ids}")
print(f"Clean channels: {recording_clean.get_num_channels()}")
# Whitening — decorrelates channels; recommended before template-matching sorters
recording_white = spre.whiten(recording_clean, mode="local")

# Phase shift correction for Neuropixels (samples acquired with small time offsets)
recording_shifted = spre.phase_shift(recording_clean)

# Inspect a short snippet of preprocessed data (whitened traces are unitless)
traces = recording_white.get_traces(start_frame=0, end_frame=3000, segment_index=0)
print(f"Trace snippet shape: {traces.shape}")   # (3000, n_channels)
print(f"Trace range: [{traces.min():.2f}, {traces.max():.2f}]")

Module 3: Spike Sorting

ss.run_sorter() wraps every supported sorter behind a uniform call signature. Sorter-specific parameters are passed as keyword arguments; all other pipeline steps are identical.

import spikeinterface.sorters as ss
from pathlib import Path

# List all sorters SpikeInterface supports
available = ss.available_sorters()
print("Supported sorters:", available)

# List sorters installed and runnable in the current environment
installed = ss.installed_sorters()
print("Installed locally:", installed)

# Run SpykingCircus2 (CPU, no external deps)
sorting_sc2 = ss.run_sorter(
    "spykingcircus2",
    recording_clean,
    output_folder=Path("./sorter_output/sc2"),
    remove_existing_folder=True,
    verbose=True,
)
print(f"SpykingCircus2 units: {len(sorting_sc2.get_unit_ids())}")
# Run Kilosort4 via Docker container (no local Kilosort installation required)
sorting_ks4 = ss.run_sorter(
    "kilosort4",
    recording_clean,
    output_folder=Path("./sorter_output/ks4"),
    docker_image=True,   # set singularity_image=True to use Singularity instead
    remove_existing_folder=True,
    # Kilosort4-specific parameters
    nblocks=5,
    Th_learned=8,
    do_correction=True,
)
print(f"Kilosort4 units: {len(sorting_ks4.get_unit_ids())}")

# Run MountainSort5 (CPU, fast, good for tetrode/low-channel-count probes)
sorting_ms5 = ss.run_sorter(
    "mountainsort5",
    recording_clean,
    output_folder=Path("./sorter_output/ms5"),
    scheme="2",          # scheme 2 is recommended for high-density probes
    detect_threshold=5.5,
)
print(f"MountainSort5 units: {len(sorting_ms5.get_unit_ids())}")

Module 4: Postprocessing (SortingAnalyzer)

SortingAnalyzer is the central postprocessing object in SpikeInterface >= 0.101. It replaces the older WaveformExtractor and provides a unified interface for waveforms, templates, PCAs, and downstream metrics.

import spikeinterface.full as si
import spikeinterface.postprocessing as spost

# Create analyzer (saves to disk; use format="memory" for in-RAM only)
analyzer = si.create_sorting_analyzer(
    sorting_sc2,
    recording_clean,
    folder="./analyzer_sc2",
    format="binary_folder",
    overwrite=True,
    sparse=True,           # sparse=True: only nearby channels per unit
    ms_before=1.0,
    ms_after=2.0,
)

# Compute extensions in dependency order
analyzer.compute([
    "random_spikes",       # subsample spike indices for waveform extraction
    "waveforms",           # raw waveform snippets per unit
    "templates",           # mean/std template per unit
    "noise_levels",        # per-channel noise estimate
])

# Retrieve templates as a numpy array (n_units, n_samples, n_channels)
templates = analyzer.get_extension("templates").get_data()
print(f"Templates array shape: {templates.shape}")
print(f"Unit 0 template shape: {templates[0].shape}")
# (n_samples, n_channels)
# Compute amplitude and PCA extensions (needed for quality metrics)
analyzer.compute([
    "spike_amplitudes",          # amplitude at peak channel per spike
    "principal_components",      # PCA scores (n_components x n_spikes)
    "template_similarity",       # pairwise template correlation matrix
    "correlograms",              # auto- and cross-correlograms
    "unit_locations",            # estimated unit position on probe (center of mass)
])

# Access spike amplitudes for the first unit (grouped by unit)
import numpy as np

ext_amp = analyzer.get_extension("spike_amplitudes")
amps_by_unit = ext_amp.get_data(outputs="by_unit")[0]   # segment 0: {unit_id: amplitudes}
unit0 = analyzer.unit_ids[0]
print(f"Unit {unit0} — median amplitude: {np.median(np.abs(amps_by_unit[unit0])):.1f} µV")

Module 5: Quality Metrics

Quality metrics summarize unit isolation quality. Metrics requiring only spike times (ISI violations, firing rate) are fast; metrics requiring waveforms (SNR, amplitude cutoff) need the SortingAnalyzer to be populated first.

import spikeinterface.qualitymetrics as sqm

# Compute a standard panel of quality metrics
metrics = sqm.compute_quality_metrics(
    analyzer,
    metric_names=[
        "snr",                    # signal-to-noise ratio of template peak
        "isi_violation",          # fraction of ISIs < refractory period
        "firing_rate",            # mean firing rate (Hz) over recording
        "presence_ratio",         # fraction of time windows with ≥1 spike
        "amplitude_cutoff",       # estimated fraction of spikes below threshold
        "nearest_neighbor",       # isolation distance in PCA space
        "silhouette_score",       # cluster separation in PCA space
    ],
)
print(metrics.head())
print(f"\nShape: {metrics.shape}")  # (n_units, n_metrics)
import pandas as pd

# Apply threshold-based curation (Allen Institute defaults)
thresholds = {
    "snr":                   (">=", 5.0),
    "isi_violations_ratio":  ("<=", 0.1),
    "firing_rate":           (">=", 0.1),
    "presence_ratio":        (">=", 0.9),
    "amplitude_cutoff":      ("<=", 0.1),
}

keep = pd.Series(True, index=metrics.index)
for col, (op, val) in thresholds.items():
    if col not in metrics.columns:
        continue
    if op == ">=":
        keep &= metrics[col] >= val
    else:
        keep &= metrics[col] <= val

good_unit_ids = metrics[keep].index.tolist()
print(f"Total units:   {len(metrics)}")
print(f"Curated units: {len(good_unit_ids)} ({100*len(good_unit_ids)/len(metrics):.0f}%)")

# Filter analyzer to good units
sorting_curated = sorting_sc2.select_units(good_unit_ids)
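
Before committing to thresholds, it is worth eyeballing the metric distributions. A minimal matplotlib sketch for SNR, using the threshold from the table above:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.hist(metrics["snr"].dropna(), bins=30)
ax.axvline(5.0, color="red", linestyle="--", label="SNR threshold = 5")
ax.set_xlabel("SNR")
ax.set_ylabel("Unit count")
ax.legend()
plt.savefig("snr_distribution.png", dpi=150)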

Module 6: Comparison and Export

Compare sorters against each other or against ground truth, then export results in shareable formats.

import spikeinterface.comparison as sc

# Compare two sorters — matches units by spike train overlap
comparison = sc.compare_two_sorters(
    sorting_sc2,
    sorting_ks4,
    sorting1_name="SpykingCircus2",
    sorting2_name="Kilosort4",
    match_score=0.5,        # minimum overlap to count as a match
    delta_time=0.4,         # coincidence window (ms)
)

# Hungarian matching between the two unit sets
match_12, match_21 = comparison.get_matching()
print(match_12.head(10))
# Each SpykingCircus2 unit id -> best-matching Kilosort4 unit id (-1 if unmatched)

# Agreement score matrix (fraction overlap between all unit pairs)
agreement_matrix = comparison.agreement_scores
print(f"Agreement matrix shape: {agreement_matrix.shape}")
import spikeinterface.exporters as sexp

# Export curated sorting to NWB (Neurodata Without Borders);
# NWB writing goes through neuroconv (pip install neuroconv)
from neuroconv.tools.spikeinterface import write_sorting

write_sorting(sorting=sorting_curated,
              nwbfile_path="./session_sorted.nwb",
              overwrite=True)
print("Exported to NWB: session_sorted.nwb")

# Export to Phy for manual curation
sexp.export_to_phy(
    analyzer,
    output_folder="./phy_export",
    compute_pc_features=True,
    copy_binary=True,
    remove_if_exists=True,
)
print("Phy export ready at: ./phy_export")
print("Launch Phy with: phy template-gui phy_export/params.py")

Common Workflows

Workflow 1: Multi-Sorter Comparison on OpenEphys Data

Goal: Load an OpenEphys recording, preprocess, run two sorters, compare their agreement, curate the higher-yield output, and export to NWB.

import spikeinterface.full as si
import spikeinterface.preprocessing as spre
import spikeinterface.sorters as ss
import spikeinterface.comparison as sc
import spikeinterface.qualitymetrics as sqm
from neuroconv.tools.spikeinterface import write_sorting  # NWB export
from pathlib import Path

# --- Step 1: Load ---
data_dir = Path("/data/oe_recording")
streams = si.get_neo_streams("openephys", data_dir)
print("Streams:", streams)

recording = si.read_openephys(data_dir, stream_name="Signals CH")
print(f"Loaded: {recording.get_num_channels()} ch, "
      f"{recording.get_sampling_frequency()} Hz, "
      f"{recording.get_total_duration():.1f} s")

# --- Step 2: Preprocess ---
rec = spre.bandpass_filter(recording, freq_min=300, freq_max=6000)
rec = spre.common_reference(rec, reference="global", operator="median")
bad_ids, _ = spre.detect_bad_channels(rec, method="coherence+psd")
rec = rec.remove_channels(bad_ids)
print(f"Preprocessing complete. Removed channels: {bad_ids}")

# --- Step 3: Run two sorters ---
out = Path("./sorting_outputs")
sorting_sc2 = ss.run_sorter("spykingcircus2", rec,
                              output_folder=out / "sc2",
                              remove_existing_folder=True)
sorting_tdc = ss.run_sorter("tridesclous2", rec,
                              output_folder=out / "tdc",
                              remove_existing_folder=True)
print(f"SC2 units: {len(sorting_sc2.unit_ids)}, "
      f"TDC units: {len(sorting_tdc.unit_ids)}")

# --- Step 4: Compare ---
cmp = sc.compare_two_sorters(sorting_sc2, sorting_tdc,
                             sorting1_name="SC2",
                             sorting2_name="Tridesclous2",
                             match_score=0.5)
match_12, _ = cmp.get_matching()
n_matched = int((match_12 != -1).sum())
print(f"\nMatched units: {n_matched} / {len(sorting_sc2.unit_ids)}")

# --- Step 5: Quality metrics on SC2 (higher yield) ---
analyzer = si.create_sorting_analyzer(sorting_sc2, rec,
                                        folder="./analyzer_sc2",
                                        overwrite=True, sparse=True)
analyzer.compute(["random_spikes", "waveforms", "templates",
                  "noise_levels", "spike_amplitudes"])
metrics = sqm.compute_quality_metrics(
    analyzer,
    metric_names=["snr", "firing_rate", "isi_violation",
                  "presence_ratio", "amplitude_cutoff"],
)

keep = (metrics["snr"] >= 5) & (metrics["isi_violations_ratio"] <= 0.1) \
     & (metrics["firing_rate"] >= 0.1) & (metrics["presence_ratio"] >= 0.9)
sorting_curated = sorting_sc2.select_units(metrics[keep].index.tolist())
print(f"\nCurated: {len(sorting_curated.unit_ids)} / {len(sorting_sc2.unit_ids)} units")

# --- Step 6: Export winner to NWB (via neuroconv) ---
write_sorting(sorting=sorting_curated,
              nwbfile_path="./session_sorted.nwb",
              overwrite=True)
print("Saved: session_sorted.nwb")

Workflow 2: Ground Truth Validation with Synthetic Recordings

Goal: Generate a synthetic recording with known spike trains, run a sorter, and measure true accuracy (recall, precision) against ground truth — for benchmarking sorters or testing preprocessing pipelines.

import spikeinterface.full as si
import spikeinterface.preprocessing as spre
import spikeinterface.sorters as ss
import spikeinterface.comparison as sc
import numpy as np

# --- Step 1: Generate ground-truth synthetic recording ---
# Known spike trains paired with realistic waveform templates
recording_gt, sorting_gt = si.generate_ground_truth_recording(
    durations=[120.0],             # 120 s recording
    sampling_frequency=30000.0,
    num_channels=32,
    num_units=10,
    noise_kwargs={"noise_levels": 10.0, "dtype": "float32"},
    seed=42,
)
print(f"GT recording: {recording_gt.get_num_channels()} ch, "
      f"{recording_gt.get_total_duration():.0f} s")
print(f"GT units: {len(sorting_gt.unit_ids)}")
print(f"GT firing rates: "
      f"{[round(len(sorting_gt.get_unit_spike_train(u, 0))/120, 1) for u in sorting_gt.unit_ids]} Hz")

# --- Step 2: Preprocess ---
rec_pp = spre.bandpass_filter(recording_gt, freq_min=300, freq_max=6000)
rec_pp = spre.common_reference(rec_pp, reference="global", operator="median")

# --- Step 3: Sort with two sorters ---
sorting_sc2 = ss.run_sorter("spykingcircus2", rec_pp,
                              output_folder="./gt_sc2",
                              remove_existing_folder=True)
sorting_ms5 = ss.run_sorter("mountainsort5", rec_pp,
                              output_folder="./gt_ms5",
                              remove_existing_folder=True,
                              scheme="2")

# --- Step 4: Compare each sorter against ground truth ---
for name, sorting_test in [("SC2", sorting_sc2), ("MS5", sorting_ms5)]:
    cmp = sc.compare_sorter_to_ground_truth(sorting_gt, sorting_test,
                                              exhaustive_gt=True)
    perf = cmp.get_performance(method="pooled_with_average")
    print(f"\n{name} vs Ground Truth:")
    print(f"  Accuracy:  {perf['accuracy']:.3f}")
    print(f"  Recall:    {perf['recall']:.3f}")
    print(f"  Precision: {perf['precision']:.3f}")
    print(f"  Well-detected units: {cmp.get_well_detected_units(well_detected_score=0.8)}")

Workflow 3: Batch Processing Multiple Sessions

Goal: Apply the same preprocessing + sorting pipeline to multiple recording sessions and collect quality metrics across all sessions.

import spikeinterface.full as si
import spikeinterface.preprocessing as spre
import spikeinterface.sorters as ss
import spikeinterface.qualitymetrics as sqm
import pandas as pd
from pathlib import Path

sessions = list(Path("/data/experiment").glob("session_*/"))
all_metrics = []

for session_dir in sessions:
    print(f"Processing {session_dir.name} ...")
    try:
        streams = si.get_neo_streams("spikeglx", session_dir)
        ap_stream = [s for s in streams if "ap" in s][0]
        rec = si.read_spikeglx(session_dir, stream_name=ap_stream)

        # Preprocess
        rec = spre.bandpass_filter(
            spre.common_reference(rec, reference="global", operator="median"),
            freq_min=300, freq_max=6000,
        )
        bad_ids, _ = spre.detect_bad_channels(rec)
        rec = rec.remove_channels(bad_ids)

        # Sort
        out_dir = session_dir / "sorting"
        sorting = ss.run_sorter("spykingcircus2", rec,
                                 output_folder=out_dir,
                                 remove_existing_folder=True)

        # Compute metrics
        analyzer = si.create_sorting_analyzer(
            sorting, rec, folder=session_dir / "analyzer", overwrite=True, sparse=True
        )
        analyzer.compute(["random_spikes", "waveforms", "templates",
                          "noise_levels", "spike_amplitudes"])
        m = sqm.compute_quality_metrics(
            analyzer, metric_names=["snr", "firing_rate", "isi_violation"]
        )
        m["session"] = session_dir.name
        all_metrics.append(m)

    except Exception as e:
        print(f"  FAILED: {e}")
        continue

# Combine across sessions
combined = pd.concat(all_metrics)
combined.to_csv("all_sessions_metrics.csv")
print(f"\nSaved metrics: {combined.shape[0]} units across {len(all_metrics)} sessions")
print(combined.groupby("session")[["snr", "firing_rate"]].median())

Key Parameters

| Parameter | Module / Function | Default | Range / Options | Effect |
| --- | --- | --- | --- | --- |
| freq_min / freq_max | spre.bandpass_filter | 300 / 6000 Hz | 150–500 / 3000–10000 Hz | Spike band; use 300–6000 Hz for AP activity |
| reference | spre.common_reference | "global" | "global", "local", "single" | Channel subset used for median reference subtraction |
| method | spre.detect_bad_channels | "coherence+psd" | "coherence+psd", "std", "mad" | Algorithm for bad channel detection |
| scheme | ss.run_sorter("mountainsort5") | "2" | "1", "2", "3" | Sorting scheme; scheme 2 recommended for high-density probes |
| nblocks | ss.run_sorter("kilosort4") | 5 | 0–10 | Number of drift-correction blocks; 0 disables drift correction |
| Th_learned | ss.run_sorter("kilosort4") | 8 | 6–12 | Detection threshold (× noise); lower = more units, more noise |
| match_score | sc.compare_two_sorters | 0.5 | 0.1–0.9 | Minimum spike-train overlap to declare a unit match |
| sparse | si.create_sorting_analyzer | True | True, False | Limit waveform extraction to channels near each unit; reduces memory |
| ms_before / ms_after | si.create_sorting_analyzer | 1.0 / 2.0 ms | 0.5–2.0 / 1.0–3.0 ms | Waveform snippet window relative to detected spike peak |
| snr threshold | sqm.compute_quality_metrics | n/a | 5–10 recommended | Amplitude/noise ratio; > 5 indicates a well-isolated unit |
| isi_violations_ratio | sqm.compute_quality_metrics | n/a | ≤ 0.1 recommended | Fraction of ISIs below the refractory period (1.5 ms); < 0.1 suggests a single unit |
| presence_ratio | sqm.compute_quality_metrics | n/a | ≥ 0.9 recommended | Fraction of recording epochs where the unit fires; < 0.9 suggests a drifting unit |

Best Practices

  1. Always inspect available streams before loading: Different acquisition systems save AP data, LFP data, and auxiliary channels as separate streams. Loading the wrong stream silently yields valid-looking but incorrect data.

    streams = si.get_neo_streams("spikeglx", data_dir)
    print(streams)  # e.g. ['imec0.ap', 'imec0.lf', 'nidq']
    recording = si.read_spikeglx(data_dir, stream_name="imec0.ap")
    
  2. Chain preprocessing lazily; do not load to memory early: Preprocessing objects are lazy and apply transformations at read time. Calling get_traces() on the raw recording before preprocessing loads unfiltered data into RAM unnecessarily. Build the full chain before any data access, as sketched below.
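
    A minimal sketch (data_dir as in item 1):

    import spikeinterface.full as si
    import spikeinterface.preprocessing as spre

    rec = si.read_spikeglx(data_dir, stream_name="imec0.ap")   # lazy: no samples read yet
    rec = spre.bandpass_filter(rec, freq_min=300, freq_max=6000)
    rec = spre.common_reference(rec, reference="global", operator="median")
    traces = rec.get_traces(start_frame=0, end_frame=30000)    # chain applied only here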

  3. Use sparse=True when creating a SortingAnalyzer: For high-channel-count probes (64–384 channels), dense waveform extraction is 10–50× more expensive in RAM and disk than sparse. Sparse mode extracts waveforms only on the channels nearest each unit.

  4. Run containerized sorters to avoid dependency conflicts: Kilosort2/3 (MATLAB), IronClust, and other sorters have complex dependencies. Use docker_image=True in run_sorter() to pull the official container and run the sorter in isolation:

    sorting = ss.run_sorter("kilosort2_5", recording_clean,
                             output_folder="./ks25_out",
                             docker_image=True)
    
  5. Compute metrics extensions in dependency order: Extensions depend on each other. The canonical order is random_spikes → waveforms → templates → noise_levels → spike_amplitudes → principal_components. Skipping an earlier step causes a MissingExtensionError when a downstream step is requested.

  6. Save the SortingAnalyzer to disk for large recordings: In-memory analyzers (format="memory") are lost when the process exits. For recordings longer than 30 minutes or with many units, always specify a folder path so the analyzer can be reloaded:

    analyzer = si.load_sorting_analyzer("./analyzer_sc2")
    
  7. Do not compare sorters with mismatched preprocessing: When benchmarking sorters, run all of them on the same preprocessed recording_clean object. Running sorters on different preprocessing chains invalidates the comparison; see the sketch below.
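
A minimal sketch of a fair benchmark loop (the sorter names are examples; any installed sorters work):

import spikeinterface.sorters as ss
from pathlib import Path

sortings = {}
for name in ["spykingcircus2", "mountainsort5"]:   # example sorters
    sortings[name] = ss.run_sorter(
        name,
        recording_clean,                           # identical preprocessing for all
        output_folder=Path("./bench") / name,
        remove_existing_folder=True,
    )
    print(f"{name}: {len(sortings[name].unit_ids)} units")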

Common Recipes

Recipe: Load and Inspect a Multi-Stream Recording

When to use: Quickly check what streams are available in an unfamiliar recording and confirm channel counts and duration before committing to a full sort.

import spikeinterface.full as si

data_dir = "/data/recording_session"

# Try SpikeGLX first; if it fails, try OpenEphys
try:
    streams = si.get_neo_streams("spikeglx", data_dir)
    fmt = "spikeglx"
except Exception:
    streams = si.get_neo_streams("openephys", data_dir)
    fmt = "openephys"

print(f"Format: {fmt}")
print(f"Streams: {streams}")

for stream in streams:
    try:
        rec = si.read_spikeglx(data_dir, stream_name=stream) if fmt == "spikeglx" \
              else si.read_openephys(data_dir, stream_name=stream)
        print(f"  {stream}: {rec.get_num_channels()} ch, "
              f"{rec.get_sampling_frequency()} Hz, "
              f"{rec.get_total_duration():.1f} s")
    except Exception as e:
        print(f"  {stream}: could not load ({e})")

Recipe: Export Quality Metrics Report to CSV

When to use: After running quality metrics, save a tidy CSV summarizing all units with their metrics and a pass/fail column for downstream analysis or sharing with collaborators.

import spikeinterface.qualitymetrics as sqm
import pandas as pd

metrics = sqm.compute_quality_metrics(
    analyzer,
    metric_names=["snr", "firing_rate", "isi_violation",
                  "presence_ratio", "amplitude_cutoff"],
)

# Add pass/fail column based on standard thresholds
metrics["pass_qc"] = (
    (metrics["snr"] >= 5) &
    (metrics["isi_violations_ratio"] <= 0.1) &
    (metrics["firing_rate"] >= 0.1) &
    (metrics["presence_ratio"] >= 0.9) &
    (metrics["amplitude_cutoff"] <= 0.1)
)

metrics.to_csv("unit_quality_metrics.csv")
n_pass = metrics["pass_qc"].sum()
print(f"QC report saved: {len(metrics)} total units, {n_pass} pass ({100*n_pass/len(metrics):.0f}%)")
print(metrics[metrics["pass_qc"]].describe())

Recipe: Probe Geometry Visualization

When to use: Verify that the probe channel map loaded correctly before sorting. Incorrect channel maps silently degrade sorting quality on high-density probes.

import spikeinterface.full as si
import matplotlib.pyplot as plt
import probeinterface.plotting as pp

recording = si.read_spikeglx("/data/session_001", stream_name="imec0.ap")
probe = recording.get_probe()
print(f"Probe name: {probe.name}")
print(f"N contacts: {probe.get_contact_count()}")
print(f"Contact positions (first 5):\n{probe.contact_positions[:5]}")

fig, ax = plt.subplots(figsize=(3, 10))
pp.plot_probe(probe, ax=ax, with_contact_id=True)
ax.set_title(f"{probe.annotations.get('name', 'probe')} — channel map")
plt.tight_layout()
plt.savefig("probe_geometry.png", dpi=150)
print("Saved probe_geometry.png")

Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| ValueError: stream_name not found | Recording has multiple streams and none is specified | Run si.get_neo_streams(format, path) to list available streams; pass the correct one to the reader |
| Sorter output has zero units | Detection threshold too high, or preprocessing removed all signal | Verify recording_clean.get_traces() returns non-zero data; lower the detection threshold (e.g. Th_learned=6 for Kilosort4) |
| MissingExtensionError | Analyzer extension depends on an uncomputed prerequisite | Follow the canonical compute order: random_spikes → waveforms → templates → noise_levels → spike_amplitudes |
| Docker sorter hangs at startup | Docker daemon not running or image not pulled | Run docker ps to confirm Docker is running; pull the image manually with docker pull spikeinterface/kilosort4-compiled-base |
| MemoryError during waveform extraction | Dense extraction on a high-channel-count probe | Use sparse=True in create_sorting_analyzer; reduce max_spikes_per_unit (default 500) |
| Bad channel detection removes too many channels | Threshold too aggressive or recording too short | Set method="std" for a simpler threshold; raise the std_mad_threshold parameter |
| Unit comparison shows 0% agreement between sorters | Coincidence window too narrow or match score too strict | Increase delta_time (default 0.4 ms) and lower match_score (try 0.3) |
| NWB export raises TypeError on unit properties | Sorting contains non-serializable properties from the sorter | Remove problematic properties with sorting.delete_property("property_name") before export |
| read_spikeglx fails on LF stream | LFP stream uses a different file suffix (.lf.bin) | Specify stream_name="imec0.lf" explicitly; confirm the file exists with ls data_dir/*.lf.bin |

Related Skills

  • neuropixels-analysis — Neuropixels-specific pipeline using SpikeInterface + Kilosort4 with PSTH, tuning curves, and population decoding for rodent and primate experiments
  • neurokit2 — For biosignal processing (ECG, EEG, EDA, EMG, PPG) rather than spike sorting; use when data is not extracellular electrophysiology

References