SciAgent-Skills neuropixels-analysis
Pipeline for analyzing Neuropixels extracellular electrophysiology recordings. Covers probe geometry loading (ProbeInterface), spike sorting with Kilosort via SpikeInterface, quality metrics computation, unit curation (ISI violations, firing rate, signal-to-noise), and post-sort analysis (PSTH, tuning curves, population decoding) using pandas and matplotlib. Designed for acute and chronic Neuropixels 1.0/2.0/Ultra recordings from rodent and primate experiments.
git clone https://github.com/jaechang-hits/SciAgent-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/scientific-computing/neuropixels-analysis" ~/.claude/skills/jaechang-hits-sciagent-skills-neuropixels-analysis && rm -rf "$T"
skills/scientific-computing/neuropixels-analysis/SKILL.mdNeuropixels Analysis
Overview
Neuropixels probes record extracellular voltage from 384 (NP 1.0) or 192 (NP 2.0) simultaneously recorded channels at 30 kHz. Analysis follows a canonical pipeline: raw data → spike sorting (Kilosort) → quality curation → unit analysis. SpikeInterface (Python) provides a unified API across 10+ spike sorters, handles data loading from multiple formats (SpikeGLX, OpenEphys, NWB), computes quality metrics, and exports sorted results. ProbeInterface manages probe geometry and channel maps. Post-sort analysis (PSTHs, firing rate, decoding) uses standard Python scientific stack.
When to Use
- Spike-sorting Neuropixels recordings from SpikeGLX (
) or OpenEphys (.bin
) to extract single-unit activity.dat - Applying automatic quality control metrics (ISI violations, SNR, firing rate) to curate sorted units
- Computing peristimulus time histograms (PSTHs) locked to experimental events
- Analyzing population coding: decoding stimulus or behavioral variables from firing rates
- Converting sorted data to NWB (Neurodata Without Borders) format for sharing
- Comparing multiple spike sorters on the same dataset for method validation
- Visualizing unit waveforms, auto-correlograms, and spatial distribution across probe channels
- Use SpikeInterface instead for a unified framework that supports 10+ spike sorters with a common API and comparison tools
Prerequisites
- Python packages:
,spikeinterface[full]
,probeinterface
,numpy
,pandas
,matplotlibscipy - Spike sorter: Kilosort4 (Python,
); or MATLAB-based Kilosort2/3 (requires MATLAB license)pip install kilosort - Data requirements: raw
(SpikeGLX) or.bin
(OpenEphys) recording; probe channel map file (.dat
or from ProbeInterface).prb - Hardware: NVIDIA GPU (4+ GB VRAM) required for Kilosort; CPU fallback available but ~10× slower
pip install spikeinterface[full] probeinterface kilosort # Optional: Phy for manual curation pip install phy
Workflow
Step 1: Load Raw Recording
import spikeinterface.full as si from pathlib import Path # SpikeGLX recording (most common Neuropixels format) data_dir = Path("/data/recording/session_001") recording = si.read_spikeglx(data_dir, stream_name="imec0.ap") print(f"Probe type: {recording.get_probe().name}") print(f"Channels: {recording.get_num_channels()}") print(f"Sampling rate: {recording.get_sampling_frequency()} Hz") print(f"Duration: {recording.get_total_duration():.1f} s") print(f"Total samples: {recording.get_total_samples()}")
Step 2: Apply Common Reference and Bandpass Filter
import spikeinterface.preprocessing as spre # Common median reference (removes common noise across all channels) recording_cmr = spre.common_reference(recording, reference="global", operator="median") # Bandpass filter: 300–6000 Hz for spikes recording_filt = spre.bandpass_filter(recording_cmr, freq_min=300, freq_max=6000) # Remove bad channels automatically recording_clean, removed_ids = spre.remove_bad_channels(recording_filt) print(f"Removed {len(removed_ids)} bad channels: {removed_ids}") print(f"Clean recording: {recording_clean.get_num_channels()} channels")
Step 3: Run Spike Sorting
import spikeinterface.sorters as ss from pathlib import Path output_dir = Path("./sorting_output") # Kilosort4 (recommended for Neuropixels; requires GPU) sorting = ss.run_sorter( "kilosort4", recording_clean, output_folder=output_dir / "kilosort4", remove_existing_folder=True, verbose=True, # Kilosort4 parameters nblocks=5, # number of drift correction blocks Th_learned=8, # threshold for learned templates do_correction=True, # drift correction ) print(f"Units found: {len(sorting.get_unit_ids())}") print(f"Spike counts (first 5): {[len(sorting.get_unit_spike_train(u, segment_index=0)) for u in sorting.unit_ids[:5]]}")
Step 4: Compute Waveforms and Quality Metrics
import spikeinterface.full as si import spikeinterface.qualitymetrics as sqm # Extract waveforms (snippets around each spike) waveforms = si.extract_waveforms( recording_clean, sorting, folder="./waveforms", ms_before=1.0, # ms before spike peak ms_after=2.0, # ms after spike peak max_spikes_per_unit=1000, overwrite=True, n_jobs=4, ) print(f"Waveform shape per unit: {waveforms.get_waveforms(waveforms.unit_ids[0]).shape}") # (n_spikes, n_samples, n_channels) # Compute quality metrics metrics = sqm.compute_quality_metrics( waveforms, metric_names=["snr", "isi_violation", "firing_rate", "presence_ratio", "amplitude_cutoff", "nearest_neighbor"], ) print(f"\nQuality metrics summary:") print(metrics.describe())
Step 5: Curate Units
import pandas as pd # Curation thresholds (Allen Brain Institute defaults) thresholds = { "snr": (5.0, None), # SNR ≥ 5 "isi_violations_ratio": (None, 0.1), # ISI violation ratio ≤ 10% "firing_rate": (0.1, None), # firing rate ≥ 0.1 Hz "presence_ratio": (0.9, None), # present ≥ 90% of recording "amplitude_cutoff": (None, 0.1), # amplitude cutoff ≤ 10% } def apply_thresholds(metrics_df, thresholds): mask = pd.Series(True, index=metrics_df.index) for metric, (low, high) in thresholds.items(): if metric not in metrics_df.columns: continue if low is not None: mask &= metrics_df[metric] >= low if high is not None: mask &= metrics_df[metric] <= high return mask good_units_mask = apply_thresholds(metrics, thresholds) good_unit_ids = metrics[good_units_mask].index.tolist() print(f"Total units: {len(metrics)}") print(f"Good units: {len(good_unit_ids)} ({100*len(good_unit_ids)/len(metrics):.0f}%)") # Filter sorting to good units only sorting_curated = sorting.select_units(good_unit_ids)
Step 6: Compute PSTHs and Visualize
import numpy as np import matplotlib.pyplot as plt def compute_psth(spike_times, event_times, window=(-0.5, 1.0), bin_size=0.01, fs=30000): """Compute peri-stimulus time histogram for one unit.""" bins = np.arange(window[0], window[1] + bin_size, bin_size) spike_times_s = spike_times / fs # samples → seconds counts = np.zeros(len(bins) - 1) for t_event in event_times: rel_times = spike_times_s - t_event in_window = rel_times[(rel_times >= window[0]) & (rel_times < window[1])] counts += np.histogram(in_window, bins=bins)[0] rate = counts / (len(event_times) * bin_size) # convert to Hz return bins[:-1] + bin_size / 2, rate # bin centers, firing rate # Example: visual stimulus events at 1.0, 2.5, 4.0, 5.5 s fs = 30000 event_times_s = np.array([1.0, 2.5, 4.0, 5.5, 7.0, 8.5]) # Plot PSTH for first 4 good units fig, axes = plt.subplots(2, 2, figsize=(10, 6)) for ax, unit_id in zip(axes.flat, good_unit_ids[:4]): spikes = sorting_curated.get_unit_spike_train(unit_id, segment_index=0) times, rate = compute_psth(spikes, event_times_s, fs=fs) ax.bar(times, rate, width=0.01, color="steelblue", alpha=0.8) ax.axvline(0, color="red", lw=1.5, linestyle="--", label="Stimulus") ax.set_xlabel("Time from stimulus (s)") ax.set_ylabel("Firing rate (Hz)") ax.set_title(f"Unit {unit_id}") plt.tight_layout() plt.savefig("psth_grid.pdf", bbox_inches="tight") print("Saved psth_grid.pdf")
Step 7: Export to NWB
import spikeinterface.exporters as sexp # Export sorted data to Neurodata Without Borders format nwb_path = "./recording_sorted.nwb" sexp.export_to_nwb( sorting_curated, nwb_file_path=nwb_path, overwrite=True, ) print(f"Exported to NWB: {nwb_path}") # Also export waveforms for Phy manual curation phy_dir = "./phy_export" sexp.export_to_phy( waveforms, output_folder=phy_dir, compute_pc_features=True, copy_binary=True, ) print(f"Phy export ready at: {phy_dir}") print("Launch with: phy template-gui phy_export/params.py")
Key Parameters
| Parameter | Module/Function | Default | Range / Options | Effect |
|---|---|---|---|---|
/ | | 300/6000 Hz | 150–500 / 3000–10000 | Spike band; NP 1.0 standard is 300–6000 Hz |
| Kilosort4 | 5 | 0–10 | Number of drift correction blocks; 0 disables drift correction |
| Kilosort4 | 8 | 6–12 | Detection threshold (× noise level); lower = more spikes, more noise |
/ | | 1.0/2.0 | 0.5–2.0/1.0–3.0 ms | Waveform snippet window around spike peak |
| | 500 | 100–5000 | Maximum spikes per unit for waveform extraction |
threshold | quality metrics | — | 5–10 | Signal-to-noise threshold for unit acceptance |
| quality metrics | — | 0.05–0.2 | Refractory period violation rate; <0.1 is well-isolated |
| quality metrics | — | 0.9 | Fraction of recording where unit is active |
| PSTH | 0.01 s | 0.001–0.05 s | PSTH temporal resolution; smaller = more detail, noisier |
Key Concepts
Spike Sorting Pipeline
Raw voltage → preprocessing (common reference, bandpass) → detection (threshold crossing or learned templates) → clustering (PCA + k-means or Gaussian mixture) → template matching → quality metrics → curation. Kilosort4 adds drift correction, which is critical for chronic NP recordings (tip drift of 5–50 µm over hours degrades unit isolation).
Quality Metrics
- SNR: peak waveform amplitude / noise level. SNR >5 indicates well-isolated unit
- ISI violation ratio: fraction of inter-spike intervals shorter than the refractory period (~1.5 ms). Ratio <0.1 indicates single-unit isolation
- Presence ratio: fraction of recording epochs where unit fired at least one spike. <0.9 suggests unit disappeared or was lost
- Amplitude cutoff: fraction of spikes missing due to detection threshold. <0.1 ensures near-complete detection
Common Recipes
Recipe: Compare Two Sorters on Same Recording
import spikeinterface.sorters as ss import spikeinterface.comparison as sc # Run two sorters sorting_ks4 = ss.run_sorter("kilosort4", recording_clean, output_folder="./ks4") sorting_sc = ss.run_sorter("spykingcircus2", recording_clean, output_folder="./sc2") # Compare outputs comparison = sc.compare_two_sorters(sorting_ks4, sorting_sc, sorting1_name="Kilosort4", sorting2_name="SpykingCircus2") print(comparison.get_performance()) # Shows: agreement fraction, recall, precision per matched unit pair
Recipe: Population Firing Rate Heatmap
import numpy as np import matplotlib.pyplot as plt fs = 30000 duration_s = sorting_curated.get_total_duration() bin_edges = np.arange(0, duration_s, 0.05) # 50 ms bins # Build firing rate matrix: (n_units, n_bins) fr_matrix = np.zeros((len(good_unit_ids), len(bin_edges)-1)) for i, uid in enumerate(good_unit_ids[:50]): spikes_s = sorting_curated.get_unit_spike_train(uid, segment_index=0) / fs counts, _ = np.histogram(spikes_s, bins=bin_edges) fr_matrix[i] = counts / 0.05 # Hz # Normalize each unit fr_norm = (fr_matrix - fr_matrix.mean(1, keepdims=True)) / (fr_matrix.std(1, keepdims=True) + 1e-6) fig, ax = plt.subplots(figsize=(12, 5)) im = ax.imshow(fr_norm, aspect="auto", cmap="RdBu_r", extent=[0, duration_s, len(good_unit_ids[:50]), 0], vmin=-2, vmax=2) ax.set_xlabel("Time (s)") ax.set_ylabel("Unit #") ax.set_title("Population Activity Heatmap (z-scored FR)") plt.colorbar(im, ax=ax, label="z-score") plt.tight_layout() plt.savefig("population_heatmap.pdf", bbox_inches="tight")
Expected Outputs
| Output | Format | Typical Content |
|---|---|---|
| SpikeInterface folder | Spike times per unit; template waveforms |
| SpikeInterface folder | Waveform snippets |
| CSV | SNR, ISI violations, firing rate, presence ratio per unit |
| PSTH plots for curated units | |
| NWB/HDF5 | Portable spike-sorted data for sharing |
| Phy format | + for manual curation GUI |
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Kilosort GPU memory error | Recording too long or too many channels for VRAM | Process recording in time chunks using ; use NP 2.0 (192 ch) settings |
| Zero units found | Threshold too high or preprocessing removed all signal | Lower to 6; verify has non-zero channel data |
| Excessive ISI violations (>50%) | Multi-unit activity merged as single unit | Manual curation with Phy; increase to be more conservative |
| Waveform extraction OOM | Too many spikes × channels × samples | Reduce ; increase for parallel extraction |
| Drift correction fails | Too few spikes or very short recording | Set (minimal correction) or (disable) |
| NWB export fails | Missing session metadata (subject, date) | Provide metadata; or use |
| SpikeGLX file not found | Wrong stream name | List streams: |
References
- SpikeInterface documentation — full API, tutorials, and sorter comparison guides
- Kilosort4 paper: Pachitariu et al. (2024), Nature Methods — drift correction and template learning
- SpikeInterface paper: Buccino et al. (2020), eLife — unified framework for extracellular electrophysiology
- ProbeInterface documentation — probe geometry and channel map handling
- Neuropixels documentation — hardware specs, imec reference designs
- NWB documentation — neurophysiology data standard for archiving and sharing