Medical-research-skills flowio
Parse Flow Cytometry Standard (FCS) files v2.0–3.1 and extract events/metadata for preprocessing workflows (e.g., when you need NumPy arrays, channel info, or CSV/DataFrame export from cytometry files).
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Data Analysis/flowio" ~/.claude/skills/aipoch-medical-research-skills-flowio && rm -rf "$T"
manifest: scientific-skills/Data Analysis/flowio/SKILL.md
When to Use
- You need to read FCS v2.0/3.0/3.1 files and extract event matrices for downstream preprocessing.
- You want to inspect or validate FCS metadata (TEXT segment) without loading event data (memory-efficient parsing).
- You need channel definitions (PnN/PnS), ranges (PnR), and automatic identification of scatter/fluorescence/time channels.
- You need to handle problematic FCS files with offset inconsistencies or multi-dataset content.
- You want to export cytometry events to CSV/Pandas DataFrame or write new/modified FCS files.
Key Features
- FCS parsing (v2.0–3.1): Reads HEADER/TEXT/DATA and optional ANALYSIS segments.
- Event extraction to NumPy: Returns event data as an `ndarray` with shape `(events, channels)`.
- Optional preprocessing: Applies standard FCS transformations (gain/log/time scaling) when enabled.
- Metadata access: Exposes TEXT keywords and common instrument/acquisition fields.
- Channel utilities: Provides PnN/PnS labels, ranges, and indices for scatter/fluorescence/time channels.
- Robust parsing options: Flags for offset discrepancy handling and null-channel exclusion.
- Multi-dataset support: Detects and reads files containing multiple datasets.
- FCS writing: Create new FCS files from arrays and optionally preserve/override metadata.
Dependencies
- python >= 3.9
- flowio (install via pip/uv; version depends on your environment)
- Example-only: numpy >= 1.20, pandas >= 1.5
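All three packages are published on PyPI, so a typical setup is a single pip install (swap in `uv pip install` if you use uv):

```shell
pip install flowio numpy pandas
```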
Example Usage
""" End-to-end example: 1) Read an FCS file (metadata + events) 2) Convert to a Pandas DataFrame and export CSV 3) Filter events and write a new FCS file 4) Handle multi-dataset files """ from pathlib import Path import numpy as np import pandas as pd from flowio import ( FlowData, create_fcs, read_multiple_data_sets, MultipleDataSetsError, FCSParsingError, DataOffsetDiscrepancyError, ) FCS_PATH = "sample.fcs" def read_fcs_safely(path: str) -> FlowData: try: return FlowData(path) except DataOffsetDiscrepancyError: # Common workaround for files with inconsistent offsets return FlowData(path, ignore_offset_discrepancy=True) except FCSParsingError: # Looser mode if the file is malformed return FlowData(path, ignore_offset_error=True) def main() -> None: # --- 1) Read file (single dataset) --- try: flow = read_fcs_safely(FCS_PATH) except MultipleDataSetsError: # --- 4) Multi-dataset handling --- datasets = read_multiple_data_sets(FCS_PATH) flow = datasets[0] # pick the first dataset for this demo print("File:", getattr(flow, "name", Path(FCS_PATH).name)) print("FCS version:", flow.version) print("Events:", flow.event_count) print("Channels:", flow.channel_count) print("PnN labels:", flow.pnn_labels) # Metadata (TEXT segment) print("Instrument ($CYT):", flow.text.get("$CYT", "N/A")) print("Acquisition date ($DATE):", flow.text.get("$DATE", "N/A")) # --- 2) Events -> NumPy -> DataFrame -> CSV --- events = flow.as_array(preprocess=True) # default preprocessing behavior df = pd.DataFrame(events, columns=flow.pnn_labels) df.to_csv("events.csv", index=False) print("Wrote CSV:", "events.csv") # --- 3) Filter and write a new FCS --- # Example: threshold on first scatter channel if available, else channel 0 fsc_idx = flow.scatter_indices[0] if getattr(flow, "scatter_indices", []) else 0 threshold = np.percentile(events[:, fsc_idx], 50) # median threshold mask = events[:, fsc_idx] > threshold filtered = events[mask] create_fcs( "filtered.fcs", filtered, flow.pnn_labels, 
opt_channel_names=flow.pns_labels, metadata={**flow.text, "$SRC": "Filtered via FlowIO example"}, ) print("Wrote FCS:", "filtered.fcs") # --- Metadata-only read (memory efficient) --- meta_only = FlowData(FCS_PATH, only_text=True) print("Metadata-only read: $DATE =", meta_only.text.get("$DATE", "N/A")) if __name__ == "__main__": main()
Implementation Details
Data Model and Segments
An FCS file is organized into segments:
- HEADER: FCS version and byte offsets for other segments.
- TEXT: Keyword/value metadata (e.g., `$DATE`, `$CYT`, `$PnN`, `$PnS`, `$PnR`, `$PnG`, `$PnE`).
- DATA: Event matrix encoded as integer/float/double/ASCII depending on file keywords.
- ANALYSIS (optional): Post-processing results if present.
In FlowIO, these are exposed via `FlowData` attributes such as:
- `flow.header` (HEADER info)
- `flow.text` (TEXT keyword dictionary)
- `flow.analysis` (ANALYSIS keyword dictionary, if present)
- `flow.as_array(...)` (decoded event matrix)
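To make the HEADER segment concrete, here is a minimal sketch of decoding the fixed-width 58-byte ASCII HEADER by hand, following the FCS 3.1 layout (6-byte version string, padding, then six 8-byte ASCII offset fields). The `parse_fcs_header` helper and the hand-built sample bytes are illustrative only; in practice FlowIO does this for you and exposes the result via `flow.header`.

```python
def parse_fcs_header(raw: bytes) -> dict:
    """Decode the first 58 bytes of an FCS file (FCS 3.1 HEADER layout)."""
    def field(start: int, end: int) -> int:
        # Offset fields are right-justified ASCII integers; blank means 0.
        text = raw[start:end].decode("ascii").strip()
        return int(text) if text else 0

    return {
        "version": raw[0:6].decode("ascii"),       # e.g. "FCS3.1"
        "text_start": field(10, 18),
        "text_end": field(18, 26),
        "data_start": field(26, 34),
        "data_end": field(34, 42),
        "analysis_start": field(42, 50),           # 0 if no ANALYSIS segment
        "analysis_end": field(50, 58),
    }


# Hand-built 58-byte header for demonstration (not read from a real file)
fake = b"FCS3.1" + b" " * 4 + b"".join(
    f"{n:>8d}".encode("ascii") for n in (58, 1024, 1025, 9999, 0, 0)
)
header = parse_fcs_header(fake)
```

Real files are read the same way: the offsets tell the parser where the TEXT, DATA, and optional ANALYSIS segments begin and end.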
Preprocessing (`as_array(preprocess=True)`)
When preprocessing is enabled, FlowIO applies common FCS transformations:
- Gain scaling (PnG): Values are multiplied by the per-parameter gain.
- Log/exponential transform (PnE): If present, applies `value = a * 10^(b * raw_value)` where `PnE = "a,b"`.
- Time scaling: If a time channel is detected, values may be scaled into appropriate units.
To disable all transformations and obtain raw decoded values:
`flow.as_array(preprocess=False)`
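As a rough illustration of what these transformations amount to, the sketch below applies PnG gain scaling and the log transform described above to a raw event matrix in plain NumPy. The helper name and channel parameters are hypothetical; this mirrors the formulas stated in this section, not FlowIO's internal implementation.

```python
import numpy as np


def apply_gain_and_log(raw, gains, pne):
    """raw: (events, channels) array; gains: per-channel PnG values;
    pne: per-channel (a, b) pairs, with (0, 0) meaning linear storage."""
    # PnG gain scaling on all channels
    out = raw.astype(float) * np.asarray(gains, dtype=float)
    # For channels with a non-trivial PnE = "a,b", apply value = a * 10^(b * raw)
    for ch, (a, b) in enumerate(pne):
        if a != 0.0:
            out[:, ch] = a * 10.0 ** (b * raw[:, ch])
    return out


raw = np.array([[100.0, 2.0], [200.0, 3.0]])  # 2 events x 2 channels
scaled = apply_gain_and_log(raw, gains=[1.0, 2.0], pne=[(0.0, 0.0), (0.0, 0.0)])
```

Here both channels are stored linearly (`PnE = "0,0"`), so only the gains apply: channel 1 is doubled while channel 0 is unchanged.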
Channel Identification
FlowIO provides convenience indices for common channel types:
- `flow.scatter_indices` (e.g., FSC/SSC)
- `flow.fluoro_indices` (fluorescence channels)
- `flow.time_index` (time channel index or `None`)
These indices can be used to slice the event matrix: `events[:, flow.scatter_indices]`, `events[:, flow.fluoro_indices]`.
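The slicing above is ordinary NumPy integer-list indexing. A small self-contained sketch, with hard-coded index values standing in for the attributes a real `FlowData` object would provide:

```python
import numpy as np

events = np.arange(12.0).reshape(3, 4)  # 3 events x 4 channels

# Hypothetical values; in practice these come from flow.scatter_indices,
# flow.fluoro_indices, and flow.time_index.
scatter_indices = [0, 1]   # e.g. FSC-A, SSC-A
fluoro_indices = [2]       # e.g. FITC-A
time_index = 3

scatter = events[:, scatter_indices]   # shape (3, 2)
fluoro = events[:, fluoro_indices]     # shape (3, 1)
time_values = events[:, time_index]    # shape (3,)
```

Indexing with a list keeps the channel axis (a 2-D result), while indexing with a single integer drops it (a 1-D result).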
Handling Problematic Files (Offsets and Null Channels)
Some files contain inconsistent offsets between HEADER and TEXT:
- `ignore_offset_discrepancy=True` to tolerate HEADER/TEXT offset mismatch.
- `use_header_offsets=True` to prefer HEADER offsets.
- `ignore_offset_error=True` to bypass offset-related failures more aggressively.
To exclude known null/empty channels during parsing:
`FlowData(path, null_channel_list=[...])`
Multi-Dataset Files
If a file contains multiple datasets, constructing `FlowData(path)` may raise `MultipleDataSetsError`. Use:
- `read_multiple_data_sets(path)` to load all datasets, or
- `FlowData(path, nextdata_offset=...)` to load a specific dataset using `$NEXTDATA` offsets.
Writing FCS
Two common patterns:
- Write metadata-only changes: `flow.write_fcs("out.fcs", metadata={...})`
- Modify event data: extract the array → modify → `create_fcs(...)` to generate a new file (FlowIO does not modify event data in-place).