Medical-research-skills flowio
Parse Flow Cytometry Standard (FCS) files v2.0–3.1 and extract events/metadata for preprocessing workflows (e.g., when you need NumPy arrays, channel info, or CSV/DataFrame export from cytometry files).
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Data Analysis/flowio" ~/.claude/skills/aipoch-medical-research-skills-flowio && rm -rf "$T"
manifest: scientific-skills/Data Analysis/flowio/SKILL.md
When to Use
- You need to read FCS v2.0/3.0/3.1 files and extract event matrices for downstream preprocessing.
- You want to inspect or validate FCS metadata (TEXT segment) without loading event data (memory-efficient parsing).
- You need channel definitions (PnN/PnS), ranges (PnR), and automatic identification of scatter/fluorescence/time channels.
- You need to handle problematic FCS files with offset inconsistencies or multi-dataset content.
- You want to export cytometry events to CSV/Pandas DataFrame or write new/modified FCS files.
Key Features
- FCS parsing (v2.0–3.1): Reads HEADER/TEXT/DATA and optional ANALYSIS segments.
- Event extraction to NumPy: Returns event data as an `ndarray` with shape `(events, channels)`.
- Optional preprocessing: Applies standard FCS transformations (gain/log/time scaling) when enabled.
- Metadata access: Exposes TEXT keywords and common instrument/acquisition fields.
- Channel utilities: Provides PnN/PnS labels, ranges, and indices for scatter/fluorescence/time channels.
- Robust parsing options: Flags for offset discrepancy handling and null-channel exclusion.
- Multi-dataset support: Detects and reads files containing multiple datasets.
- FCS writing: Create new FCS files from arrays and optionally preserve/override metadata.
Dependencies
- python >= 3.9
- flowio (install via pip/uv; version depends on your environment)
- Example-only: numpy >= 1.20, pandas >= 1.5
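All three packages are published on PyPI, so a typical setup is a single pip install (swap in `uv pip install` if you use uv):

```shell
pip install flowio numpy pandas
```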
Example Usage
""" End-to-end example: 1) Read an FCS file (metadata + events) 2) Convert to a Pandas DataFrame and export CSV 3) Filter events and write a new FCS file 4) Handle multi-dataset files """ from pathlib import Path import numpy as np import pandas as pd from flowio import ( FlowData, create_fcs, read_multiple_data_sets, MultipleDataSetsError, FCSParsingError, DataOffsetDiscrepancyError, ) FCS_PATH = "sample.fcs" def read_fcs_safely(path: str) -> FlowData: try: return FlowData(path) except DataOffsetDiscrepancyError: # Common workaround for files with inconsistent offsets return FlowData(path, ignore_offset_discrepancy=True) except FCSParsingError: # Looser mode if the file is malformed return FlowData(path, ignore_offset_error=True) def main() -> None: # --- 1) Read file (single dataset) --- try: flow = read_fcs_safely(FCS_PATH) except MultipleDataSetsError: # --- 4) Multi-dataset handling --- datasets = read_multiple_data_sets(FCS_PATH) flow = datasets[0] # pick the first dataset for this demo print("File:", getattr(flow, "name", Path(FCS_PATH).name)) print("FCS version:", flow.version) print("Events:", flow.event_count) print("Channels:", flow.channel_count) print("PnN labels:", flow.pnn_labels) # Metadata (TEXT segment) print("Instrument ($CYT):", flow.text.get("$CYT", "N/A")) print("Acquisition date ($DATE):", flow.text.get("$DATE", "N/A")) # --- 2) Events -> NumPy -> DataFrame -> CSV --- events = flow.as_array(preprocess=True) # default preprocessing behavior df = pd.DataFrame(events, columns=flow.pnn_labels) df.to_csv("events.csv", index=False) print("Wrote CSV:", "events.csv") # --- 3) Filter and write a new FCS --- # Example: threshold on first scatter channel if available, else channel 0 fsc_idx = flow.scatter_indices[0] if getattr(flow, "scatter_indices", []) else 0 threshold = np.percentile(events[:, fsc_idx], 50) # median threshold mask = events[:, fsc_idx] > threshold filtered = events[mask] create_fcs( "filtered.fcs", filtered, flow.pnn_labels, 
opt_channel_names=flow.pns_labels, metadata={**flow.text, "$SRC": "Filtered via FlowIO example"}, ) print("Wrote FCS:", "filtered.fcs") # --- Metadata-only read (memory efficient) --- meta_only = FlowData(FCS_PATH, only_text=True) print("Metadata-only read: $DATE =", meta_only.text.get("$DATE", "N/A")) if __name__ == "__main__": main()
Implementation Details
Data Model and Segments
An FCS file is organized into segments:
- HEADER: FCS version and byte offsets for other segments.
- TEXT: Keyword/value metadata (e.g., `$DATE`, `$CYT`, `$PnN`, `$PnS`, `$PnR`, `$PnG`, `$PnE`).
- DATA: Event matrix encoded as integer/float/double/ASCII depending on file keywords.
- ANALYSIS (optional): Post-processing results if present.
In FlowIO, these are exposed via `FlowData` attributes such as:
- `flow.header` (HEADER info)
- `flow.text` (TEXT keyword dictionary)
- `flow.analysis` (ANALYSIS keyword dictionary, if present)
- `flow.as_array(...)` (decoded event matrix)
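To make the HEADER segment concrete, here is a minimal sketch of decoding the fixed-width 58-byte ASCII HEADER by hand, following the FCS 3.1 layout (6-byte version string, padding, then six 8-byte ASCII offset fields). The `parse_fcs_header` helper and the hand-built sample bytes are illustrative only; in practice FlowIO does this for you and exposes the result via `flow.header`.

```python
def parse_fcs_header(raw: bytes) -> dict:
    """Decode the first 58 bytes of an FCS file (FCS 3.1 HEADER layout)."""
    def field(start: int, end: int) -> int:
        # Offset fields are right-justified ASCII integers; blank means 0.
        text = raw[start:end].decode("ascii").strip()
        return int(text) if text else 0

    return {
        "version": raw[0:6].decode("ascii"),       # e.g. "FCS3.1"
        "text_start": field(10, 18),
        "text_end": field(18, 26),
        "data_start": field(26, 34),
        "data_end": field(34, 42),
        "analysis_start": field(42, 50),           # 0 if no ANALYSIS segment
        "analysis_end": field(50, 58),
    }


# Hand-built 58-byte header for demonstration (not read from a real file)
fake = b"FCS3.1" + b" " * 4 + b"".join(
    f"{n:>8d}".encode("ascii") for n in (58, 1024, 1025, 9999, 0, 0)
)
header = parse_fcs_header(fake)
```

Real files are read the same way: the offsets tell the parser where the TEXT, DATA, and optional ANALYSIS segments begin and end.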
Preprocessing (`as_array(preprocess=True)`)
When preprocessing is enabled, FlowIO applies common FCS transformations:
- Gain scaling (PnG): Values are multiplied by the per-parameter gain.
- Log/exponential transform (PnE): If present, applies `value = a * 10^(b * raw_value)` where `PnE = "a,b"`.
- Time scaling: If a time channel is detected, values may be scaled into appropriate units.
To disable all transformations and obtain raw decoded values:
`flow.as_array(preprocess=False)`
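As a rough illustration of what these transformations amount to, the sketch below applies PnG gain scaling and the log transform described above to a raw event matrix in plain NumPy. The helper name and channel parameters are hypothetical; this mirrors the formulas stated in this section, not FlowIO's internal implementation.

```python
import numpy as np


def apply_gain_and_log(raw, gains, pne):
    """raw: (events, channels) array; gains: per-channel PnG values;
    pne: per-channel (a, b) pairs, with (0, 0) meaning linear storage."""
    # PnG gain scaling on all channels
    out = raw.astype(float) * np.asarray(gains, dtype=float)
    # For channels with a non-trivial PnE = "a,b", apply value = a * 10^(b * raw)
    for ch, (a, b) in enumerate(pne):
        if a != 0.0:
            out[:, ch] = a * 10.0 ** (b * raw[:, ch])
    return out


raw = np.array([[100.0, 2.0], [200.0, 3.0]])  # 2 events x 2 channels
scaled = apply_gain_and_log(raw, gains=[1.0, 2.0], pne=[(0.0, 0.0), (0.0, 0.0)])
```

Here both channels are stored linearly (`PnE = "0,0"`), so only the gains apply: channel 1 is doubled while channel 0 is unchanged.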
Channel Identification
FlowIO provides convenience indices for common channel types:
- `flow.scatter_indices` (e.g., FSC/SSC)
- `flow.fluoro_indices` (fluorescence channels)
- `flow.time_index` (time channel index or `None`)
These indices can be used to slice the event matrix: `events[:, flow.scatter_indices]`, `events[:, flow.fluoro_indices]`.
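The slicing above is ordinary NumPy integer-list indexing. A small self-contained sketch, with hard-coded index values standing in for the attributes a real `FlowData` object would provide:

```python
import numpy as np

events = np.arange(12.0).reshape(3, 4)  # 3 events x 4 channels

# Hypothetical values; in practice these come from flow.scatter_indices,
# flow.fluoro_indices, and flow.time_index.
scatter_indices = [0, 1]   # e.g. FSC-A, SSC-A
fluoro_indices = [2]       # e.g. FITC-A
time_index = 3

scatter = events[:, scatter_indices]   # shape (3, 2)
fluoro = events[:, fluoro_indices]     # shape (3, 1)
time_values = events[:, time_index]    # shape (3,)
```

Indexing with a list keeps the channel axis (a 2-D result), while indexing with a single integer drops it (a 1-D result).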
Handling Problematic Files (Offsets and Null Channels)
Some files contain inconsistent offsets between HEADER and TEXT:
- `ignore_offset_discrepancy=True` to tolerate HEADER/TEXT offset mismatch.
- `use_header_offsets=True` to prefer HEADER offsets.
- `ignore_offset_error=True` to bypass offset-related failures more aggressively.
To exclude known null/empty channels during parsing:
`FlowData(path, null_channel_list=[...])`
Multi-Dataset Files
If a file contains multiple datasets, constructing `FlowData(path)` may raise `MultipleDataSetsError`. Use:
- `read_multiple_data_sets(path)` to load all datasets, or
- `FlowData(path, nextdata_offset=...)` to load a specific dataset using `$NEXTDATA` offsets.
Writing FCS
Two common patterns:
- Write metadata-only changes: `flow.write_fcs("out.fcs", metadata={...})`
- Modify event data: extract the array → modify → `create_fcs(...)` to generate a new file (FlowIO does not modify event data in-place).