Medical-research-skills flowio

Parse Flow Cytometry Standard (FCS) files v2.0–3.1 and extract events/metadata for preprocessing workflows (e.g., when you need NumPy arrays, channel info, or CSV/DataFrame export from cytometry files).

install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Data Analysis/flowio" ~/.claude/skills/aipoch-medical-research-skills-flowio && rm -rf "$T"
manifest: scientific-skills/Data Analysis/flowio/SKILL.md
source content

Source: https://github.com/aipoch/medical-research-skills

When to Use

  • You need to read FCS v2.0/3.0/3.1 files and extract event matrices for downstream preprocessing.
  • You want to inspect or validate FCS metadata (TEXT segment) without loading event data (memory-efficient parsing).
  • You need channel definitions (PnN/PnS), ranges (PnR), and automatic identification of scatter/fluorescence/time channels.
  • You need to handle problematic FCS files with offset inconsistencies or multi-dataset content.
  • You want to export cytometry events to CSV/Pandas DataFrame or write new/modified FCS files.

Key Features

  • FCS parsing (v2.0–3.1): Reads HEADER/TEXT/DATA and optional ANALYSIS segments.
  • Event extraction to NumPy: Returns event data as an `ndarray` with shape `(events, channels)`.
  • Optional preprocessing: Applies standard FCS transformations (gain/log/time scaling) when enabled.
  • Metadata access: Exposes TEXT keywords and common instrument/acquisition fields.
  • Channel utilities: Provides PnN/PnS labels, ranges, and indices for scatter/fluorescence/time channels.
  • Robust parsing options: Flags for offset discrepancy handling and null-channel exclusion.
  • Multi-dataset support: Detects and reads files containing multiple datasets.
  • FCS writing: Create new FCS files from arrays and optionally preserve/override metadata.

Dependencies

  • python >= 3.9
  • flowio (install via pip/uv; version depends on your environment)
  • Example-only:
    • numpy >= 1.20
    • pandas >= 1.5

Example Usage

"""
End-to-end example:
1) Read an FCS file (metadata + events)
2) Convert to a Pandas DataFrame and export CSV
3) Filter events and write a new FCS file
4) Handle multi-dataset files
"""

from pathlib import Path

import numpy as np
import pandas as pd

from flowio import FlowData, create_fcs, read_multiple_data_sets
from flowio.exceptions import (
    DataOffsetDiscrepancyError,
    FCSParsingError,
    MultipleDataSetsError,
)

FCS_PATH = "sample.fcs"

def read_fcs_safely(path: str) -> FlowData:
    try:
        return FlowData(path)
    except DataOffsetDiscrepancyError:
        # Common workaround for files with inconsistent offsets
        return FlowData(path, ignore_offset_discrepancy=True)
    except FCSParsingError:
        # Looser mode if the file is malformed
        return FlowData(path, ignore_offset_error=True)

def main() -> None:
    # --- 1) Read file (single dataset) ---
    try:
        flow = read_fcs_safely(FCS_PATH)
    except MultipleDataSetsError:
        # --- 4) Multi-dataset handling ---
        datasets = read_multiple_data_sets(FCS_PATH)
        flow = datasets[0]  # pick the first dataset for this demo

    print("File:", getattr(flow, "name", Path(FCS_PATH).name))
    print("FCS version:", flow.version)
    print("Events:", flow.event_count)
    print("Channels:", flow.channel_count)
    print("PnN labels:", flow.pnn_labels)

    # Metadata (TEXT segment)
    print("Instrument ($CYT):", flow.text.get("$CYT", "N/A"))
    print("Acquisition date ($DATE):", flow.text.get("$DATE", "N/A"))

    # --- 2) Events -> NumPy -> DataFrame -> CSV ---
    events = flow.as_array(preprocess=True)  # apply gain/log/time scaling
    df = pd.DataFrame(events, columns=flow.pnn_labels)
    df.to_csv("events.csv", index=False)
    print("Wrote CSV:", "events.csv")

    # --- 3) Filter and write a new FCS ---
    # Example: threshold on first scatter channel if available, else channel 0
    fsc_idx = flow.scatter_indices[0] if getattr(flow, "scatter_indices", []) else 0
    threshold = np.percentile(events[:, fsc_idx], 50)  # median threshold
    mask = events[:, fsc_idx] > threshold
    filtered = events[mask]

    # create_fcs takes an open binary file handle and flattened 1-D event data
    with open("filtered.fcs", "wb") as fh:
        create_fcs(
            fh,
            filtered.flatten(),
            flow.pnn_labels,
            opt_channel_names=flow.pns_labels,
            metadata_dict={**flow.text, "$SRC": "Filtered via FlowIO example"},
        )
    print("Wrote FCS:", "filtered.fcs")

    # --- Metadata-only read (memory efficient) ---
    meta_only = FlowData(FCS_PATH, only_text=True)
    print("Metadata-only read: $DATE =", meta_only.text.get("$DATE", "N/A"))

if __name__ == "__main__":
    main()

Implementation Details

Data Model and Segments

An FCS file is organized into segments:

  • HEADER: FCS version and byte offsets for other segments.
  • TEXT: Keyword/value metadata (e.g., `$DATE`, `$CYT`, `$PnN`, `$PnS`, `$PnR`, `$PnG`, `$PnE`).
  • DATA: Event matrix encoded as integer/float/double/ASCII depending on file keywords.
  • ANALYSIS (optional): Post-processing results if present.
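
The HEADER segment has a fixed ASCII layout, so it can be inspected with nothing but the standard library. A minimal stdlib-only sketch (not FlowIO's internal parser; the field names are illustrative):

```python
def parse_fcs_header(raw: bytes) -> dict:
    """Parse the fixed-layout FCS HEADER segment (stdlib-only sketch)."""
    # Bytes 0-5: version string (e.g. b"FCS3.1"); bytes 6-9: spaces.
    version = raw[0:6].decode("ascii")
    # Then six 8-byte, right-justified ASCII integers: begin/end offsets
    # for the TEXT, DATA, and ANALYSIS segments. Very large DATA segments
    # may be recorded as 0 here, with the real offsets only in the TEXT
    # keywords ($BEGINDATA/$ENDDATA) -- one source of offset discrepancies.
    names = ["text_start", "text_stop", "data_start", "data_stop",
             "analysis_start", "analysis_stop"]
    offsets = {}
    for i, name in enumerate(names):
        field = raw[10 + 8 * i : 18 + 8 * i].decode("ascii").strip()
        offsets[name] = int(field) if field else 0
    return {"version": version, **offsets}
```

This is the information FlowIO surfaces via `flow.header`.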

In FlowIO, these are exposed via `FlowData` attributes such as:

  • `flow.header` (HEADER info)
  • `flow.text` (TEXT keyword dictionary)
  • `flow.analysis` (ANALYSIS keyword dictionary, if present)
  • `flow.as_array(...)` (decoded event matrix)

Preprocessing (`as_array(preprocess=True)`)

When preprocessing is enabled, FlowIO applies common FCS transformations:

  1. Gain scaling (PnG): Linear values (`PnE = "0,0"`) are divided by the per-parameter amplifier gain.
  2. Log/exponential transform (PnE): With `PnE = "f1,f2"`, channel values are rescaled as `value = f2 * 10^(f1 * raw_value / PnR)`.
  3. Time scaling: If a time channel is detected, values may be scaled into appropriate units.

To disable all transformations and obtain raw decoded values:

  • flow.as_array(preprocess=False)
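
The gain and log conversions above are plain arithmetic and can be reproduced with NumPy. A sketch of the spec-level math (passing PnE as an `(f1, f2)` tuple is an illustrative calling convention, not FlowIO's API):

```python
import numpy as np

def scale_channel(raw, pnr, pne=(0.0, 0.0), png=1.0):
    """Apply FCS channel-to-scale conversion for a single parameter."""
    raw = np.asarray(raw, dtype=float)
    f1, f2 = pne
    if f1 == 0.0:
        # Linear parameter (PnE = "0,0"): divide by the amplifier gain (PnG).
        return raw / png
    # Log-amplified parameter: value = f2 * 10 ** (f1 * raw / PnR)
    return f2 * 10.0 ** (f1 * raw / pnr)
```

For example, a 4-decade log channel (`PnE = "4,1"`) maps channel 0 to 1 and channel PnR to 10^4.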

Channel Identification

FlowIO provides convenience indices for common channel types:

  • `flow.scatter_indices` (e.g., FSC/SSC)
  • `flow.fluoro_indices` (fluorescence channels)
  • `flow.time_index` (time channel index or `None`)

These indices can be used to slice the event matrix:

  • `events[:, flow.scatter_indices]`
  • `events[:, flow.fluoro_indices]`
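
These indices are derived from the channel labels. A rough, label-convention-based sketch of that kind of classification (the heuristics FlowIO actually applies may differ):

```python
def classify_channels(pnn_labels):
    """Split channel indices into scatter / fluorescence / time by PnN label."""
    scatter, fluoro, time_index = [], [], None
    for i, label in enumerate(pnn_labels):
        upper = label.upper()
        if upper.startswith(("FSC", "SSC")):
            scatter.append(i)   # forward/side scatter
        elif upper == "TIME":
            time_index = i      # acquisition time channel
        else:
            fluoro.append(i)    # assume fluorescence otherwise
    return scatter, fluoro, time_index
```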

Handling Problematic Files (Offsets and Null Channels)

Some files contain inconsistent offsets between HEADER and TEXT:

  • `ignore_offset_discrepancy=True` to tolerate HEADER/TEXT offset mismatch.
  • `use_header_offsets=True` to prefer the HEADER offsets.
  • `ignore_offset_error=True` to bypass offset-related failures more aggressively.

To exclude known null/empty channels during parsing:

  • `FlowData(path, null_channel_list=[...])`
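
The first two flags boil down to choosing between the HEADER's offsets and the TEXT segment's `$BEGINDATA`/`$ENDDATA` keywords. A hypothetical helper illustrating that decision (the names and structure are ours, not FlowIO's):

```python
class OffsetDiscrepancyError(ValueError):
    """HEADER and TEXT disagree about where the DATA segment lives."""

def resolve_data_offsets(header, text, *, use_header_offsets=False,
                         ignore_offset_discrepancy=False):
    """Pick DATA offsets, mirroring the tolerance flags described above."""
    from_header = (header["data_start"], header["data_stop"])
    from_text = (int(text.get("$BEGINDATA", 0)), int(text.get("$ENDDATA", 0)))
    # A (0, 0) HEADER entry legitimately means "see TEXT" (large files),
    # so only a real mismatch counts as a discrepancy.
    if (from_header != from_text and from_header != (0, 0)
            and not (use_header_offsets or ignore_offset_discrepancy)):
        raise OffsetDiscrepancyError(f"HEADER {from_header} != TEXT {from_text}")
    return from_header if use_header_offsets else from_text
```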

Multi-Dataset Files

If a file contains multiple datasets, constructing `FlowData(path)` may raise `MultipleDataSetsError`. Use:

  • `read_multiple_data_sets(path)` to load all datasets, or
  • `FlowData(path, nextdata_offset=...)` to load a specific dataset using `$NEXTDATA` offsets.
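
Under the hood, datasets are chained via `$NEXTDATA`: each dataset's TEXT segment stores the byte offset of the next dataset relative to the start of the current one, with 0 marking the last. A sketch of walking that chain (`read_text_at` is a hypothetical callback, not a FlowIO function):

```python
def dataset_offsets(read_text_at):
    """Yield the absolute byte offset of each dataset in a multi-dataset file."""
    offset = 0
    while True:
        yield offset
        # $NEXTDATA is relative to the start of the current dataset; 0 = last.
        next_rel = int(read_text_at(offset).get("$NEXTDATA", "0"))
        if next_rel == 0:
            return
        offset += next_rel
```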

Writing FCS

Two common patterns:

  • Write metadata-only changes: `flow.write_fcs("out.fcs", metadata={...})`
  • Modify event data: extract array → modify → `create_fcs(...)` to generate a new file (FlowIO does not modify event data in-place).