OpenClaw-Medical-Skills tooluniverse-image-analysis
Production-ready microscopy image analysis and quantitative imaging data skill for colony morphometry, cell counting, fluorescence quantification, and statistical analysis of imaging-derived measurements. Processes ImageJ/CellProfiler output (area, circularity, intensity, cell counts), performs Dunnett's test, Cohen's d effect size, power analysis, Shapiro-Wilk normality tests, two-way ANOVA, polynomial regression, natural spline regression with confidence intervals, and comparative morphometry. Supports CSV/TSV measurement tables, multi-channel fluorescence data, colony swarming assays, and neuron counting datasets. Use when analyzing microscopy measurement data, colony area/circularity, cell count statistics, swarming assays, co-culture ratio optimization, or answering questions about imaging-derived quantitative data.
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tooluniverse-image-analysis" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-image-analysis && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/tooluniverse-image-analysis" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-image-analysis && rm -rf "$T"
Microscopy Image Analysis and Quantitative Imaging Data
Production-ready skill for analyzing microscopy-derived measurement data using pandas, numpy, scipy, statsmodels, and scikit-image. Designed for BixBench imaging questions covering colony morphometry, cell counting, fluorescence quantification, regression modeling, and statistical comparisons.
IMPORTANT: This skill handles complex multi-workflow analysis. Most implementation details have been moved to references/ for progressive disclosure. This document focuses on high-level decision-making and workflow orchestration.
When to Use This Skill
Apply when users:
- Have microscopy measurement data (area, circularity, intensity, cell counts) in CSV/TSV
- Ask about colony morphometry (bacterial swarming, biofilm, growth assays)
- Need statistical comparisons of imaging measurements (t-test, ANOVA, Dunnett's, Mann-Whitney)
- Ask about cell counting statistics (NeuN, DAPI, marker counts)
- Need effect size calculations (Cohen's d) and power analysis
- Want regression models (polynomial, spline) fitted to dose-response or ratio data
- Ask about model comparison (R-squared, F-statistic, AIC/BIC)
- Need Shapiro-Wilk normality testing on imaging data
- Want confidence intervals for peak predictions from fitted models
- Questions mention imaging software output (ImageJ, CellProfiler, QuPath)
- Need fluorescence intensity quantification or colocalization analysis
- Ask about image segmentation results (counts, areas, shapes)
BixBench Coverage: 21 questions across 4 projects (bix-18, bix-19, bix-41, bix-54)
NOT for (use other skills instead):
- Phylogenetic analysis → Use tooluniverse-phylogenetics
- RNA-seq differential expression → Use tooluniverse-rnaseq-deseq2
- Single-cell scRNA-seq → Use tooluniverse-single-cell
- Statistical regression only (no imaging context) → Use tooluniverse-statistical-modeling
Core Principles
- Data-first approach - Load and inspect all CSV/TSV measurement data before analysis
- Question-driven - Parse the exact statistic, comparison, or model requested
- Statistical rigor - Proper effect sizes, multiple comparison corrections, model selection
- Imaging-aware - Understand ImageJ/CellProfiler measurement columns (Area, Circularity, Round, Intensity)
- Workflow flexibility - Support both pre-quantified data (CSV) and raw image processing
- Precision - Match expected answer format (integer, range, decimal places)
- Reproducible - Use standard Python/scipy equivalents to R functions
Required Python Packages
```python
# Core (MUST be installed)
import pandas as pd
import numpy as np
from scipy import stats
from scipy.interpolate import BSpline, make_interp_spline
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.power import TTestIndPower
from patsy import dmatrix, bs, cr

# Optional (for raw image processing)
import skimage
import cv2
import tifffile
```
Installation:
pip install pandas numpy scipy statsmodels patsy scikit-image opencv-python-headless tifffile
High-Level Workflow Decision Tree
```
START: User question about microscopy data
│
├─ Q1: What type of data is available?
│  │
│  ├─ PRE-QUANTIFIED DATA (CSV/TSV with measurements)
│  │  └─ Workflow: Load → Parse question → Statistical analysis
│  │     Pattern: Most common BixBench pattern (bix-18, bix-19, bix-41, bix-54)
│  │     See: Section "Quantitative Data Analysis" below
│  │
│  └─ RAW IMAGES (TIFF, PNG, multi-channel)
│     └─ Workflow: Load → Segment → Measure → Analyze
│        See: references/image_processing.md
│
├─ Q2: What type of analysis is needed?
│  │
│  ├─ STATISTICAL COMPARISON
│  │  ├─ Two groups → t-test or Mann-Whitney
│  │  ├─ Multiple groups → ANOVA or Dunnett's test
│  │  ├─ Two factors → Two-way ANOVA
│  │  └─ Effect size → Cohen's d, power analysis
│  │  See: references/statistical_analysis.md
│  │
│  ├─ REGRESSION MODELING
│  │  ├─ Dose-response → Polynomial (quadratic, cubic)
│  │  ├─ Ratio optimization → Natural spline
│  │  └─ Model comparison → R-squared, F-statistic, AIC/BIC
│  │  See: references/statistical_analysis.md
│  │
│  ├─ CELL COUNTING
│  │  ├─ Fluorescence (DAPI, NeuN) → Threshold + watershed
│  │  ├─ Brightfield → Adaptive threshold
│  │  └─ High-density → CellPose or StarDist (external)
│  │  See: references/cell_counting.md
│  │
│  ├─ COLONY SEGMENTATION
│  │  ├─ Swarming assays → Otsu threshold + morphology
│  │  ├─ Biofilms → Li threshold + fill holes
│  │  └─ Growth assays → Time-lapse tracking
│  │  See: references/segmentation.md
│  │
│  └─ FLUORESCENCE QUANTIFICATION
│     ├─ Intensity measurement → regionprops
│     ├─ Colocalization → Pearson/Manders
│     └─ Multi-channel → Channel-wise quantification
│     See: references/fluorescence_analysis.md
│
└─ Q3: When to use scikit-image vs OpenCV?
   ├─ scikit-image: Scientific analysis, measurements, regionprops
   ├─ OpenCV: Fast processing, real-time, large batches
   └─ Both: Often interchangeable for basic operations
   See: references/image_processing.md "Library Selection Guide"
```
Quantitative Data Analysis Workflow
Phase 0: Question Parsing and Data Discovery
CRITICAL FIRST STEP: Before writing ANY code, identify what data files are available and what the question is asking for.
```python
import glob
import os

import pandas as pd

# Discover data files
data_dir = "."
csv_files = glob.glob(os.path.join(data_dir, '**', '*.csv'), recursive=True)
tsv_files = glob.glob(os.path.join(data_dir, '**', '*.tsv'), recursive=True)
img_files = glob.glob(os.path.join(data_dir, '**', '*.tif*'), recursive=True)

# Load and inspect first measurement file
if csv_files:
    df = pd.read_csv(csv_files[0])
    print(f"Shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    print(df.head())
    print(df.describe())
```
Common Column Names:
- Area: Colony or cell area in pixels or calibrated units
- Circularity: 4 × π × area / perimeter^2, range [0,1], 1.0 = perfect circle
- Round: Roundness = 4 × area / (π × major_axis^2)
- Genotype/Strain: Biological grouping variable
- Ratio: Co-culture mixing ratio (e.g., "1:3", "5:1")
- NeuN/DAPI/GFP: Cell marker counts or intensities
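The two shape descriptors in the list above can be recomputed from raw measurements when a table only provides area, perimeter, and major-axis length. A minimal sketch, assuming all inputs share the same calibrated units; the function names are illustrative and not part of the skill's scripts.

```python
import math


def circularity(area: float, perimeter: float) -> float:
    """Circularity as defined above: 4*pi*area / perimeter**2 (1.0 = perfect circle)."""
    return 4.0 * math.pi * area / perimeter ** 2


def roundness(area: float, major_axis: float) -> float:
    """Roundness as defined above: 4*area / (pi * major_axis**2)."""
    return 4.0 * area / (math.pi * major_axis ** 2)


# Sanity check on an ideal circle of radius 5: both descriptors should be 1.0
r = 5.0
circle_area = math.pi * r ** 2
circle_perimeter = 2.0 * math.pi * r
print(circularity(circle_area, circle_perimeter))
print(roundness(circle_area, 2.0 * r))

# A 2 x 2 square is less circular: 4*pi*4 / 8**2 = pi / 4
print(circularity(4.0, 8.0))
```

Checking a known shape like this is a quick way to confirm whether a column labelled Circularity or Round follows these definitions before using it in a comparison.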
Phase 1: Grouped Statistics
```python
import numpy as np
import pandas as pd


def grouped_summary(df, group_cols, measure_col):
    """Calculate summary statistics by group."""
    summary = df.groupby(group_cols)[measure_col].agg(
        Mean='mean', SD='std', Median='median',
        Min='min', Max='max', N='count'
    ).reset_index()
    # Standard error of the mean from the sample SD and group size
    summary['SEM'] = summary['SD'] / np.sqrt(summary['N'])
    return summary


# Example: Colony morphometry by genotype (df is the table loaded in Phase 0)
area_summary = grouped_summary(df, 'Genotype', 'Area')
circ_summary = grouped_summary(df, 'Genotype', 'Circularity')
```
For detailed statistical functions, see: references/statistical_analysis.md
Phase 2: Statistical Testing
Decision guide:
- Normality test needed? → Shapiro-Wilk
- Two groups comparison? → t-test or Mann-Whitney
- Multiple groups vs control? → Dunnett's test
- Multiple groups, all comparisons? → Tukey HSD
- Two factors? → Two-way ANOVA
- Effect size? → Cohen's d
- Sample size planning? → Power analysis
See: references/statistical_analysis.md for complete implementations
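The first two branches of the decision guide can be sketched as a single helper. This is an illustration under stated assumptions, not the implementation in references/statistical_analysis.md: the function name `compare_two_groups`, the 0.05 cutoff, the choice of Welch's variant of the t-test, and the simulated measurements are all hypothetical.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Use Welch's t-test if both groups pass Shapiro-Wilk, else Mann-Whitney U."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    p_norm_a = stats.shapiro(a).pvalue
    p_norm_b = stats.shapiro(b).pvalue
    if p_norm_a > alpha and p_norm_b > alpha:
        test = "welch_t"
        res = stats.ttest_ind(a, b, equal_var=False)
    else:
        test = "mann_whitney"
        res = stats.mannwhitneyu(a, b, alternative="two-sided")
    return {
        "test": test,
        "statistic": float(res.statistic),
        "p_value": float(res.pvalue),
        "shapiro_p": (float(p_norm_a), float(p_norm_b)),
    }


# Simulated colony areas (illustrative only, seeded for repeatability)
rng = np.random.default_rng(0)
ctrl = rng.normal(100.0, 10.0, size=20)
treated = rng.normal(130.0, 10.0, size=20)
result = compare_two_groups(ctrl, treated)
print(result)
```

When a question names a specific test, run that test directly rather than letting a normality check choose one.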
Phase 3: Regression Modeling
When to use each model:
- Polynomial (quadratic/cubic): Smooth dose-response, clear peak
- Natural spline: Flexible, non-parametric, handles complex patterns
- Linear: Simple relationships, checking for trends
Model comparison metrics:
- R-squared: Overall fit (higher = better)
- Adjusted R-squared: Penalizes complexity
- F-statistic p-value: Model significance
- AIC/BIC: Compare non-nested models
See: references/statistical_analysis.md for complete implementations
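The comparison metrics listed above can be collected side by side for candidate polynomial models. A minimal sketch using the statsmodels formula API on simulated dose-response data; the column names `dose` and `area` and the simulated curve are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated dose-response with a quadratic shape (illustrative only)
rng = np.random.default_rng(1)
dose = np.linspace(0.0, 1.0, 30)
area = 50.0 + 200.0 * dose - 180.0 * dose ** 2 + rng.normal(0.0, 3.0, size=dose.size)
data = pd.DataFrame({"dose": dose, "area": area})

formulas = {
    "linear": "area ~ dose",
    "quadratic": "area ~ dose + I(dose**2)",
    "cubic": "area ~ dose + I(dose**2) + I(dose**3)",
}
fits = {name: smf.ols(formula, data=data).fit() for name, formula in formulas.items()}

# One row per model, one column per metric from the list above
comparison = pd.DataFrame({
    name: {
        "r2": model.rsquared,
        "adj_r2": model.rsquared_adj,
        "f_pvalue": model.f_pvalue,
        "aic": model.aic,
        "bic": model.bic,
    }
    for name, model in fits.items()
}).T
print(comparison.round(3))
```

Plain R-squared never decreases when a term is added to a nested model, so adjusted R-squared, AIC, or BIC are the more informative columns when choosing between degrees.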
Raw Image Processing Workflow
When Processing Raw Images
Workflow: Load → Preprocess → Segment → Measure → Export
```python
# Quick start for cell counting
from scripts.segment_cells import count_cells_in_image

result = count_cells_in_image(
    image_path="cells.tif",
    channel=0,  # DAPI channel
    min_area=50
)
print(f"Found {result['count']} cells")
```
Segmentation Method Selection
Decision guide:
| Cell Type | Density | Best Method | Notes |
|---|---|---|---|
| Nuclei (DAPI) | Low-Medium | Otsu + watershed | Standard approach |
| Nuclei (DAPI) | High | CellPose/StarDist | Handles touching |
| Colonies | Well-separated | Otsu threshold | Fast, reliable |
| Colonies | Touching | Watershed | Edge detection |
| Cells (phase) | Any | Adaptive threshold | Handles uneven illumination |
| Fluorescence | Low signal | Li threshold | More sensitive |
See: references/segmentation.md and references/cell_counting.md for detailed protocols
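For the well-separated case in the table, Otsu thresholding followed by connected-component labeling is often enough. A minimal sketch on a synthetic image, assuming scikit-image is installed; the image, the `min_area` value, and the variable names are made up for illustration, and the protocols in references/ may differ.

```python
import numpy as np
from skimage import draw, filters, measure

# Synthetic image: dim noisy background with two bright disks (illustrative only)
rng = np.random.default_rng(3)
image = rng.normal(0.10, 0.02, size=(128, 128))
for center in [(40, 40), (90, 85)]:
    rr, cc = draw.disk(center, 12, shape=image.shape)
    image[rr, cc] = rng.normal(0.80, 0.02, size=rr.size)

# Global Otsu threshold, then label connected foreground regions
threshold = filters.threshold_otsu(image)
labels = measure.label(image > threshold)

# Measure each region and drop objects below a minimum area (in pixels)
min_area = 50
props = [p for p in measure.regionprops(labels, intensity_image=image)
         if p.area >= min_area]
count = len(props)
areas = [float(p.area) for p in props]
print(count, areas)
```

Touching objects would be merged into one label by this approach, which is why the table points to watershed or CellPose/StarDist for dense fields.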
Library Selection: scikit-image vs OpenCV
Use scikit-image when:
- Scientific measurements needed (area, perimeter, intensity)
- regionprops for object properties
- Publication-quality analysis
- Easier syntax for scientists
Use OpenCV when:
- Processing large image batches
- Speed is critical
- Real-time processing
- Advanced computer vision features
Both work for:
- Thresholding, filtering, morphological operations
- Basic image transformations
- Most segmentation tasks
See: references/image_processing.md "Library Selection Guide"
Common BixBench Patterns
Pattern 1: Colony Morphometry (bix-18)
Question type: "Mean circularity of genotype with largest area?"
Data: CSV with Genotype, Area, Circularity columns
Workflow:
- Load CSV → group by Genotype
- Calculate mean Area per genotype
- Identify genotype with max mean Area
- Report mean Circularity for that genotype
See: references/segmentation.md "Colony Morphometry Analysis"
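The four workflow steps map onto a short pandas chain. A minimal sketch with a hand-made table; the genotype labels and values are invented for illustration and are not BixBench data.

```python
import pandas as pd

# Hand-made measurements (illustrative only)
df = pd.DataFrame({
    "Genotype": ["WT", "WT", "mutA", "mutA", "mutB", "mutB"],
    "Area": [1000.0, 1200.0, 3000.0, 3400.0, 500.0, 700.0],
    "Circularity": [0.90, 0.80, 0.40, 0.50, 0.95, 0.85],
})

# Mean Area and Circularity per genotype
means = df.groupby("Genotype")[["Area", "Circularity"]].mean()

# Genotype with the largest mean Area, then its mean Circularity
largest = means["Area"].idxmax()
answer = means.loc[largest, "Circularity"]
print(largest, round(answer, 3))
```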
Pattern 2: Cell Counting Statistics (bix-19)
Question type: "Cohen's d for NeuN counts between conditions?"
Data: CSV with Condition, NeuN_count, Sex, Hemisphere columns
Workflow:
- Load CSV → filter by hemisphere/sex if needed
- Split by Condition (KD vs CTRL)
- Calculate Cohen's d with pooled SD
- Report effect size
See: references/statistical_analysis.md "Effect Size Calculations"
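Cohen's d with a pooled standard deviation divides the difference in means by the square root of the variance pooled across both groups, weighted by degrees of freedom. A minimal sketch; the function name `cohens_d` and the example counts are hypothetical, and the reference guide's version may handle edge cases differently.

```python
import numpy as np


def cohens_d(x, y):
    """Cohen's d using the pooled sample SD (ddof=1) of two independent groups."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)


# Invented NeuN counts (illustrative only)
kd_counts = [10, 12, 14]
ctrl_counts = [13, 15, 17]
d = cohens_d(kd_counts, ctrl_counts)
print(f"{d:.3f}")
```

The sign depends on argument order, so confirm which condition the question treats as the reference before reporting the value.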
Pattern 3: Multi-Group Comparison (bix-41)
Question type: "Dunnett's test: How many ratios equivalent to control?"
Data: CSV with multiple co-culture ratios, Area, Circularity
Workflow:
- Create Strain_Ratio labels
- Run Dunnett's test for Area (vs control)
- Run Dunnett's test for Circularity (vs control)
- Count groups NOT significant in BOTH tests
See: references/statistical_analysis.md "Dunnett's Test"
Pattern 4: Regression Optimization (bix-54)
Question type: "Peak frequency from natural spline model?"
Data: CSV with co-culture frequencies and Area measurements
Workflow:
- Convert ratio strings to frequencies
- Fit natural spline model (df=4)
- Find peak via grid search
- Report peak frequency + confidence interval
See: references/statistical_analysis.md "Regression Modeling"
Quick Reference Table
| Task | Primary Tool | Reference |
|---|---|---|
| Load measurement CSV | pandas.read_csv() | This file |
| Group statistics | df.groupby().agg() | This file |
| T-test | scipy.stats.ttest_ind() | statistical_analysis.md |
| ANOVA | statsmodels.ols + anova_lm() | statistical_analysis.md |
| Dunnett's test | scipy.stats.dunnett() | statistical_analysis.md |
| Cohen's d | Custom function (pooled SD) | statistical_analysis.md |
| Power analysis | statsmodels TTestIndPower | statistical_analysis.md |
| Polynomial regression | statsmodels.OLS + poly features | statistical_analysis.md |
| Natural spline | patsy.cr() + statsmodels.OLS | statistical_analysis.md |
| Cell segmentation | skimage.filters + watershed | cell_counting.md |
| Colony segmentation | skimage.filters.threshold_otsu | segmentation.md |
| Fluorescence quantification | skimage.measure.regionprops | fluorescence_analysis.md |
| Colocalization | Pearson/Manders | fluorescence_analysis.md |
| Image loading | tifffile, skimage.io | image_processing.md |
| Batch processing | scripts/batch_process.py | scripts/ |
Example Scripts
Ready-to-use scripts in the scripts/ directory:
- segment_cells.py - Cell/nuclei counting with watershed
- measure_fluorescence.py - Multi-channel intensity quantification
- batch_process.py - Process folders of images
- colony_morphometry.py - Measure colony area/circularity
- statistical_comparison.py - Group comparison statistics
Usage:
```bash
# Count cells in image
python scripts/segment_cells.py cells.tif --channel 0 --min-area 50

# Batch process folder
python scripts/batch_process.py input_folder/ output.csv --analysis cell_count
```
Detailed Reference Guides
For complete implementations and protocols:
- references/statistical_analysis.md - All statistical tests, regression models
- references/cell_counting.md - Cell/nuclei counting protocols
- references/segmentation.md - Colony and object segmentation
- references/fluorescence_analysis.md - Intensity quantification, colocalization
- references/image_processing.md - Image loading, preprocessing, library selection
- references/troubleshooting.md - Common issues and solutions
Important Notes
Matching R Statistical Functions
Some BixBench questions use R for analysis. Python equivalents:
- R's Dunnett test (multcomp::glht) → scipy.stats.dunnett() (scipy ≥ 1.11)
- R's natural spline (ns(x, df=4)) → patsy.cr(x, knots=...) with explicit quantile knots
- R's t-test (t.test()) → scipy.stats.ttest_ind() (R defaults to Welch's test; pass equal_var=False to match)
- R's ANOVA (aov()) → statsmodels.formula.api.ols() + sm.stats.anova_lm()
See: references/statistical_analysis.md for exact parameter matching
Answer Formatting
BixBench expects specific formats:
- "to the nearest thousand": int(round(val, -3))
- Percentages: Usually integer or 1-2 decimal places
- Cohen's d: 3 decimal places
- Sample sizes: Always integer (ceiling)
- Ratios: String format "5:1"
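The formatting rules above translate into a few one-line helpers. A minimal sketch; the function names are hypothetical and the input values are arbitrary.

```python
import math


def to_nearest_thousand(val: float) -> int:
    """Round to the nearest thousand and return an integer."""
    return int(round(val, -3))


def required_sample_size(n_estimate: float) -> int:
    """Sample sizes are always rounded up to the next whole subject."""
    return math.ceil(n_estimate)


def format_ratio(a: int, b: int) -> str:
    """Ratios are reported as strings such as '5:1'."""
    return f"{a}:{b}"


print(to_nearest_thousand(82345.6))   # 82000
print(f"{0.84567:.3f}")               # Cohen's d to 3 decimal places: 0.846
print(required_sample_size(63.2))     # 64
print(format_ratio(5, 1))             # 5:1
```

Python's `round` resolves exact ties to the even neighbor, so `round(2500.0, -3)` gives 2000.0 rather than 3000.0; check tie cases by hand if the expected answer was produced with a different rounding rule.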
Completeness Checklist
Before returning your answer, verify:
- Loaded all data files and inspected column names
- Identified the specific statistic or model requested
- Used correct grouping variables and filter conditions
- Applied correct rounding or format
- For "how many" questions: counted correctly based on criteria
- For statistical tests: used appropriate multiple comparison correction
- For regression: properly prepared and transformed data
- Double-checked direction of comparisons
- Verified answer falls within expected range
Getting Help
- Start with decision tree at top of this file
- Check relevant reference guide for detailed protocol
- Use example scripts as templates
- See troubleshooting guide for common issues
- All statistical implementations in statistical_analysis.md