Medical-research-skills histolab
Lightweight Whole Slide Image (WSI) tiling and preprocessing for digital pathology; use when you need fast tissue detection and tile extraction to prepare datasets or run quick tile-based analysis.
Install

Clone the upstream repo:

```bash
git clone https://github.com/aipoch/medical-research-skills
```

Install into ~/.claude/skills/ for Claude Code:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Data Analysis/histolab" ~/.claude/skills/aipoch-medical-research-skills-histolab && rm -rf "$T"
```
Source file: `scientific-skills/Data Analysis/histolab/SKILL.md`
When to Use
- You need to tile gigapixel Whole Slide Images (WSI) into manageable patches for downstream analysis.
- You want automatic tissue detection to avoid background-heavy tiles during dataset preparation.
- You are building a simple, fast preprocessing pipeline (e.g., thresholding + morphology) for H&E slides.
- You need random sampling, grid coverage, or score-based selection of tiles for exploratory analysis or model training.
- You want to preview masks and tile locations before running extraction to validate parameters quickly.
Note: For advanced multiplex/spatial workflows or complex deep learning pipelines, consider more specialized frameworks (e.g., PathML).
Key Features
- Slide management via `Slide`: load WSI formats (SVS, TIFF, NDPI, etc.), access metadata, thumbnails, pyramid levels, and regions.
- Tissue masking via `TissueMask`, `BiggestTissueBoxMask`, and custom `BinaryMask` implementations.
- Tile extraction strategies:
  - `RandomTiler` (random sampling with reproducible seeds)
  - `GridTiler` (systematic coverage with optional overlap)
  - `ScoreTiler` (top-N tiles by a scoring function such as nuclei density)
- Preprocessing filters (image + morphology) and filter pipelines via `Compose`.
- Visualization helpers to preview masks (`locate_mask`) and tile locations (`locate_tiles`) and to review extracted tiles.
Reference docs (optional, for deeper detail):
- references/slide_management.md
- references/tissue_masks.md
- references/tile_extraction.md
- references/filters_preprocessing.md
- references/visualization.md
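Dependencies
- `histolab` (install via pip/uv; the version depends on your environment). histolab reads slides through OpenSlide, so the OpenSlide library must also be available on your system.

Installation:

```bash
uv pip install histolab
```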
Example Usage
A complete, runnable example that loads a slide, builds a tissue mask, previews tile locations, and extracts tiles.
```python
from pathlib import Path

from histolab.masks import TissueMask
from histolab.slide import Slide
from histolab.tiler import RandomTiler


def main():
    slide_path = "slide.svs"  # replace with your WSI path
    out_dir = Path("output")
    out_dir.mkdir(parents=True, exist_ok=True)

    # 1) Load slide; extracted tiles are written under processed_path
    slide = Slide(slide_path, processed_path=str(out_dir))

    # 2) Build the tissue mask and save a preview (locate_mask returns a PIL image)
    tissue_mask = TissueMask()
    slide.locate_mask(tissue_mask).save(out_dir / "tissue_mask_preview.png")

    # 3) Configure tiler
    tiler = RandomTiler(
        tile_size=(512, 512),
        n_tiles=100,
        level=0,
        seed=42,
        check_tissue=True,
        tissue_percent=80.0,
    )

    # 4) Preview tile locations on a thumbnail (recommended before extraction);
    #    locate_tiles also returns a PIL image
    tiler.locate_tiles(slide, extraction_mask=tissue_mask).save(
        out_dir / "tile_locations_preview.png"
    )

    # 5) Extract tiles (restricted to the tissue mask)
    tiler.extract(slide, extraction_mask=tissue_mask)

    print("Done. Check the output directory for the mask/tile previews and extracted tiles.")


if __name__ == "__main__":
    main()
```
Implementation Details
Slide and pyramid levels
- WSIs are typically stored as pyramids (multiple resolutions).
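- `level=0` is the highest-resolution level (slowest, largest output).
- Using `level=1` or `level=2` can significantly improve throughput for dataset prototyping. The number of available levels varies by file, so check `slide.levels` first. Because `tile_size` is measured in pixels at the chosen level, the same tile size covers a larger tissue area at coarser levels.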
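A rough sense of the speed-up can be had from arithmetic alone. The plain-Python sketch below assumes a hypothetical 100,000 × 80,000 px slide whose levels are downsampled by 1, 4, and 16; real sizes and factors come from the slide file and vary between scanners. It counts how many non-overlapping 512 × 512 tiles fit at each level, before any tissue filtering. The helper names (`level_dims`, `max_grid_tiles`, `tile_budget`) are illustrative, not part of histolab.

```python
def level_dims(base_w, base_h, downsample):
    """Approximate size of a level that is `downsample` times smaller than level 0."""
    return base_w // downsample, base_h // downsample


def max_grid_tiles(level_w, level_h, tile_w, tile_h):
    """Upper bound on non-overlapping tiles at a level, before any tissue filtering."""
    return (level_w // tile_w) * (level_h // tile_h)


def tile_budget(base_size, downsamples, tile_size):
    """Map each level to (width, height, max tiles, level-0 px spanned by one tile side)."""
    base_w, base_h = base_size
    tile_w, tile_h = tile_size
    budget = {}
    for level, ds in downsamples.items():
        w, h = level_dims(base_w, base_h, ds)
        budget[level] = (w, h, max_grid_tiles(w, h, tile_w, tile_h), tile_w * ds)
    return budget


# Hypothetical slide: 100,000 x 80,000 px at level 0, 4x downsampling per level.
budget = tile_budget((100_000, 80_000), {0: 1, 1: 4, 2: 16}, (512, 512))
for level, (w, h, n, span) in budget.items():
    print(f"level={level}: {w}x{h} px, at most {n} tiles, each ~{span} level-0 px wide")
```

Under these assumptions, moving from level 0 to level 2 shrinks the candidate pool from 30,420 to 108 tiles, while each tile spans a 16× wider field of view.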
Tissue masking
- `TissueMask` segments all tissue regions (useful when multiple fragments exist).
- `BiggestTissueBoxMask` keeps the bounding box of the largest connected tissue region (often faster and cleaner for single-section slides); it is also the default `extraction_mask` for the tilers.
- Masks can be customized by passing a filter pipeline (see references/tissue_masks.md and references/filters_preprocessing.md).
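The difference between the two built-in masks can be illustrated on a toy binary grid. This is a conceptual sketch in plain Python, not histolab's implementation: it labels tissue components with a breadth-first flood fill, then keeps either every tissue pixel (how `TissueMask` is described above) or only the bounding box of the largest component (the idea behind `BiggestTissueBoxMask`). The 4-connectivity rule, the toy grid, and the helper names are assumptions for illustration.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity (an assumption)


def components(mask):
    """Group tissue pixels (truthy values) into 4-connected components."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, comp = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                comp.append((y, x))
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(comp)
    return comps


def biggest_box_mask(mask):
    """Fill the bounding box of the largest component; background inside the box is kept."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    comps = components(mask)
    if not comps:
        return out
    biggest = max(comps, key=len)
    r0, r1 = min(y for y, _ in biggest), max(y for y, _ in biggest)
    c0, c1 = min(x for _, x in biggest), max(x for _, x in biggest)
    for y in range(r0, r1 + 1):
        for x in range(c0, c1 + 1):
            out[y][x] = 1
    return out


# Toy thumbnail-scale mask: a small fragment top-left, a larger one on the right.
tissue = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
]
box = biggest_box_mask(tissue)
for row in box:
    print(row)
```

On the toy grid, the all-tissue view keeps both fragments (8 pixels), whereas the box keeps only the larger fragment plus the background inside its bounding box (9 pixels), which is why the box variant suits single-section slides better than fragmented ones.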
Tile extraction strategies and key parameters
- Common parameters
  - `tile_size=(w, h)`: output tile dimensions in pixels at the chosen pyramid level.
  - `check_tissue=True`: enables tissue-content filtering.
  - `tissue_percent`: minimum tissue coverage, as a percentage, required for a tile to be accepted (commonly 70–90; the default is 80.0).
  - `extraction_mask` (passed to `extract()` / `locate_tiles()`): restricts candidate tile locations to a mask-defined ROI.
- RandomTiler
  - `n_tiles`: number of tiles to sample.
  - `seed`: ensures reproducibility across runs.
- GridTiler
  - `pixel_overlap`: overlap between adjacent tiles in pixels (0 for non-overlapping grids).
- ScoreTiler
  - `scorer`: ranks tiles (e.g., `NucleiScorer`) so that the top tiles can be selected.
  - Often used with a CSV report (`report_path`, passed to `extract()`) for auditability.
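The following plain-Python sketch mimics, on toy masks, how the three strategies choose tiles: a grid whose stride shrinks with `pixel_overlap`, seeded random sampling, a `tissue_percent` acceptance test, and top-N selection with a CSV report. It is an illustration under simplifying assumptions (square tiles, binary masks at the same scale as the tiles, a made-up "nuclei" map standing in for a real scorer), not histolab's implementation; in histolab these roles are played by `GridTiler`, `RandomTiler`, `ScoreTiler`, and the scorer you pass. All function names here are hypothetical.

```python
import csv
import io
import random


def grid_coords(width, height, tile, pixel_overlap=0):
    """Top-left corners of a regular grid; overlap shortens the stride."""
    stride = tile - pixel_overlap
    if stride <= 0:
        raise ValueError("pixel_overlap must be smaller than the tile size")
    return [(x, y)
            for y in range(0, height - tile + 1, stride)
            for x in range(0, width - tile + 1, stride)]


def random_coords(width, height, tile, n_tiles, seed):
    """Seeded random corners: the same seed always yields the same candidates."""
    rng = random.Random(seed)
    return [(rng.randint(0, width - tile), rng.randint(0, height - tile))
            for _ in range(n_tiles)]


def coverage_percent(mask, x, y, tile):
    """Percentage of truthy pixels of `mask` inside the tile window."""
    window = [mask[r][c] for r in range(y, y + tile) for c in range(x, x + tile)]
    return 100.0 * sum(1 for v in window if v) / len(window)


def accept(mask, coords, tile, tissue_percent=80.0):
    """Keep tiles whose tissue coverage reaches the threshold (the check_tissue idea)."""
    return [xy for xy in coords if coverage_percent(mask, *xy, tile) >= tissue_percent]


def top_n(coords, score_fn, n_tiles):
    """Score every candidate and keep the n_tiles best (ties keep input order)."""
    scored = [(score_fn(xy), xy) for xy in coords]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:n_tiles]


def write_report(scored, stream):
    """Write a CSV audit trail of the selected tiles."""
    writer = csv.writer(stream)
    writer.writerow(["x", "y", "score"])
    for score, (x, y) in scored:
        writer.writerow([x, y, f"{score:.3f}"])


# Toy 8x8 masks: tissue fills the left six columns; "nuclei" only its top half.
mask = [[1 if c < 6 else 0 for c in range(8)] for _ in range(8)]
nuclei = [[1 if r < 4 and c < 6 else 0 for c in range(8)] for r in range(8)]

grid = grid_coords(8, 8, tile=4)                             # GridTiler-like, no overlap
kept = accept(mask, grid, tile=4)                            # tissue_percent=80 filter
sampled = random_coords(8, 8, tile=4, n_tiles=5, seed=42)    # RandomTiler-like
overlapped = grid_coords(8, 8, tile=4, pixel_overlap=2)
best = top_n(accept(mask, overlapped, tile=4),               # ScoreTiler-like
             lambda xy: coverage_percent(nuclei, *xy, 4) / 100, n_tiles=3)

report = io.StringIO()
write_report(best, report)
print(report.getvalue())
```

Raising `pixel_overlap` from 0 to 2 grows the candidate grid from 4 to 9 positions here, and in this sketch only tiles that pass the coverage check are ever scored.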
Filters and preprocessing pipelines
- Filters are typically chained using `Compose` to build repeatable pipelines (e.g., grayscale → threshold → morphology).
- These pipelines can be used to:
  - improve tissue segmentation robustness,
  - remove small artifacts/holes,
  - tailor detection to staining variability.

For concrete filter recipes and parameter guidance, see references/filters_preprocessing.md.
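The chaining idea behind `Compose` can be sketched in plain Python. Everything here is a hypothetical stand-in written for illustration, not histolab's filter classes: `rgb_to_gray` uses the common BT.601 luminance weights, `threshold` uses a fixed cut-off where real pipelines often use an automatic method such as Otsu's, and `remove_isolated` is a deliberately tiny morphology step. The colours are made up, and the rule "tissue is darker than background" assumes brightfield (e.g., H&E) slides.

```python
def rgb_to_gray(img):
    """Convert rows of (R, G, B) tuples to grayscale using BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in img]


def threshold(t):
    """Return a filter marking pixels darker than `t` as tissue (1); background is bright."""
    def apply(gray):
        return [[1 if v < t else 0 for v in row] for row in gray]
    return apply


def remove_isolated(mask):
    """Drop tissue pixels with no 4-connected tissue neighbour (tiny artifacts)."""
    rows, cols = len(mask), len(mask[0])

    def has_neighbour(r, c):
        return any(
            0 <= r + dr < rows and 0 <= c + dc < cols and mask[r + dr][c + dc]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        )

    return [[1 if mask[r][c] and has_neighbour(r, c) else 0 for c in range(cols)]
            for r in range(rows)]


class Compose:
    """Chain filters left to right: each filter's output feeds the next one."""

    def __init__(self, filters):
        self.filters = list(filters)

    def __call__(self, img):
        for f in self.filters:
            img = f(img)
        return img


W, P = (240, 240, 240), (180, 100, 150)  # hypothetical background / stained-tissue colours
image = [
    [P, P, W, W],
    [P, P, W, P],  # the lone P on the right is a speck to be cleaned up
    [W, W, W, W],
]
pipeline = Compose([rgb_to_gray, threshold(200), remove_isolated])
mask = pipeline(image)
print(mask)
```

Keeping each step as a separate callable makes the pipeline easy to reorder or extend (for example, adding a hole-filling step) without touching the other filters.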