Claude-skill-registry basic-caching-evaluation

Evaluation of BaSiC illumination correction caching - NOT RECOMMENDED for sparse markers. Trigger: optimizing BaSiC, caching illumination correction

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/basic-caching-evaluation" ~/.claude/skills/majiayu000-claude-skill-registry-basic-caching-evaluation && rm -rf "$T"
manifest: skills/data/basic-caching-evaluation/SKILL.md
source content

BaSiC Caching Evaluation

Experiment Overview

| Item | Details |
|------|---------|
| Date | 2025-12-11 |
| Goal | Evaluate whether caching BaSiC illumination profiles across cycles/channels improves processing speed |
| Environment | CuPy GPU acceleration, multiplex immunofluorescence data |
| Status | FAILED - Caching NOT recommended |

Context

BaSiC (Background and Shading Correction) computes flatfield and darkfield correction profiles for each image stack. The hypothesis was that similar channels across cycles might share illumination profiles, allowing cached profiles to skip computation.
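
For reference, the correction itself is a cheap per-pixel operation; what a cache would store is the pair of profile images, and the expensive part is the fit. A minimal sketch of the per-stack workflow (the fit step is shown only as a placeholder, not the actual BaSiC implementation):

import cupy as cp

def correct_stack(stack, flatfield, darkfield):
    """Standard BaSiC correction: subtract darkfield, then divide by flatfield."""
    return (stack - darkfield) / flatfield

# Without caching, profiles are fit once per stack (per cycle, per channel):
#   flatfield, darkfield = fit_basic(stack)   # placeholder for the BaSiC fit
#   corrected = correct_stack(stack, flatfield, darkfield)
# The caching hypothesis (rejected below) was to reuse a (flatfield, darkfield)
# pair fit on one stack for later stacks of the "same" channel.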

CRITICAL FINDING: DO NOT CACHE

Caching BaSiC profiles causes 15-20% intensity errors in sparse markers.

Why Caching Fails

  1. Sparse markers have unique illumination profiles: Channels with few positive cells (e.g., rare immune markers) have very different intensity distributions from dense markers (e.g., DAPI)

  2. Cross-cycle variation: Even the same marker shows illumination variation from cycle to cycle due to:

    • Photobleaching differences
    • Mounting medium variations
    • Optical path changes between sessions
  3. Error compounds: Using the wrong flatfield introduces a systematic bias that propagates through all downstream analysis (see the sketch below)
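
To make the compounding bias concrete, here is a minimal sketch (the profile values are illustrative, not measured) of how applying a flatfield borrowed from a dense channel skews corrected intensities in a sparse channel:

import numpy as np

# Illustrative flatfields: values are made up to show the mechanism
flatfield_sparse = np.array([0.85, 1.00, 1.20])   # true profile for the sparse channel
flatfield_dense  = np.array([0.95, 1.00, 1.05])   # profile borrowed from a dense channel
raw = np.array([500.0, 500.0, 500.0])             # same true signal at three positions

corrected_true   = raw / flatfield_sparse          # correct per-channel profile
corrected_cached = raw / flatfield_dense           # cached (wrong) profile

# Systematic, position-dependent bias that every downstream step inherits
bias = corrected_cached / corrected_true - 1
print(bias)  # [-0.105, 0.0, 0.143] -> roughly 10-14% intensity error at the edges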

Validation Experiment

# Test setup
channels_tested = ["DAPI", "CD3", "CD20", "CD68"]  # Dense to sparse

# Compute per-channel profiles
profiles_per_channel = {}
for ch in channels_tested:
    images = load_channel(ch)
    flatfield, darkfield = basic_correct(images)
    profiles_per_channel[ch] = (flatfield, darkfield)

# Test cross-application
for ch1 in channels_tested:
    for ch2 in channels_tested:
        if ch1 != ch2:
            # Apply ch1's profile to ch2's images
            corrected = apply_correction(
                load_channel(ch2),
                profiles_per_channel[ch1]
            )
            error = compute_error_vs_ground_truth(corrected, ch2)
            print(f"{ch1} -> {ch2}: {error:.1%} error")

Results

| Source Profile | Applied To | Error Rate |
|----------------|------------|------------|
| DAPI | CD3 | 8.2% |
| DAPI | CD20 | 12.4% |
| DAPI | CD68 (sparse) | 18.7% |
| CD3 | CD68 (sparse) | 15.3% |
| Same channel | Same channel | 0% (baseline) |

Failed Attempts (Critical)

| Attempt | Why it Failed | Lesson Learned |
|---------|---------------|----------------|
| Cache by channel name | Same marker varies across cycles | Each acquisition is unique |
| Cache by intensity histogram | Sparse markers have distinct histograms | Can't match on statistics |
| Interpolate between profiles | Non-linear relationship | No simple interpolation works |
| Use DAPI as universal reference | DAPI is dense, others are sparse | Density matters for BaSiC |

Recommended Optimizations (Instead of Caching)

Since caching doesn't work, focus on these GPU optimizations:

1. Optimize DCT Operations

# Use an n-dimensional DCT instead of two sequential 1-D transforms
from cupyx.scipy.fft import dct, dctn

# Old (slower): two separate 1-D DCT passes
dct_result = dct(dct(image, axis=0), axis=1)

# New (faster): a single 2-D DCT over both axes
dct_result = dctn(image, axes=(0, 1))

2. Batch Processing

# Process multiple z-planes in parallel
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(basic_correct, plane) for plane in z_planes]
    results = [f.result() for f in futures]

3. Pre-allocated GPU Buffers

# Avoid repeated allocations
import cupy as cp

class BaSiCGPU:
    def __init__(self, image_shape):
        # Persistent device buffers, created once and reused for every stack
        self.buffer = cp.empty(image_shape, dtype=cp.float32)
        self.fft_buffer = cp.empty(image_shape, dtype=cp.complex64)
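
A brief sketch of how these buffers might be reused across stacks; the correct_into helper is hypothetical and only illustrates writing results into the persistent buffer via CuPy's out= arguments:

import cupy as cp

# Hypothetical helper (not part of the skill): reuse BaSiCGPU's persistent
# buffer instead of allocating a new device array for every corrected stack.
def correct_into(corrector, stack, flatfield, darkfield):
    cp.subtract(stack, darkfield, out=corrector.buffer)           # buffer = stack - darkfield
    cp.divide(corrector.buffer, flatfield, out=corrector.buffer)  # buffer /= flatfield
    return corrector.buffer  # valid until the next call overwrites it

# corrector = BaSiCGPU(image_shape=(2048, 2048))
# for stack in z_planes:            # each stack already resident on the GPU
#     corrected = correct_into(corrector, stack, flatfield, darkfield)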

Key Insights

  • Every channel needs its own profile: No exceptions, no shortcuts
  • Sparse markers are most sensitive: Rare cell populations show largest errors
  • Speed gains elsewhere: Optimize GPU operations, not caching
  • Validation is essential: Always compare cached vs fresh correction

When Caching MIGHT Work (Limited Cases)

  1. Technical replicates: Same sample, same session, same channel
  2. Flatfield-only mode: If darkfield is disabled and imaging is very stable
  3. Coarse quality check: Quick preview, not final analysis

Even in these cases, validate carefully before using cached profiles.
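
A minimal validation sketch for those limited cases, reusing the project-specific helpers from the experiment above (basic_correct, apply_correction); the 1% tolerance is an assumption to adjust per study:

import numpy as np

def cached_profile_is_safe(images, cached_profile, tolerance=0.01):
    """Accept a cached profile only if it matches a freshly fit one on this stack."""
    fresh_profile = basic_correct(images)             # (flatfield, darkfield) fit on this stack
    fresh = apply_correction(images, fresh_profile)
    cached = apply_correction(images, cached_profile)
    denom = np.maximum(np.abs(fresh), 1e-6)           # guard against divide-by-zero
    max_rel_diff = float(np.max(np.abs(cached - fresh) / denom))
    return max_rel_diff <= tolerance

Because this check refits the profile anyway, it only pays off when a profile validated once (e.g., on the first technical replicate) is then reused across many stacks from the same session.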

References

  • BaSiC paper: Peng et al., Nature Communications 2017
  • Illumination correction review: Model-based vs data-driven approaches
  • KINTSUGI validation notebook: BaSiC_Caching_Validation_Test.ipynb