Claude-skill-registry channel-name-parsing
Multi-format channel name parsing for KINTSUGI CHANNELNAMES.txt files
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/channel-name-parsing" ~/.claude/skills/majiayu000-claude-skill-registry-channel-name-parsing && rm -rf "$T"
manifest:
skills/data/channel-name-parsing/SKILL.mdsource content
Channel Name Parsing - Research Notes
Experiment Overview
| Item | Details |
|---|---|
| Date | 2024-12-15 |
| Goal | Parse channel names from various CHANNELNAMES.txt formats |
| Environment | KINTSUGI pipeline, Python 3.10+ |
| Status | Success |
Context
Different microscopy systems and users produce CHANNELNAMES.txt files in various formats. KINTSUGI needs to parse channel/marker names to label output files correctly. The parsing must auto-detect the format and handle multiple conventions.
Supported Formats
Format 1: Simple List (One Channel Per Line)
Most common format from CODEX systems. Each line is a channel name, 4 channels per cycle. Cycle number extracted from DAPI marker name (DAPI-01, DAPI-02, etc.).
DAPI-01 Blank Blank Blank DAPI-02 CD31 CD8 CD45 DAPI-03 CD20 Ki67 CD3e
Format 2: Cycle-Prefixed with Colon
1: DAPI, Blank, Blank, Blank 2: DAPI, CD31, CD8, CD45 3: DAPI, CD20, Ki67, CD3e
Format 3: Tab-Separated
1 DAPI Blank Blank Blank 2 DAPI CD31 CD8 CD45 3 DAPI CD20 Ki67 CD3e
Format 4: CSV (Comma-Separated)
1,DAPI,Blank,Blank,Blank 2,DAPI,CD31,CD8,CD45 3,DAPI,CD20,Ki67,CD3e
Verified Workflow
Complete Parsing Function
import re from pathlib import Path def load_channel_names(meta_dir, filename="CHANNELNAMES.txt", channels_per_cycle=4): """ Load channel names from various formats. Returns: dict {cycle_number: [channel_names]} or None """ channel_file = Path(meta_dir) / filename # Try alternative filenames if not channel_file.exists(): alt_names = ["CHANNELNAMES.txt", "channelnames.txt", "channel_names.txt", "channel_names.csv", "channels.txt", "markers.txt"] for alt_name in alt_names: alt_file = Path(meta_dir) / alt_name if alt_file.exists(): channel_file = alt_file break else: return None # Read non-empty, non-comment lines lines = [] with open(channel_file, 'r') as f: for line in f: line = line.strip() if line and not line.startswith('#'): lines.append(line) if not lines: return None channel_dict = {} first_line = lines[0] # Detect format from first line if ':' in first_line or '\t' in first_line or \ (first_line.split(',')[0].strip().isdigit() and len(first_line.split(',')) > 2): # Cycle-prefixed format for line in lines: try: if ':' in line: cycle_str, names_str = line.split(':', 1) cycle = int(cycle_str.strip()) names = [n.strip() for n in names_str.split(',')] elif '\t' in line: parts = line.split('\t') cycle = int(parts[0].strip()) names = [n.strip() for n in parts[1:]] else: parts = line.split(',') cycle = int(parts[0].strip()) names = [n.strip() for n in parts[1:]] channel_dict[cycle] = names except (ValueError, IndexError): continue else: # Simple list format - detect cycles from DAPI-XX pattern current_cycle = 0 cycle_channels = [] for line in lines: dapi_match = re.match(r'DAPI[-_]?(\d+)', line, re.IGNORECASE) if dapi_match: # Save previous cycle if cycle_channels and current_cycle > 0: channel_dict[current_cycle] = cycle_channels # Start new cycle current_cycle = int(dapi_match.group(1)) cycle_channels = [line] elif current_cycle > 0: cycle_channels.append(line) if len(cycle_channels) == channels_per_cycle: channel_dict[current_cycle] = cycle_channels cycle_channels = [] # Save final cycle if cycle_channels and current_cycle > 0: channel_dict[current_cycle] = cycle_channels return channel_dict
Usage
meta_dir = project.paths.meta # or Path("/path/to/meta") channel_name_dict = load_channel_names(meta_dir) if channel_name_dict is None: # Fallback to manual definition channel_name_dict = { 1: ["DAPI", "Blank1a", "Blank1b", "Blank1c"], 2: ["DAPI", "CD31", "CD8", "CD45"], 3: ["DAPI", "CD20", "Ki67", "CD3e"], } # Access channel name for cycle 2, channel 3 marker = channel_name_dict.get(2, [''] * 4)[2] # "CD8"
Failed Attempts (Critical)
| Attempt | Why it Failed | Lesson Learned |
|---|---|---|
| Only supporting cycle-prefixed format | Simple list format common in CODEX systems | Must auto-detect format from first line |
| Hardcoding 4 channels per cycle | Some systems have different channel counts | Make channels_per_cycle a parameter |
| Requiring exact "DAPI" match | Some files use "DAPI-01", "DAPI-02" with cycle number | Use regex to extract cycle from DAPI marker |
| Case-sensitive matching | "dapi-01" and "DAPI-01" both valid | Use re.IGNORECASE flag |
Final Parameters
Format Detection Heuristic
# Check first line for format indicators first_line = lines[0] is_cycle_prefixed = ( ':' in first_line or # "1: DAPI, Blank..." '\t' in first_line or # "1\tDAPI\tBlank..." (first_line.split(',')[0].strip().isdigit() and len(first_line.split(',')) > 2) # "1,DAPI,Blank..." )
DAPI Cycle Extraction Regex
dapi_match = re.match(r'DAPI[-_]?(\d+)', line, re.IGNORECASE) # Matches: DAPI-01, DAPI_01, DAPI01, dapi-1, etc.
Key Insights
- Auto-detect format rather than requiring user specification
- Simple list format uses DAPI marker to determine cycle boundaries
- Always provide fallback when file not found or parsing fails
- Support multiple filename conventions (CHANNELNAMES.txt, channelnames.txt, etc.)
- Comments (lines starting with #) should be ignored
- Empty lines should be skipped
References
- CODEX channel naming conventions
- KINTSUGI Notebook 2 cell-7 (Processing Parameters)