# Claude-skill-registry · consolidate-transcripts
Consolidate transcripts from a channel into a single file, sorted by date (newest first), up to 800K tokens. Use when preparing transcripts for LLM context or bulk analysis.
## Install

**Source** · Clone the upstream repo:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

**Claude Code** · Install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/consolidate-transcripts" ~/.claude/skills/majiayu000-claude-skill-registry-consolidate-transcripts && rm -rf "$T"
```

Manifest: `skills/data/consolidate-transcripts/SKILL.md`
# Consolidate Transcripts
Why? LLMs have context limits. This skill merges multiple transcripts into a single file with accurate token counting, so you can feed an entire channel's content to Claude or GPT without exceeding limits.
## Quick Start

```bash
python scripts/consolidate_transcripts.py <channel_name>
```

Output:

```
~/Documents/YTScriber/<channel_name>/<channel_name>-consolidated.md
```
> [!NOTE]
> This feature is currently a standalone script. A `ytscriber consolidate` CLI command is planned for a future release.
## Workflow

### 1. Identify the Channel

List available channels:

```bash
ls ~/Documents/YTScriber/
```
### 2. Choose a Token Limit

| Use Case | Recommended Limit | Flag |
|---|---|---|
| Claude (200K context) | 150000 | `--limit 150000` |
| GPT-4 Turbo (128K) | 100000 | `--limit 100000` |
| Full archive (Claude Pro) | 800000 | (default) |
| Quick sample | 50000 | `--limit 50000` |
> [!TIP]
> The default 800K limit leaves ~200K tokens for prompts and responses when using Claude's 1M context.
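The limits above are model-tokenizer counts, not word counts. A minimal sketch of counting a transcript's tokens with `tiktoken`'s `cl100k_base` encoding, falling back to a rough ~4-characters-per-token estimate when `tiktoken` is not installed (the helper name and fallback ratio are illustrative, not part of the script):

```python
def count_tokens(text: str) -> int:
    """Count tokens with cl100k_base when tiktoken is available;
    otherwise fall back to a rough ~4-characters-per-token estimate."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except ImportError:
        return max(1, len(text) // 4)

transcript = "word " * 1000
print(count_tokens(transcript))  # on the order of ~1000 tokens
```

Checking a transcript's count this way before consolidating helps pick a `--limit` that actually fits your model's context window.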
### 3. Run Consolidation

```bash
python scripts/consolidate_transcripts.py <channel_name> [--limit TOKENS] [--verbose]
```

Examples:

```bash
# Default (800K tokens)
python scripts/consolidate_transcripts.py library-of-minds

# Custom limit for GPT-4
python scripts/consolidate_transcripts.py aws-reinvent-2025 --limit 100000

# Verbose output showing all included files
python scripts/consolidate_transcripts.py dwarkesh-patel --verbose
```
### 4. Verify Output

Check that the consolidated file was created:

```bash
ls -la ~/Documents/YTScriber/<channel_name>/*-consolidated.md
```
## Parameters

| Option | Description | Default |
|---|---|---|
| `channel_name` | Folder name in the data directory | Required |
| `--limit` | Maximum tokens to include | `800000` |
| `--verbose` | Show detailed file list | `False` |
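The three parameters map directly onto a small command-line interface. A sketch of how the script's argument parsing could look, assuming `argparse` (the parser code is illustrative; only the option names and defaults come from the table above):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Consolidate channel transcripts into one file.")
    # Positional: folder name under ~/Documents/YTScriber/
    parser.add_argument("channel_name",
                        help="Folder name in the data directory")
    # Token budget; 800K is the documented default
    parser.add_argument("--limit", type=int, default=800_000,
                        help="Maximum tokens to include")
    # Off by default; prints the list of included files
    parser.add_argument("--verbose", action="store_true",
                        help="Show detailed file list")
    return parser

args = build_parser().parse_args(["aws-reinvent-2025", "--limit", "100000"])
print(args.channel_name, args.limit, args.verbose)
# → aws-reinvent-2025 100000 False
```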
## Output Format
The consolidated file includes:
- Header — Generation metadata, total transcripts, token/word counts
- Table of Contents — Dates, titles, tokens, words per transcript
- Transcripts — Full text with title, date, author, source URL
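The selection behind this output (newest first, stop at the token budget) can be sketched as below. The `Transcript` record, field names, and stop-at-first-overflow behavior are assumptions for illustration, not the script's actual internals:

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    date: str    # ISO date from the filename, "" if missing
    title: str
    text: str
    tokens: int

def select_transcripts(transcripts, limit=800_000):
    """Order transcripts newest first (undated last) and include
    them until the token budget would be exceeded."""
    dated = sorted((t for t in transcripts if t.date),
                   key=lambda t: t.date, reverse=True)
    undated = [t for t in transcripts if not t.date]
    selected, used = [], 0
    for t in dated + undated:
        if used + t.tokens > limit:
            break  # assumption: stop at the first transcript over budget
        selected.append(t)
        used += t.tokens
    return selected, used

chosen, used = select_transcripts([
    Transcript("2025-06-01", "Newest", "...", 120_000),
    Transcript("2024-11-15", "Older", "...", 90_000),
], limit=150_000)
# Only "Newest" fits the 150K budget
```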
## Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| `tiktoken` not installed | Missing dependency | `pip install tiktoken` |
| Empty transcripts folder | No transcripts downloaded yet | Run `ytscriber download` first |
| Channel doesn't exist | Folder name mistyped | Check `ls ~/Documents/YTScriber/` for valid names |
| Output file is small | Few transcripts available | Use `--verbose` to see what was included |
| Token count seems wrong | Old `tiktoken` version | `pip install --upgrade tiktoken` |
## Common Mistakes

- **Wrong channel name** — Use the folder name exactly as shown in `ls ~/Documents/YTScriber/`, not the YouTube channel name.
- **Forgetting to download transcripts first** — Consolidation requires transcripts to exist. Run `ytscriber download` first.
- **Using too high a limit** — If you exceed your LLM's context window, you'll get truncation errors. Use the limit guide above.
- **Expecting real-time updates** — Re-run consolidation after downloading new transcripts.
## Reference

- Transcripts are sorted newest first (descending by date)
- Files without dates in the filename are placed last
- Token counting uses the `cl100k_base` encoding (GPT-4/Claude compatible)
- Consolidated files are gitignored (not committed)
- Re-running overwrites the previous consolidated file
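The first two points above (newest first, undated files last) hinge on pulling a date out of each filename. A sketch of that extraction, assuming dates appear in filenames in `YYYY-MM-DD` form (the pattern and example names are illustrative):

```python
import re

DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})")

def filename_date(name: str) -> str:
    """Return the ISO date embedded in the filename, or '' if absent."""
    m = DATE_RE.search(name)
    return m.group(1) if m else ""

names = ["2025-01-31-intro.md", "notes.md", "2024-12-01-update.md"]
# Dated files, newest first, then files without a date
dated = sorted((n for n in names if filename_date(n)),
               key=filename_date, reverse=True)
undated = [n for n in names if not filename_date(n)]
print(dated + undated)
# → ['2025-01-31-intro.md', '2024-12-01-update.md', 'notes.md']
```

Because ISO dates sort lexicographically in chronological order, a plain string sort with `reverse=True` is enough to get newest-first.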