Claude-skill-registry consolidate-transcripts

Consolidate transcripts from a channel into a single file, sorted by date (newest first), up to 800K tokens. Use when preparing transcripts for LLM context or bulk analysis.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/consolidate-transcripts" ~/.claude/skills/majiayu000-claude-skill-registry-consolidate-transcripts && rm -rf "$T"
manifest: skills/data/consolidate-transcripts/SKILL.md
source content

Consolidate Transcripts

Why? LLMs have context limits. This skill merges multiple transcripts into a single file with accurate token counting, so you can feed an entire channel's content to Claude or GPT without exceeding limits.

Quick Start

python scripts/consolidate_transcripts.py <channel_name>

Output:

~/Documents/YTScriber/<channel_name>/<channel_name>-consolidated.md

[!NOTE] This feature is currently a standalone script. A `ytscriber consolidate` CLI command is planned for a future release.


Workflow

1. Identify the Channel

List available channels:

ls ~/Documents/YTScriber/

2. Choose Token Limit

| Use Case | Recommended Limit | Flag |
| --- | --- | --- |
| Claude (200K context) | 150000 | `--limit 150000` |
| GPT-4 Turbo (128K) | 100000 | `--limit 100000` |
| Full archive (Claude Pro) | 800000 | (default) |
| Quick sample | 50000 | `--limit 50000` |

[!TIP] The default 800K limit leaves ~200K tokens for prompts and responses when using Claude's 1M context.
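The budgeting idea behind the limit can be sketched in Python. This is a minimal illustration only: the `Transcript` shape and the stop-once-the-budget-is-exceeded policy are assumptions for the sketch, not the script's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    name: str
    date: str      # ISO date parsed from the filename, e.g. "2025-01-15"
    tokens: int    # precomputed token count (e.g. via tiktoken cl100k_base)

def select_within_limit(transcripts, limit=800_000):
    """Keep newest-first transcripts until the token budget is exhausted."""
    ordered = sorted(transcripts, key=lambda t: t.date, reverse=True)
    selected, total = [], 0
    for t in ordered:
        if total + t.tokens > limit:
            break  # assumed policy: stop when the next file would exceed the limit
        selected.append(t)
        total += t.tokens
    return selected, total
```

With a 900-token budget and three transcripts of 500, 300, and 400 tokens (newest first), only the first two fit; the third would push the total to 1200 and is dropped.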

3. Run Consolidation

python scripts/consolidate_transcripts.py <channel_name> [--limit TOKENS] [--verbose]

Examples:

# Default (800K tokens)
python scripts/consolidate_transcripts.py library-of-minds

# Custom limit for GPT-4
python scripts/consolidate_transcripts.py aws-reinvent-2025 --limit 100000

# Verbose output showing all included files
python scripts/consolidate_transcripts.py dwarkesh-patel --verbose

4. Verify Output

Check that the consolidated file was created:

ls -la ~/Documents/YTScriber/<channel_name>/*-consolidated.md

Parameters

| Option | Description | Default |
| --- | --- | --- |
| `channel_name` | Folder name in the data directory | Required |
| `--limit`, `-l` | Maximum tokens to include | 800000 |
| `--verbose`, `-v` | Show detailed file list | False |

Output Format

The consolidated file includes:

  1. Header — Generation metadata, total transcripts, token/word counts
  2. Table of Contents — Dates, titles, tokens, words per transcript
  3. Transcripts — Full text with title, date, author, source URL
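The three sections above could be assembled roughly like this. The field names (`title`, `date`, `author`, `url`, `tokens`, `text`) and the exact layout are hypothetical illustrations of the structure, not the script's actual output format.

```python
def build_consolidated(channel: str, entries: list[dict]) -> str:
    """Assemble header, table of contents, and full transcripts into one document."""
    total_tokens = sum(e["tokens"] for e in entries)
    total_words = sum(len(e["text"].split()) for e in entries)

    # 1. Header: generation metadata and totals
    parts = [
        f"# Consolidated Transcripts: {channel}",
        f"Transcripts: {len(entries)} | Tokens: {total_tokens} | Words: {total_words}",
        "",
        "## Table of Contents",
    ]
    # 2. Table of contents: one line per transcript
    for e in entries:
        parts.append(f"- {e['date']} {e['title']} ({e['tokens']} tokens)")
    # 3. Full transcripts with per-item metadata
    for e in entries:
        parts += [
            "",
            f"## {e['title']}",
            f"Date: {e['date']} | Author: {e['author']}",
            f"Source: {e['url']}",
            "",
            e["text"],
        ]
    return "\n".join(parts)
```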

Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| `ModuleNotFoundError: tiktoken` | tiktoken not installed | `pip install tiktoken` |
| No transcripts found | Empty transcripts folder | Run `ytscriber download` first |
| `FileNotFoundError` | Channel doesn't exist | Check `ls ~/Documents/YTScriber/` for valid names |
| Output file is small | Few transcripts available | Use `--verbose` to see what was included |
| Token count seems wrong | Old tiktoken version | `pip install --upgrade tiktoken` |

Common Mistakes

  1. Wrong channel name: use the folder name exactly as shown in `ls ~/Documents/YTScriber/`, not the YouTube channel name.
  2. Forgetting to download transcripts first: consolidation requires transcripts to exist. Run `ytscriber download` first.
  3. Using too high a limit: if you exceed your LLM's context, you'll get truncation errors. Use the limit guide above.
  4. Expecting real-time updates: re-run consolidation after downloading new transcripts.

Reference

  • Transcripts sorted newest first (descending by date)
  • Files without dates in filename are placed last
  • Token counting uses the `cl100k_base` encoding (GPT-4/Claude compatible)
  • Consolidated files are gitignored (not committed)
  • Re-running overwrites the previous consolidated file
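The newest-first ordering with undated files last can be expressed as a sort key. This sketch assumes dates appear as `YYYY-MM-DD` somewhere in the filename; the script's actual parsing may differ.

```python
import re
from datetime import date

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def sort_key(filename: str):
    """Dated files first, newest first; files without a date sort last."""
    m = DATE_RE.search(filename)
    if m:
        d = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        return (0, -d.toordinal())   # negate ordinal so newer dates come first
    return (1, 0)                    # undated files go last

files = ["2025-03-01-foo.md", "notes.md", "2024-12-31-bar.md"]
print(sorted(files, key=sort_key))
# → ['2025-03-01-foo.md', '2024-12-31-bar.md', 'notes.md']
```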