# Claude-skill-registry · consolidate-transcripts
Consolidate transcripts from a channel into a single file, sorted by date (newest first), up to 800K tokens. Use when preparing transcripts for LLM context or bulk analysis.
## Install

**Source** · Clone the upstream repo:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

**Claude Code** · Install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/consolidate-transcripts" ~/.claude/skills/majiayu000-claude-skill-registry-consolidate-transcripts && rm -rf "$T"
```

Manifest: `skills/data/consolidate-transcripts/SKILL.md`
# Consolidate Transcripts
Why? LLMs have context limits. This skill merges multiple transcripts into a single file with accurate token counting, so you can feed an entire channel's content to Claude or GPT without exceeding limits.
## Quick Start

```bash
python scripts/consolidate_transcripts.py <channel_name>
```

Output:

```
~/Documents/YTScriber/<channel_name>/<channel_name>-consolidated.md
```
> [!NOTE]
> This feature is currently a standalone script. A `ytscriber consolidate` CLI command is planned for a future release.
## Workflow

### 1. Identify the Channel

List available channels:

```bash
ls ~/Documents/YTScriber/
```
### 2. Choose a Token Limit

| Use Case | Recommended Limit | Flag |
|---|---|---|
| Claude (200K context) | 150000 | `--limit 150000` |
| GPT-4 Turbo (128K) | 100000 | `--limit 100000` |
| Full archive (Claude Pro) | 800000 | (default) |
| Quick sample | 50000 | `--limit 50000` |
> [!TIP]
> The default 800K limit leaves ~200K tokens for prompts and responses when using Claude's 1M context.
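The limits above are model-tokenizer counts, not word counts. A minimal sketch of counting a transcript's tokens with `tiktoken`'s `cl100k_base` encoding, falling back to a rough ~4-characters-per-token estimate when `tiktoken` is not installed (the helper name and fallback ratio are illustrative, not part of the script):

```python
def count_tokens(text: str) -> int:
    """Count tokens with cl100k_base when tiktoken is available;
    otherwise fall back to a rough ~4-characters-per-token estimate."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except ImportError:
        return max(1, len(text) // 4)

transcript = "word " * 1000
print(count_tokens(transcript))  # on the order of ~1000 tokens
```

Checking a transcript's count this way before consolidating helps pick a `--limit` that actually fits your model's context window.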
### 3. Run Consolidation

```bash
python scripts/consolidate_transcripts.py <channel_name> [--limit TOKENS] [--verbose]
```

Examples:

```bash
# Default (800K tokens)
python scripts/consolidate_transcripts.py library-of-minds

# Custom limit for GPT-4
python scripts/consolidate_transcripts.py aws-reinvent-2025 --limit 100000

# Verbose output showing all included files
python scripts/consolidate_transcripts.py dwarkesh-patel --verbose
```
### 4. Verify Output

Check that the consolidated file was created:

```bash
ls -la ~/Documents/YTScriber/<channel_name>/*-consolidated.md
```
## Parameters

| Option | Description | Default |
|---|---|---|
| `channel_name` | Folder name in the data directory | Required |
| `--limit` | Maximum tokens to include | `800000` |
| `--verbose` | Show detailed file list | `False` |
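The three parameters map directly onto a small command-line interface. A sketch of how the script's argument parsing could look, assuming `argparse` (the parser code is illustrative; only the option names and defaults come from the table above):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Consolidate channel transcripts into one file.")
    # Positional: folder name under ~/Documents/YTScriber/
    parser.add_argument("channel_name",
                        help="Folder name in the data directory")
    # Token budget; 800K is the documented default
    parser.add_argument("--limit", type=int, default=800_000,
                        help="Maximum tokens to include")
    # Off by default; prints the list of included files
    parser.add_argument("--verbose", action="store_true",
                        help="Show detailed file list")
    return parser

args = build_parser().parse_args(["aws-reinvent-2025", "--limit", "100000"])
print(args.channel_name, args.limit, args.verbose)
# → aws-reinvent-2025 100000 False
```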
## Output Format
The consolidated file includes:
- Header — Generation metadata, total transcripts, token/word counts
- Table of Contents — Dates, titles, tokens, words per transcript
- Transcripts — Full text with title, date, author, source URL
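The selection behind this output (newest first, stop at the token budget) can be sketched as below. The `Transcript` record, field names, and stop-at-first-overflow behavior are assumptions for illustration, not the script's actual internals:

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    date: str    # ISO date from the filename, "" if missing
    title: str
    text: str
    tokens: int

def select_transcripts(transcripts, limit=800_000):
    """Order transcripts newest first (undated last) and include
    them until the token budget would be exceeded."""
    dated = sorted((t for t in transcripts if t.date),
                   key=lambda t: t.date, reverse=True)
    undated = [t for t in transcripts if not t.date]
    selected, used = [], 0
    for t in dated + undated:
        if used + t.tokens > limit:
            break  # assumption: stop at the first transcript over budget
        selected.append(t)
        used += t.tokens
    return selected, used

chosen, used = select_transcripts([
    Transcript("2025-06-01", "Newest", "...", 120_000),
    Transcript("2024-11-15", "Older", "...", 90_000),
], limit=150_000)
# Only "Newest" fits the 150K budget
```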
## Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| `tiktoken` not installed | Missing dependency | `pip install tiktoken` |
| Empty transcripts folder | No transcripts downloaded yet | Run `ytscriber download` first |
| Channel doesn't exist | Folder name mistyped | Check `ls ~/Documents/YTScriber/` for valid names |
| Output file is small | Few transcripts available | Use `--verbose` to see what was included |
| Token count seems wrong | Old `tiktoken` version | `pip install --upgrade tiktoken` |
## Common Mistakes

- **Wrong channel name** — Use the folder name exactly as shown in `ls ~/Documents/YTScriber/`, not the YouTube channel name.
- **Forgetting to download transcripts first** — Consolidation requires transcripts to exist. Run `ytscriber download` first.
- **Using too high a limit** — If you exceed your LLM's context window, you'll get truncation errors. Use the limit guide above.
- **Expecting real-time updates** — Re-run consolidation after downloading new transcripts.
## Reference

- Transcripts are sorted newest first (descending by date)
- Files without dates in the filename are placed last
- Token counting uses the `cl100k_base` encoding (GPT-4/Claude compatible)
- Consolidated files are gitignored (not committed)
- Re-running overwrites the previous consolidated file
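The first two points above (newest first, undated files last) hinge on pulling a date out of each filename. A sketch of that extraction, assuming dates appear in filenames in `YYYY-MM-DD` form (the pattern and example names are illustrative):

```python
import re

DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})")

def filename_date(name: str) -> str:
    """Return the ISO date embedded in the filename, or '' if absent."""
    m = DATE_RE.search(name)
    return m.group(1) if m else ""

names = ["2025-01-31-intro.md", "notes.md", "2024-12-01-update.md"]
# Dated files, newest first, then files without a date
dated = sorted((n for n in names if filename_date(n)),
               key=filename_date, reverse=True)
undated = [n for n in names if not filename_date(n)]
print(dated + undated)
# → ['2025-01-31-intro.md', '2024-12-01-update.md', 'notes.md']
```

Because ISO dates sort lexicographically in chronological order, a plain string sort with `reverse=True` is enough to get newest-first.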