Claude-code-startup-skills transcribe-video
Generate subtitles (SRT/VTT) and plain text transcripts from video or audio files using AWS Transcribe. Use when creating captions, extracting spoken content, generating transcripts for notes, or making video content searchable.
install
source · Clone the upstream repo
git clone https://github.com/rameerez/claude-code-startup-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/rameerez/claude-code-startup-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/transcribe-video" ~/.claude/skills/rameerez-claude-code-startup-skills-transcribe-video && rm -rf "$T"
manifest: skills/transcribe-video/SKILL.md
source content
Video Transcription Skill
Generate subtitles and transcripts from $ARGUMENTS (a video or audio file path, optionally followed by a language code like en-US or es-ES) using AWS Transcribe.
Outputs .srt, .vtt, and .txt files next to the source file.
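The argument format described above could be split roughly as follows. This is only a sketch: the function name parse_args and the variables INPUT_FILE and LANG_CODE are illustrative, and it assumes the language code, when present, is the last token and has the form xx-XX.

```shell
# Hypothetical helper: split "<path> [language-code]" into two variables.
# Assumes the language code, when present, is the last space-separated token
# and looks like xx-XX (for example en-US); anything else is treated as part
# of the path, so paths containing spaces still work.
parse_args() {
  args=$1
  last=${args##* }              # text after the final space (or the whole string)
  case $last in
    [[:lower:]][[:lower:]]-[[:upper:]][[:upper:]])
      if [ "$last" != "$args" ]; then
        INPUT_FILE=${args% *}   # everything before the final space
        LANG_CODE=$last
        return 0
      fi
      ;;
  esac
  INPUT_FILE=$args
  LANG_CODE=en-US               # default stated in the skill
}

# Intended use:
#   parse_args "$ARGUMENTS"
#   echo "file=$INPUT_FILE language=$LANG_CODE"
```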
Process
- Verify prerequisites - check ffmpeg and aws CLI are installed and configured
- Extract audio from the video as MP3 using ffmpeg
- Create temporary S3 bucket, upload audio
- Run AWS Transcribe job with SRT and VTT subtitle output
- Download results and generate plain text transcript
- Clean up all AWS resources - delete S3 bucket, Transcribe job, and temp files. No recurring costs.
Prerequisites
- ffmpeg installed (brew install ffmpeg)
- aws CLI installed and configured with valid credentials (brew install awscli && aws configure)
- AWS credentials need permissions for: s3:* (create/delete buckets), transcribe:* (start/delete jobs)
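A minimal sketch of the prerequisite check, assuming it is enough to confirm that both commands are on PATH. The helper name require_cmds is hypothetical. The aws sts get-caller-identity call in the usage comment is one common way to confirm that credentials resolve, though it does not prove the S3 and Transcribe permissions are granted.

```shell
# Hypothetical helper: report every missing command and return non-zero
# if at least one of them is absent from PATH.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing prerequisite: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Intended use (not executed here):
#   require_cmds ffmpeg aws || exit 1
#   aws sts get-caller-identity >/dev/null || exit 1   # credentials resolve
```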
Step-by-Step
Step 1: Extract audio
ffmpeg -i "input.mp4" -vn -acodec mp3 -q:a 2 "/tmp/transcribe-audio.mp3" -y
Step 2: Create temp S3 bucket and upload
BUCKET="tmp-transcribe-$(date +%s)"
aws s3 mb "s3://$BUCKET" --region us-east-1
aws s3 cp "/tmp/transcribe-audio.mp3" "s3://$BUCKET/audio.mp3"
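S3 bucket names are shared across all AWS accounts, so a name built from a timestamp alone can occasionally collide with an existing bucket. One possible variation, shown here as a sketch and not as part of the skill, appends a few random bytes. The helper name make_bucket_name is hypothetical.

```shell
# Hypothetical helper: build a bucket name from a timestamp plus 4 random bytes.
# The result uses only lowercase letters, digits, and hyphens, and stays well
# under the 63-character limit for S3 bucket names.
make_bucket_name() {
  suffix=$(od -An -N4 -tx1 /dev/urandom | tr -dc '0-9a-f')
  echo "tmp-transcribe-$(date +%s)-$suffix"
}

# Intended use:
#   BUCKET=$(make_bucket_name)
#   aws s3 mb "s3://$BUCKET" --region us-east-1
```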
Step 3: Start transcription job
JOB_NAME="tmp-job-$(date +%s)"
aws transcribe start-transcription-job \
  --transcription-job-name "$JOB_NAME" \
  --language-code en-US \
  --media-format mp3 \
  --media "MediaFileUri=s3://$BUCKET/audio.mp3" \
  --subtitles "Formats=srt,vtt" \
  --output-bucket-name "$BUCKET" \
  --region us-east-1
Language codes: en-US, es-ES, fr-FR, de-DE, pt-BR, ja-JP, zh-CN, it-IT, ko-KR, etc. Default to en-US if not specified.
Step 4: Poll until complete
while true; do
  STATUS=$(aws transcribe get-transcription-job \
    --transcription-job-name "$JOB_NAME" \
    --region us-east-1 \
    --query 'TranscriptionJob.TranscriptionJobStatus' \
    --output text)
  if [ "$STATUS" = "COMPLETED" ] || [ "$STATUS" = "FAILED" ]; then break; fi
  sleep 5
done
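The loop above stops on both COMPLETED and FAILED without distinguishing them, and it never gives up. A possible variation, sketched below, returns a different exit code for each outcome and caps the number of attempts. The name wait_for_job and its three parameters are illustrative and not part of the skill.

```shell
# Hypothetical helper: poll a status command until the job finishes.
#   $1 = command that prints the current status as one word on stdout
#   $2 = maximum number of attempts
#   $3 = seconds to sleep between attempts
# Returns 0 on COMPLETED, 1 on FAILED, 2 if the attempts run out.
wait_for_job() {
  status_cmd=$1
  max_attempts=$2
  interval=$3
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    job_state=$($status_cmd)
    case $job_state in
      COMPLETED) return 0 ;;
      FAILED)    return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 2
}

# Intended use, wrapping the status query from this step in a function:
#   job_status() {
#     aws transcribe get-transcription-job \
#       --transcription-job-name "$JOB_NAME" --region us-east-1 \
#       --query 'TranscriptionJob.TranscriptionJobStatus' --output text
#   }
#   wait_for_job job_status 120 5 || echo "job did not complete" >&2
```

When the job ends in FAILED, the same get-transcription-job call with --query 'TranscriptionJob.FailureReason' should show why. Cleanup is still needed in that case.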
Step 5: Download subtitle files
Save .srt and .vtt next to the original file:
aws s3 cp "s3://$BUCKET/$JOB_NAME.srt" "/path/to/input.srt"
aws s3 cp "s3://$BUCKET/$JOB_NAME.vtt" "/path/to/input.vtt"
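The /path/to/input placeholders above stand for the source path with its extension removed. A small sketch of that derivation follows; the helper name output_base is hypothetical.

```shell
# Hypothetical helper: derive the shared base path for the output files by
# stripping the final extension from the file name only, so dots in directory
# names are left alone (for example /videos/talk.mp4 -> /videos/talk).
output_base() {
  dir=$(dirname "$1")
  name=$(basename "$1")
  case $name in
    ?*.*) name=${name%.*} ;;   # strip the extension; dotfiles like .hidden are kept
  esac
  echo "$dir/$name"
}

# Intended use:
#   BASE=$(output_base "$INPUT_FILE")
#   aws s3 cp "s3://$BUCKET/$JOB_NAME.srt" "$BASE.srt"
#   aws s3 cp "s3://$BUCKET/$JOB_NAME.vtt" "$BASE.vtt"
```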
Step 6: Generate plain text transcript
Download the JSON result and extract the full transcript text:
aws s3 cp "s3://$BUCKET/$JOB_NAME.json" "/tmp/transcribe-result.json"
Then use a tool to extract the .results.transcripts[0].transcript field from the JSON and save it as a .txt file next to the original.
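The skill does not name the tool. One option is jq, sketched below on the assumption that it is installed, since it is not among the listed prerequisites. The helper name extract_transcript is hypothetical.

```shell
# Hypothetical helper: write the transcript text from a Transcribe result
# file to a plain text file. Assumes jq is available on PATH.
#   $1 = path to the downloaded JSON result
#   $2 = path of the .txt file to create
extract_transcript() {
  json_file=$1
  txt_file=$2
  # -r prints the raw string instead of a JSON-quoted one
  jq -r '.results.transcripts[0].transcript' "$json_file" > "$txt_file"
}

# Intended use:
#   extract_transcript "/tmp/transcribe-result.json" "/path/to/input.txt"
```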
Step 7: Clean up everything
IMPORTANT: Always clean up to avoid recurring S3 storage costs.
# Delete S3 bucket and all contents
aws s3 rb "s3://$BUCKET" --force --region us-east-1
# Delete the transcription job
aws transcribe delete-transcription-job --transcription-job-name "$JOB_NAME" --region us-east-1
# Delete local temp files
rm -f "/tmp/transcribe-audio.mp3" "/tmp/transcribe-result.json"
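If an earlier step fails partway through, the cleanup commands may never be reached. One way to guard against that, shown as a sketch and not as the skill's own method, is to wrap them in a function and register it with a shell trap. The function name cleanup is illustrative, and BUCKET and JOB_NAME are assumed to be set as in the earlier steps.

```shell
# Hypothetical wrapper around the cleanup commands. Each AWS step is skipped
# when its variable is unset and tolerates failure, so one error does not
# prevent the remaining steps from running.
cleanup() {
  if [ -n "${BUCKET:-}" ]; then
    aws s3 rb "s3://$BUCKET" --force --region us-east-1 || true
  fi
  if [ -n "${JOB_NAME:-}" ]; then
    aws transcribe delete-transcription-job \
      --transcription-job-name "$JOB_NAME" --region us-east-1 || true
  fi
  rm -f "/tmp/transcribe-audio.mp3" "/tmp/transcribe-result.json"
}

# Run cleanup when the script exits, whether it succeeded or not.
trap cleanup EXIT
```

Registering the trap before the bucket is created means a failed upload or a failed job still triggers the deletion.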
Real-World Results (Reference)
From actual transcription runs:
| Video | Duration | Audio Size | Transcribe Time | Subtitle Segments |
|---|---|---|---|---|
| X/Twitter clip | 2:40 | 2.5 MB | ~20 seconds | 83 |
| Screen recording | 18:45 | 11.4 MB | ~60 seconds | 500+ |
Key Insights
- AWS Transcribe is fast - even 19-minute videos complete in about a minute
- Short-form content (tweets, reels) transcribes in seconds - the 2:40 clip above took about 20 seconds
- Cost is negligible - AWS Transcribe charges ~$0.024/min, so a 19-min video costs ~$0.46
- Cleanup is critical - always delete the S3 bucket to avoid storage charges
- SRT is most compatible - works with most video players and editors; VTT is better for web
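The cost figure above is simply duration multiplied by the per-minute rate. A small sketch of that arithmetic follows, taking the ~$0.024/min rate quoted above as a default; actual pricing may differ by region and tier. The helper name estimate_cost is hypothetical.

```shell
# Hypothetical helper: estimate the Transcribe charge for a clip.
#   $1 = duration in seconds
#   $2 = price per minute in USD (defaults to the ~0.024 figure quoted above)
estimate_cost() {
  LC_ALL=C awk -v secs="$1" -v rate="${2:-0.024}" \
    'BEGIN { printf "%.2f\n", secs / 60 * rate }'
}

estimate_cost 1140   # 19:00 -> 0.46
estimate_cost 160    # 2:40  -> 0.06
```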
Output Files
original-video.mp4
original-video.srt   # Subtitles with timestamps (most compatible)
original-video.vtt   # Web-optimized subtitles (for HTML5 <track>)
original-video.txt   # Plain text transcript (no timestamps)
After Transcription
- Verify all output files exist: ls -lh /path/to/original-video.{srt,vtt,txt}
- Report the number of subtitle segments and total duration
- Confirm all AWS resources have been cleaned up (no S3 buckets, no Transcribe jobs remaining)
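The first two checks can be scripted. The sketch below assumes every SRT segment has exactly one timing line containing "-->", and the helper names outputs_present and count_srt_segments are hypothetical. It loops over the extensions explicitly because the {srt,vtt,txt} brace expansion is not available in every shell.

```shell
# Hypothetical helper: succeed only if the .srt, .vtt, and .txt files for the
# given base path all exist and are non-empty.
outputs_present() {
  for ext in srt vtt txt; do
    [ -s "$1.$ext" ] || return 1
  done
}

# Hypothetical helper: count subtitle segments in an SRT file by counting
# its timing lines, each of which contains "-->".
count_srt_segments() {
  grep -c -e '-->' "$1"
}

# Intended use:
#   outputs_present "/path/to/original-video" || echo "missing output" >&2
#   count_srt_segments "/path/to/original-video.srt"
```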