Claude-skill-registry gastrohem-media-processor
Automatically process unprocessed audio and image files in Gastrohem daily WhatsApp folders. This skill should be used when the user asks to transcribe audio files, perform OCR on images, or process media in daily folders (e.g., "Process media in today's folder", "Transcribe audio and OCR images in 24.10 folder"). Handles audio transcription using insanely-fast-whisper (parallelized, creates .json) and image OCR using Claude's vision capabilities (creates natural .md summaries with Gastrohem-relevant info).
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/gastrohem-media-processor" ~/.claude/skills/majiayu000-claude-skill-registry-gastrohem-media-processor && rm -rf "$T"
skills/data/gastrohem-media-processor/SKILL.md
Gastrohem Media Processor
Overview
Automatically processes WhatsApp media files (audio and images) in Gastrohem daily folders.
What it does:
- Transcribes audio files in parallel (up to 3x faster) → creates .json files
- Identifies images that need OCR → creates .md files with a natural-language summary
- Default: uses today's date automatically
- Scans ALL departments at once for the given date
Performance:
- Audio: up to 3 files at a time (in parallel)
- Images: batch OCR (all at once with Claude vision)
- Scan: finds all folders for a date in under 1 second
Note: The skill creates .json files for audio and .md files for images. Adding the results to chat.md is a separate step.
When to Use This Skill
User says:
- "Process media"
- "Process today's media"
- "Transcribe audio files"
- "Process media for 24.10"
- "OCR images in today's folders"
Default behavior: Uses today's date, scans all departments automatically.
Workflow
Simple Usage
Process all media for today (DEFAULT):
python .claude/skills/gastrohem-media-processor/scripts/process_media.py
- Uses today's date automatically (26.10 in this example)
- Scans ALL departments for folders matching today's date
- Transcribes audio in parallel
- Lists images needing OCR
Process specific date:
python .claude/skills/gastrohem-media-processor/scripts/process_media.py --scan-date 24.10
Process specific folder:
python .claude/skills/gastrohem-media-processor/scripts/process_media.py --folder "gastrohem whatsapp/administracija/20.10 - 27.10/24.10"
What Happens
Audio Processing:
- Finds all audio files (.mp3, .ogg, .m4a, .wav, .opus)
- Skips files that already have .json transcriptions
- Transcribes in parallel (3 concurrent processes)
- Creates JSON: {audio_filename}.json
{ "speakers": [], "chunks": [...], "text": "Full transcribed text here" }
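The skip logic described above can be sketched roughly as follows. This is a minimal illustration, not the code in process_audio.py: the function names are hypothetical, and it assumes the transcript keeps the full audio file name and appends .json (for example voice.ogg.json), by analogy with image.png.md.

```python
from pathlib import Path

# Extensions listed in this document
AUDIO_EXTENSIONS = {".mp3", ".ogg", ".m4a", ".wav", ".opus"}


def transcript_path(audio_file: Path) -> Path:
    # Assumption: "{audio_filename}.json" means the full name plus ".json"
    return audio_file.with_name(audio_file.name + ".json")


def audio_needing_transcription(folder: Path) -> list[Path]:
    """Return audio files in `folder` that have no transcript next to them."""
    return sorted(
        p
        for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() in AUDIO_EXTENSIONS
        and not transcript_path(p).exists()
    )
```

Matching on the lowercased suffix means files such as VOICE.OGG are picked up as well.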
Image Processing:
- Finds all images (.png, .jpg, .jpeg, .webp, .bmp)
- Checks which images DON'T have .md files
- Returns list of images needing OCR
- Claude reads images in parallel and creates natural summaries with Gastrohem-relevant info
- Creates markdown: {image_filename}.md
# image.png
**Poslao:** Mahir Kadic
**Datum:** 26.10.2025 13:58
---
[Natural language summary focusing on Gastrohem-relevant information: contacts, names, emails, phone numbers, business details, etc.]
Script Reference
Three Separate Scripts
The skill now uses three modular scripts for better organization:
1. process_audio.py
Purpose: Audio transcription only (parallelized)
Usage:
python scripts/process_audio.py "path/to/folder" [--max-workers 3] [--output-json results.json]
Arguments:
- folder - Path to folder containing audio files
- --max-workers N - Max parallel processes (default: 3)
- --no-skip-existing - Re-transcribe files with existing JSON
- --output-json FILE - Save results to JSON
What it does:
- Finds audio files (.mp3, .ogg, .m4a, .wav, .opus)
- Skips files with existing .json transcriptions
- Transcribes in parallel (max 3 concurrent)
- Creates JSON files: {audio_filename}.json
Requirements:
insanely-fast-whisper in PATH
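A rough sketch of how up to three transcriptions might run at once: each worker thread launches one insanely-fast-whisper process. The function names are hypothetical and the actual invocation used by process_audio.py is not shown in this document. The flags --file-name, --device-id and --transcript-path are taken from the insanely-fast-whisper README and should be checked against the installed version.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable


def whisper_command(audio_file: Path, device_id: str = "mps") -> list[str]:
    # Assumption: the transcript is written next to the audio as "<name>.json"
    return [
        "insanely-fast-whisper",
        "--file-name", str(audio_file),
        "--device-id", device_id,
        "--transcript-path", str(audio_file) + ".json",
    ]


def run_whisper(audio_file: Path) -> bool:
    """Run one transcription; True means the CLI exited with status 0."""
    result = subprocess.run(whisper_command(audio_file), capture_output=True, text=True)
    return result.returncode == 0


def transcribe_all(
    files: Iterable[Path],
    transcribe: Callable[[Path], bool] = run_whisper,
    max_workers: int = 3,
) -> dict[Path, bool]:
    """Transcribe files concurrently and report success per file."""
    results: dict[Path, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(transcribe, f): f for f in files}
        for future in as_completed(futures):
            audio_file = futures[future]
            try:
                results[audio_file] = future.result()
            except Exception:
                # One failing file (e.g. CLI not installed) must not stop the rest
                results[audio_file] = False
    return results
```

Threads are enough here because the heavy work happens in the child processes; the Python side only waits for them to finish.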
2. process_images.py
Purpose: Image OCR helper functions
Usage:
# List images needing OCR
python scripts/process_images.py "path/to/folder" [--output-json images.json]
# Use in Python for batch processing
from process_images import save_ocr_md, batch_save_ocr, get_images_needing_ocr
Key functions:
- get_images_needing_ocr(folder_path) - Returns list of images without .md files
- save_ocr_md(image_file, summary, sender) - Save natural summary to .md file
- batch_save_ocr(ocr_results) - Save multiple OCR results at once
Markdown structure:
# image.png
**Poslao:** Mahir Kadic
**Datum:** 26.10.2025 13:58
---
Natural language summary focusing on Gastrohem-relevant information: contacts, names, emails, phone numbers, business details, etc.
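To make the helper contract concrete, here is a minimal sketch of two of the functions named above. It is not the code in process_images.py. The sent_at parameter and the fallback to the file's modification time are assumptions added for illustration, and the exact blank-line layout of the real output is not known from this document. The labels Poslao ("Sent by") and Datum ("Date") are kept as they appear in the template.

```python
from datetime import datetime
from pathlib import Path

# Extensions listed in this document
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def get_images_needing_ocr(folder_path):
    """Return images in the folder that have no "<image name>.md" next to them."""
    folder = Path(folder_path)
    return sorted(
        p
        for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_EXTENSIONS
        and not p.with_name(p.name + ".md").exists()
    )


def save_ocr_md(image_file, summary, sender, sent_at=None):
    """Write the summary as "<image name>.md" and return the new path."""
    image = Path(image_file)
    # Assumption: without an explicit timestamp, use the file's modification time
    when = sent_at or datetime.fromtimestamp(image.stat().st_mtime)
    md_path = image.with_name(image.name + ".md")
    md_path.write_text(
        f"# {image.name}\n\n"
        f"**Poslao:** {sender}\n"
        f"**Datum:** {when:%d.%m.%Y %H:%M}\n\n"
        "---\n\n"
        f"{summary.strip()}\n",
        encoding="utf-8",
    )
    return md_path
```

Once save_ocr_md has run for an image, get_images_needing_ocr no longer returns it, which is what makes repeated runs safe.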
3. process_media.py
Purpose: Master script combining audio + images
Usage:
# Scan all folders for a specific date
python scripts/process_media.py --scan-date DD.MM [--output-json results.json]
# Process a specific folder
python scripts/process_media.py --folder "path/to/folder" [--output-json results.json]
Arguments:
- --scan-date DD.MM - Scan all departments for this date
- --folder PATH - Process specific folder
- --base-path PATH - Base path (default: gastrohem whatsapp)
- --no-skip-existing - Re-process all files
- --output-json FILE - Save results to JSON
What it does:
- Calls process_audio.py for audio files
- Calls process_images.py to find images needing OCR
- Returns combined results
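The date scan could look something like the sketch below. The function names are hypothetical. It assumes daily folders are named DD.MM and sit somewhere under the base path, as in the example path gastrohem whatsapp/administracija/20.10 - 27.10/24.10, and it searches recursively so that the exact nesting depth does not matter.

```python
from datetime import date
from pathlib import Path


def today_as_folder_name(today=None):
    """Format a date the way daily folders are named, e.g. "26.10"."""
    today = today or date.today()
    return today.strftime("%d.%m")


def find_date_folders(base_path, scan_date=None):
    """Return every directory under base_path whose name equals scan_date."""
    scan_date = scan_date or today_as_folder_name()
    base = Path(base_path)
    # Assumption: a daily folder is any directory named exactly DD.MM
    return sorted(p for p in base.rglob(scan_date) if p.is_dir())
```

Calling find_date_folders("gastrohem whatsapp") with no date falls back to today, which mirrors the default behavior described earlier.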
Best Practices
- Use --scan-date for daily processing - Automatically finds all folders for a specific date across all departments
- Process audio in parallel - The script handles up to 3 audio files simultaneously for faster processing
- Batch OCR images - Read all images in parallel using Claude's vision for maximum efficiency
- Save results to JSON - Use --output-json to keep a record of processing results
- Process regularly - Process media files daily to avoid backlog
- Separate chat.md updates - This skill creates .json and .md files; updating chat.md is a separate workflow
Error Handling
If transcription fails:
- Check that insanely-fast-whisper is installed
- Verify the audio file is not corrupted
- Check that the device (mps) is available
- Manually transcribe if necessary
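The first and third checks can be automated with a small preflight helper. This is only a sketch with a hypothetical function name. The MPS check assumes PyTorch is importable from the same interpreter, which may not hold if insanely-fast-whisper was installed into an isolated environment (for example with pipx).

```python
import shutil


def transcription_preflight(executable="insanely-fast-whisper", check_mps=True):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if shutil.which(executable) is None:
        problems.append(f"{executable} not found in PATH")
    if check_mps:
        try:
            import torch
        except ImportError:
            problems.append("torch is not importable here, MPS availability not checked")
        else:
            # Older PyTorch builds have no torch.backends.mps module
            mps = getattr(torch.backends, "mps", None)
            if mps is None or not mps.is_available():
                problems.append("MPS device is not available")
    return problems
```

Running it before a batch gives a clearer message than a failed transcription would.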
If image OCR is unclear:
- Request higher quality image from sender
- Focus on extracting key information rather than perfect transcription
- Note any illegible portions in the context field
Example Workflows
User: "Process media"
Claude:
- Runs: python .claude/skills/gastrohem-media-processor/scripts/process_media.py
- Script uses today's date (26.10), scans all departments
- Finds 3 folders: administracija/26.10, finansije/26.10, adis-chat/26.10
- Transcribes 5 audio files in parallel → .json files created
- Returns list of 3 images needing OCR
- Claude reads all 3 images in parallel, creates natural summaries → .md files created
- Reports: "Processed 5 audio and 3 images across 3 folders."
User: "Process media for 24.10"
Claude:
- Runs: python .claude/skills/gastrohem-media-processor/scripts/process_media.py --scan-date 24.10
- Same workflow for 24.10 date
Performance Improvements (New Optimizations)
1. Parallel Audio Transcription:
- Up to 3 audio files are transcribed at the same time
- Uses ThreadPoolExecutor for parallel execution
- 3x faster than before
2. Batch OCR for Images:
- All images can be read at once (parallel Read tool calls)
- Each image gets its own .md file: image.png.md
- Natural summary focused on Gastrohem-relevant information (contacts, names, numbers, business details)
- Helper function save_ocr_md() for easy saving
3. Automatic Scan of All Folders:
- Default: uses today's date (no need to specify it)
- Automatically finds all folders for that date in ALL departments
- Processes everything at once
Script Structure:
- process_audio.py - Audio only (parallel)
- process_images.py - Image OCR helper functions
- process_media.py - Master script (combines both)
Typical speed:
- Audio: ~3-5 sec per file (parallel, max 3)
- Images: ~2-3 sec per image (batch)
- Scan of all departments: under 1 second