Skillsbench filler-word-processing
Process filler word annotations to generate video edit lists. Use when working with timestamp annotations for removing speech disfluencies (um, uh, like, you know) from audio/video content.
install
source · Clone the upstream repo
git clone https://github.com/benchflow-ai/skillsbench
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/benchflow-ai/skillsbench "$T" && mkdir -p ~/.claude/skills && cp -r "$T/tasks/video-filler-word-remover/environment/skills/filler-word-processing" ~/.claude/skills/benchflow-ai-skillsbench-filler-word-processing && rm -rf "$T"
manifest:
tasks/video-filler-word-remover/environment/skills/filler-word-processing/SKILL.mdsource content
Filler Word Processing
Annotation Format
Typical annotation JSON structure:
[ {"word": "um", "timestamp": 12.5}, {"word": "like", "timestamp": 25.3}, {"word": "you know", "timestamp": 45.8} ]
Converting Annotations to Cut Segments
Each filler word annotation marks when the word starts. To remove it, use word-specific durations since different fillers have different lengths:
import json # Word-specific durations (in seconds) WORD_DURATIONS = { "uh": 0.3, "um": 0.4, "hum": 0.6, "hmm": 0.6, "mhm": 0.55, "like": 0.3, "yeah": 0.35, "so": 0.25, "well": 0.35, "okay": 0.4, "basically": 0.55, "you know": 0.55, "i mean": 0.5, "kind of": 0.5, "i guess": 0.5, } DEFAULT_DURATION = 0.4 def annotations_to_segments(annotations_file, buffer=0.05): """ Convert filler word annotations to (start, end) cut segments. Args: annotations_file: Path to JSON annotations buffer: Small buffer before the word (seconds) Returns: List of (start, end) tuples representing segments to remove """ with open(annotations_file) as f: annotations = json.load(f) segments = [] for ann in annotations: word = ann.get('word', '').lower().strip() timestamp = ann['timestamp'] # Use word-specific duration, fall back to default word_duration = WORD_DURATIONS.get(word, DEFAULT_DURATION) # Cut starts slightly before the word start = max(0, timestamp - buffer) # Cut ends after word duration end = timestamp + word_duration segments.append((start, end)) return segments
Merging Overlapping Segments
When filler words are close together, merge their cut segments:
def merge_overlapping_segments(segments, min_gap=0.1): """ Merge segments that overlap or are very close together. Args: segments: List of (start, end) tuples min_gap: Minimum gap to keep segments separate Returns: Merged list of segments """ if not segments: return [] # Sort by start time sorted_segs = sorted(segments) merged = [sorted_segs[0]] for start, end in sorted_segs[1:]: prev_start, prev_end = merged[-1] # If this segment overlaps or is very close to previous if start <= prev_end + min_gap: # Extend the previous segment merged[-1] = (prev_start, max(prev_end, end)) else: merged.append((start, end)) return merged
Complete Processing Pipeline
def process_filler_annotations(annotations_file, word_duration=0.4): """Full pipeline: load annotations -> create segments -> merge overlaps""" # Load and create initial segments segments = annotations_to_segments(annotations_file, word_duration) # Merge overlapping cuts merged = merge_overlapping_segments(segments) return merged
Tuning Parameters
| Parameter | Typical Value | Notes |
|---|---|---|
| word_duration | varies | Short fillers (um, uh) ~0.25-0.3s, single words (like, yeah) ~0.3-0.4s, phrases (you know, i mean) ~0.5-0.6s |
| buffer | 0.05s | Small buffer captures word onset |
| min_gap | 0.1s | Prevents micro-segments between close fillers |
Word Duration Guidelines
| Category | Words | Duration |
|---|---|---|
| Quick hesitations | uh, um | 0.3-0.4s |
| Sustained hums (drawn out while thinking) | hum, hmm, mhm | 0.55-0.6s |
| Quick single words | like, yeah, so, well | 0.25-0.35s |
| Longer single words | okay, basically | 0.4-0.55s |
| Multi-word phrases | you know, i mean, kind of, i guess | 0.5-0.55s |
Quality Considerations
- Too aggressive: Cuts into adjacent words, sounds choppy
- Too conservative: Filler words partially audible
- Sweet spot: Clean cuts with natural-sounding result
Test with a few samples before processing full video.