# claude-skills-journalism: interview-transcription

Interview management, transcription workflows, and source note-taking for journalists. Use when preparing for interviews, managing recordings, transcribing audio/video, organizing source notes, creating timestamped references, or building interview databases. Essential for reporters conducting interviews and managing source relationships.

Clone the full repository:

```shell
git clone https://github.com/jamditis/claude-skills-journalism
```

Or install just this skill into `~/.claude/skills`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/jamditis/claude-skills-journalism "$T" && mkdir -p ~/.claude/skills && cp -r "$T/interview-transcription" ~/.claude/skills/jamditis-claude-skills-journalism-interview-transcription && rm -rf "$T"
```

## Interview transcription and management

From `interview-transcription/SKILL.md`: practical workflows for journalists managing interviews from preparation through publication.
### When to activate
- Preparing questions for an interview
- Processing audio/video recordings
- Creating or managing transcripts
- Organizing notes from multiple sources
- Building a source relationship database
- Generating timestamped quotes for fact-checking
- Converting recordings to publishable quotes
### Pre-interview preparation

#### Research checklist

Before recording starts, you should already know:

```markdown
## Source prep for: [Name]

### Background
- Role/title:
- Organization:
- Why they're relevant to this story:
- Previous media appearances (note inconsistencies):

### Key questions (prioritized)
1. [Must-ask question]
2. [Must-ask question]
3. [If time permits]

### Documents to reference
- [ ] Bring/share [specific document]
- [ ] Ask about [specific claim/data point]

### Red lines
- Topics they'll likely avoid:
- Sensitive areas to approach carefully:
```
#### Recording setup

```python
# Standard recording configuration
RECORDING_SETTINGS = {
    'format': 'wav',       # Lossless for transcription
    'sample_rate': 44100,  # Standard quality
    'channels': 1,         # Mono is fine for speech
    'backup': True,        # Always run a backup recorder
}

# File naming convention:
#   YYYY-MM-DD_source-lastname_topic.wav
# Example: 2024-03-15_smith_budget-hearing.wav
```

**Two-device rule**: Always record on two devices; a phone as backup is the minimum.
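The naming convention above can be generated programmatically so filenames stay consistent across a project. A minimal sketch (the `recording_filename` helper and its `slug` logic are illustrative, not part of the skill):

```python
import re
from datetime import date


def recording_filename(interview_date: date, lastname: str, topic: str,
                       ext: str = "wav") -> str:
    """Build a filename following YYYY-MM-DD_source-lastname_topic.ext."""
    def slug(text: str) -> str:
        # Lowercase, collapse runs of non-alphanumerics into single hyphens
        return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')
    return f"{interview_date.isoformat()}_{slug(lastname)}_{slug(topic)}.{ext}"


print(recording_filename(date(2024, 3, 15), "Smith", "Budget Hearing"))
# 2024-03-15_smith_budget-hearing.wav
```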
### Transcription workflows

#### Automated transcription pipeline

```python
import json
import subprocess
from pathlib import Path


def transcribe_interview(audio_path: str, output_dir: str = "./transcripts") -> dict:
    """Transcribe with Whisper; returns a transcript dict with timestamps.

    Note: Whisper alone does not do speaker diarization; pair it with a
    diarization tool if you need speaker labels.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # Use the OpenAI Whisper CLI (whisper.cpp works similarly)
    subprocess.run([
        'whisper', audio_path,
        '--model', 'medium',
        '--output_format', 'json',
        '--output_dir', output_dir,
        '--language', 'en',
        '--word_timestamps', 'True',
    ], capture_output=True, check=True)

    # Load and return the structured transcript
    json_path = Path(output_dir) / f"{Path(audio_path).stem}.json"
    with open(json_path) as f:
        return json.load(f)


def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS format."""
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = int(seconds % 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def format_for_editing(transcript: dict) -> str:
    """Convert to a journalist-friendly format with timestamps."""
    lines = []
    for segment in transcript.get('segments', []):
        timestamp = format_timestamp(segment['start'])
        text = segment['text'].strip()
        lines.append(f"[{timestamp}] {text}")
    return '\n\n'.join(lines)
```
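For reference, Whisper's JSON output contains a `segments` list of dicts with `start` times and `text`. A self-contained run of the formatting step on mock data (the segment structure is simplified, and `format_timestamp` is inlined here so the example runs on its own):

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


# Mock of the structure Whisper emits (simplified)
sample = {'segments': [
    {'start': 75.3, 'text': ' We reviewed the budget line by line. '},
    {'start': 3725.0, 'text': ' That figure was never audited. '},
]}

lines = [f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
         for seg in sample['segments']]
print('\n\n'.join(lines))
# [00:01:15] We reviewed the budget line by line.
#
# [01:02:05] That figure was never audited.
```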
#### Manual transcription template

For sensitive interviews or when AI transcription fails:

```markdown
## Transcript: [Source] - [Date]

**Recording file**: [filename]
**Duration**: [XX:XX]
**Transcribed by**: [name]
**Verified against recording**: [ ] Yes / [ ] No

---

[00:00:15] **Q**: [Your question]

[00:00:45] **A**: [Source response - verbatim, including ums, pauses noted as (...)]

[00:01:30] **Q**: [Follow-up]

[00:01:42] **A**: [Response]

---

## Notes
- [Anything not captured in audio: gestures, documents shown, etc.]

## Potential quotes
- [00:01:42] "Quote that stands out" - context: [why it matters]
```
### Quote extraction and verification

#### Pull quotes workflow

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Quote:
    text: str
    timestamp: str
    speaker: str
    context: str
    verified: bool = False
    used_in: Optional[str] = None


class QuoteBank:
    """Manage quotes from interview transcripts."""

    def __init__(self):
        self.quotes: List[Quote] = []

    def extract_quote(self, transcript: str, start_time: str,
                      speaker: str, context: str) -> Optional[Quote]:
        """Extract and store a quote with metadata."""
        # Pull text from the start timestamp up to the next timestamp marker
        pattern = rf'\[{re.escape(start_time)}\](.+?)(?=\[\d|$)'
        match = re.search(pattern, transcript, re.DOTALL)
        if match:
            quote = Quote(
                text=match.group(1).strip(),
                timestamp=start_time,
                speaker=speaker,
                context=context,
            )
            self.quotes.append(quote)
            return quote
        return None

    def verify_quote(self, quote: Quote, audio_path: str) -> bool:
        """Mark a quote as verified against the original recording."""
        # In practice: listen to the audio at the timestamp, confirm accuracy
        quote.verified = True
        return True

    def export_for_story(self) -> str:
        """Export verified quotes ready for publication."""
        output = [
            f'"{q.text}"\n— {q.speaker}\n[Timestamp: {q.timestamp}]'
            for q in self.quotes if q.verified
        ]
        return '\n\n'.join(output)
```
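The extraction regex pulls everything from a given timestamp marker up to the next one (or end of transcript). A self-contained demonstration of that pattern on a toy transcript:

```python
import re

transcript = (
    "[00:00:15] Q: What changed in the budget?\n\n"
    "[00:00:45] A: We cut travel by forty percent.\n\n"
    "[00:01:30] Q: Who approved that?"
)

start_time = "00:00:45"
# Capture lazily until the next "[<digit>" marker or end of string
pattern = rf'\[{re.escape(start_time)}\](.+?)(?=\[\d|$)'
match = re.search(pattern, transcript, re.DOTALL)
print(match.group(1).strip())
# A: We cut travel by forty percent.
```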
#### Quote accuracy checklist

Before publishing any quote:

- [ ] Listened to original recording at timestamp
- [ ] Quote is verbatim (or clearly marked as paraphrased)
- [ ] Context preserved (not cherry-picked to change meaning)
- [ ] Speaker identified correctly
- [ ] Timestamp documented for fact-checker
- [ ] Source approved quote (if agreement made)
### Source management database

#### Interview tracking schema

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class SourceStatus(Enum):
    ACTIVE = "active"          # Currently engaged
    DORMANT = "dormant"        # Not recently contacted
    DECLINED = "declined"      # Refused to participate
    OFF_RECORD = "off_record"  # Background only


class InterviewType(Enum):
    ON_RECORD = "on_record"
    BACKGROUND = "background"
    DEEP_BACKGROUND = "deep_background"
    OFF_RECORD = "off_record"


@dataclass
class Source:
    name: str
    organization: str
    contact_info: dict  # email, phone, signal, etc.
    beat: str
    status: SourceStatus = SourceStatus.ACTIVE
    interviews: List['Interview'] = field(default_factory=list)
    notes: str = ""
    # Relationship tracking
    first_contact: Optional[datetime] = None
    trust_level: int = 1  # 1-5 scale


@dataclass
class Interview:
    source: str
    date: datetime
    interview_type: InterviewType
    recording_path: Optional[str] = None
    transcript_path: Optional[str] = None
    story_slug: Optional[str] = None
    key_quotes: List[str] = field(default_factory=list)
    follow_up_needed: bool = False
    notes: str = ""
```
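A usage sketch showing how records in this schema fit together (the dataclasses are trimmed to the fields the example touches, so the snippet runs on its own; names and values are illustrative):

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class SourceStatus(Enum):
    ACTIVE = "active"


class InterviewType(Enum):
    ON_RECORD = "on_record"


@dataclass
class Interview:
    source: str
    date: datetime
    interview_type: InterviewType
    recording_path: Optional[str] = None


@dataclass
class Source:
    name: str
    organization: str
    beat: str
    status: SourceStatus = SourceStatus.ACTIVE
    interviews: List[Interview] = field(default_factory=list)


# One source, one recorded interview attached to it
smith = Source(name="Jane Smith", organization="City Hall", beat="budget")
smith.interviews.append(Interview(
    source="Jane Smith",
    date=datetime(2024, 3, 15),
    interview_type=InterviewType.ON_RECORD,
    recording_path="2024-03-15_smith_budget-hearing.wav",
))
print(len(smith.interviews), smith.interviews[0].recording_path)
# 1 2024-03-15_smith_budget-hearing.wav
```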
#### Quick source lookup

```python
from typing import List, Optional


def find_sources_for_story(sources: List[Source], topic: str,
                           beat: Optional[str] = None) -> List[Source]:
    """Find relevant sources for a new story."""
    matches = []
    for source in sources:
        # Filter by beat if specified
        if beat and source.beat != beat:
            continue
        # Only suggest active sources
        if source.status != SourceStatus.ACTIVE:
            continue
        # Check if they've spoken on similar topics
        for interview in source.interviews:
            if topic.lower() in interview.notes.lower():
                matches.append(source)
                break
    # Sort by trust level, most trusted first
    return sorted(matches, key=lambda s: s.trust_level, reverse=True)
```
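A quick self-contained check of the filtering and ordering logic, using stand-in records (`FakeSource`/`FakeInterview` are simplified here so the snippet runs without the full schema):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeInterview:
    notes: str


@dataclass
class FakeSource:
    name: str
    beat: str
    active: bool
    trust_level: int
    interviews: List[FakeInterview] = field(default_factory=list)


def find_sources_for_story(sources: List[FakeSource], topic: str,
                           beat: Optional[str] = None) -> List[FakeSource]:
    """Same filter-then-rank logic, on the simplified records."""
    matches = []
    for s in sources:
        if beat and s.beat != beat:
            continue  # wrong beat
        if not s.active:
            continue  # inactive source
        if any(topic.lower() in iv.notes.lower() for iv in s.interviews):
            matches.append(s)
    return sorted(matches, key=lambda s: s.trust_level, reverse=True)


sources = [
    FakeSource("A", "budget", True, 2, [FakeInterview("Talked about the Budget gap")]),
    FakeSource("B", "budget", True, 5, [FakeInterview("budget hearing recap")]),
    FakeSource("C", "budget", False, 5, [FakeInterview("budget")]),  # inactive: excluded
    FakeSource("D", "courts", True, 4, [FakeInterview("budget")]),   # wrong beat
]
names = [s.name for s in find_sources_for_story(sources, "budget", beat="budget")]
print(names)
# ['B', 'A']
```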
### Audio/video processing

#### Batch processing multiple recordings

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path


def batch_transcribe(recordings_dir: str, output_dir: str) -> dict:
    """Process all recordings in a directory (uses transcribe_interview above)."""
    recordings = [
        p for ext in ('*.wav', '*.mp3', '*.m4a')
        for p in Path(recordings_dir).glob(ext)
    ]

    results = {}
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = {
            executor.submit(transcribe_interview, str(rec), output_dir): rec
            for rec in recordings
        }
        for future in as_completed(futures):
            rec = futures[future]
            try:
                results[rec.name] = {
                    'status': 'success',
                    'transcript': future.result(),
                }
            except Exception as e:
                results[rec.name] = {'status': 'error', 'error': str(e)}
    return results
```
#### Video interview extraction

```python
import subprocess


def extract_audio_from_video(video_path: str, output_path: str = None) -> str:
    """Extract the audio track from a video for transcription."""
    if output_path is None:
        output_path = video_path.rsplit('.', 1)[0] + '.wav'
    subprocess.run([
        'ffmpeg', '-i', video_path,
        '-vn',                   # Drop the video stream
        '-acodec', 'pcm_s16le',  # 16-bit PCM WAV
        '-ar', '44100',          # Sample rate
        '-ac', '1',              # Mono
        output_path,
    ], check=True)
    return output_path
```
### Legal and ethical considerations

#### Consent documentation

```markdown
## Recording consent record

**Date**:
**Source name**:
**Recording type**: [ ] Audio [ ] Video
**Interview type**: [ ] On record [ ] Background [ ] Off record

### Consent obtained:
- [ ] Verbal consent recorded at start of interview
- [ ] Written consent form signed
- [ ] Email confirmation of consent

### Jurisdiction notes:
- Interview location state/country:
- One-party or two-party consent jurisdiction:
- Any specific restrictions agreed:

### Agreed terms:
- [ ] Full attribution allowed
- [ ] Organization attribution only
- [ ] Anonymous source
- [ ] Review quotes before publication
- [ ] Embargo until [date]:
```
#### Two-party consent states (US)

States generally considered to require all-party consent include California, Connecticut, Florida, Illinois, Maryland, Massachusetts, Michigan, Montana, Nevada, New Hampshire, Pennsylvania, and Washington. Laws change and some states' rules are nuanced, so verify the current law for your jurisdiction before recording.

**Always get explicit consent on the recording itself, regardless of jurisdiction.**
### Tools and resources
| Tool | Purpose | Notes |
|---|---|---|
| Whisper | Local transcription | Free, accurate, private |
| Otter.ai | Cloud transcription | Real-time, speaker ID |
| Descript | Edit audio like text | Good for pulling clips |
| Rev | Human transcription | For sensitive/legal |
| Trint | Journalist-focused | Collaboration features |
| oTranscribe | Free web player | Manual transcription aid |
### Related skills

- **source-verification** - Verify source credentials before an interview
- **foia-requests** - Obtain documents to inform interview questions
- **data-journalism** - Analyze data that sources mention in interviews
### Skill metadata
| Field | Value |
|---|---|
| Version | 1.0.0 |
| Created | 2025-12-26 |
| Author | Claude Skills for Journalism |
| Domain | Journalism, Research |
| Complexity | Intermediate |