Skills assemblyai
install
source · Clone the upstream repo
git clone https://github.com/TerminalSkills/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/TerminalSkills/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/assemblyai" ~/.claude/skills/terminalskills-skills-assemblyai && rm -rf "$T"
manifest:
skills/assemblyai/SKILL.md
safety · automated scan (medium risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
- pip install
- references API keys
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content
AssemblyAI
Overview
AssemblyAI provides best-in-class speech recognition plus an intelligence layer: speaker diarization, sentiment analysis, auto chapters, content moderation, and LeMUR (LLM-powered Q&A on audio). Use it to turn audio/video files into structured, queryable data.
Setup
```bash
pip install assemblyai python-dotenv
export ASSEMBLYAI_API_KEY="your_api_key_here"
```
Core Concepts
- Transcript: The async job that converts audio → text. Submit a URL or file, poll for completion.
- Audio Intelligence: Optional enrichments added to the transcript request (diarization, sentiment, chapters, etc.).
- LeMUR: Apply LLMs to your transcript — summarize, answer questions, extract structured data.
- Real-time: Stream audio via WebSocket for live transcription.
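The transcript lifecycle above (submit a job, then poll until it reaches a terminal status) can be sketched without the SDK. The status names mirror `aai.TranscriptStatus`; `poll` here is a hypothetical stand-in for the network call, fed by a simulated status sequence:

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "error"}  # mirrors aai.TranscriptStatus

def wait_for_transcript(poll: Callable[[], str], interval: float = 0.0) -> str:
    """Poll a status-returning callable until the job reaches a terminal state."""
    while True:
        status = poll()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)

# Simulate a job that moves queued -> processing -> completed.
statuses = iter(["queued", "processing", "processing", "completed"])
final = wait_for_transcript(lambda: next(statuses))
print(final)  # completed
```

The SDK's `transcriber.transcribe()` does this polling for you; the sketch just makes the async model explicit.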
Instructions
Step 1: Initialize the client
```python
import assemblyai as aai
import os

aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
```
Step 2: Transcribe a file (basic)
```python
def transcribe(audio_source: str) -> aai.Transcript:
    """
    audio_source: URL (https://...) or local file path.
    Returns the completed Transcript object.
    """
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_source)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(f"Transcription error: {transcript.error}")
    print(f"Transcript ID: {transcript.id}")
    print(f"Text (first 300 chars): {transcript.text[:300]}...")
    return transcript

t = transcribe("https://assembly.ai/sports_injuries.mp3")
print(t.text)
```
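`transcribe()` raises on a failed job, but transient network errors can also surface as exceptions. A small generic retry wrapper (a sketch, not part of the AssemblyAI SDK) can harden the call:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T], attempts: int = 3, backoff: float = 0.0) -> T:
    """Call fn, retrying on exception with a simple linear backoff."""
    last_exc = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:  # in practice, narrow to SDK/network error types
            last_exc = exc
            time.sleep(backoff * (i + 1))
    raise RuntimeError(f"All {attempts} attempts failed") from last_exc

# Example: a flaky callable that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky))  # ok
```

Usage would be `with_retries(lambda: transcribe(url))`; tune `backoff` for real network use.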
Step 3: Transcribe with full audio intelligence
```python
def transcribe_rich(audio_source: str) -> aai.Transcript:
    """Transcribe with speaker labels, sentiment, chapters, and content safety."""
    config = aai.TranscriptionConfig(
        speaker_labels=True,       # Who said what
        sentiment_analysis=True,   # Positive/negative/neutral per sentence
        auto_chapters=True,        # Generate chapter markers
        content_safety=True,       # Detect profanity, hate speech, etc.
        auto_highlights=True,      # Key phrases and topics
        entity_detection=True,     # People, places, organizations
        iab_categories=True,       # Topic taxonomy
        language_detection=True,   # Detect language automatically
    )
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_source, config=config)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return transcript

t = transcribe_rich("https://your-audio.com/podcast.mp3")

# Speaker diarization
print("\n--- Speakers ---")
for utt in t.utterances:
    print(f"[{utt.speaker}] {utt.text}")

# Chapters
print("\n--- Chapters ---")
for ch in t.chapters:
    start_min = ch.start // 60000
    print(f"[{start_min}m] {ch.headline}: {ch.summary}")

# Sentiment
print("\n--- Sentiment ---")
for s in t.sentiment_analysis[:5]:
    print(f"{s.sentiment.value}: {s.text[:80]}")

# Content safety
print("\n--- Content Safety ---")
for label, result in t.content_safety_labels.results.items():
    if result.status == "flagged":
        print(f"Flagged: {label} (confidence: {result.confidence:.2f})")
```
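Once utterances are available, simple aggregate stats fall out of them. A hypothetical helper computing each speaker's share of words, assuming only the `.speaker` and `.text` attributes used in the diarization loop above (dummy `SimpleNamespace` objects stand in for SDK utterances):

```python
from collections import Counter
from types import SimpleNamespace

def talk_share(utterances) -> dict:
    """Fraction of total words spoken per speaker, from objects with .speaker/.text."""
    words = Counter()
    for utt in utterances:
        words[utt.speaker] += len(utt.text.split())
    total = sum(words.values()) or 1
    return {spk: n / total for spk, n in words.items()}

# Dummy utterances shaped like the SDK objects used above
utts = [
    SimpleNamespace(speaker="A", text="hello there everyone"),
    SimpleNamespace(speaker="B", text="hi"),
]
print(talk_share(utts))  # {'A': 0.75, 'B': 0.25}
```

With a real transcript, call it as `talk_share(t.utterances)`.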
Step 4: Real-time streaming transcription
```python
import assemblyai as aai
import pyaudio  # pip install pyaudio

def on_open(session_opened: aai.RealtimeSessionOpened):
    print(f"Session opened: {session_opened.session_id}")

def on_data(transcript: aai.RealtimeTranscript):
    if not transcript.text:
        return
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(f"\n[FINAL] {transcript.text}")
    else:
        print(f"\r[partial] {transcript.text}", end="")

def on_error(error: aai.RealtimeError):
    print(f"Error: {error}")

def on_close():
    print("Session closed.")

def stream_microphone():
    """Stream microphone input to AssemblyAI for real-time transcription."""
    transcriber = aai.RealtimeTranscriber(
        sample_rate=16_000,
        on_data=on_data,
        on_error=on_error,
        on_open=on_open,
        on_close=on_close,
        end_utterance_silence_threshold=700,
    )
    transcriber.connect()

    FRAMES_PER_BUFFER = 3200
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16_000

    p = pyaudio.PyAudio()
    stream = p.open(
        format=FORMAT,
        channels=CHANNELS,
        rate=RATE,
        input=True,
        frames_per_buffer=FRAMES_PER_BUFFER,
    )
    try:
        print("Recording... Press Ctrl+C to stop.")
        while True:
            data = stream.read(FRAMES_PER_BUFFER)
            transcriber.stream(data)
    except KeyboardInterrupt:
        pass
    finally:
        stream.stop_stream()
        stream.close()
        p.terminate()
        transcriber.close()

stream_microphone()
```
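The buffer constants above are related: 3200 frames at 16 kHz is 200 ms of audio per chunk, and 16-bit mono means 2 bytes per frame, so each read is 6400 bytes. Two tiny helpers make the arithmetic explicit (these follow from the constants above, not from the SDK):

```python
def frames_for_duration(ms: int, sample_rate: int = 16_000) -> int:
    """Number of audio frames covering `ms` milliseconds at the given sample rate."""
    return sample_rate * ms // 1000

def chunk_bytes(frames: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Buffer size in bytes for 16-bit (paInt16) audio."""
    return frames * bytes_per_sample * channels

print(frames_for_duration(200))               # 3200 (matches FRAMES_PER_BUFFER)
print(chunk_bytes(frames_for_duration(200)))  # 6400
```

Smaller chunks lower latency but increase per-message overhead; 100 to 450 ms chunks are a reasonable range to experiment with.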
Step 5: LeMUR — ask questions about audio
```python
def lemur_qa(transcript_id: str, questions: list[str]) -> list[dict]:
    """
    Ask LeMUR questions about a transcript.
    Returns list of {question, answer} dicts.
    """
    transcript = aai.Transcript.get_by_id(transcript_id)
    questions_answers = transcript.lemur.question_answer(
        questions=[
            aai.LemurQuestion(question=q, answer_format="concise")
            for q in questions
        ],
        final_model=aai.LemurModel.claude3_5_sonnet,
    )
    results = []
    for qa in questions_answers.response:
        print(f"Q: {qa.question}\nA: {qa.answer}\n")
        results.append({"question": qa.question, "answer": qa.answer})
    return results

# Use LeMUR to extract structured insights
lemur_qa(t.id, [
    "What are the main topics discussed?",
    "List any action items or decisions made.",
    "What is the overall sentiment of the conversation?",
])
```
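Because `lemur_qa()` returns plain `{question, answer}` dicts, downstream formatting is ordinary Python. A hypothetical formatter (not part of the SDK) that renders the results as a markdown Q&A list:

```python
def qa_to_markdown(results: list) -> str:
    """Render [{'question': ..., 'answer': ...}] dicts as a markdown Q&A list."""
    lines = []
    for qa in results:
        lines.append(f"- **Q:** {qa['question']}")
        lines.append(f"  **A:** {qa['answer']}")
    return "\n".join(lines)

md = qa_to_markdown([
    {"question": "Main topic?", "answer": "AI trends"},
    {"question": "Action items?", "answer": "Ship the beta"},
])
print(md)
```

In the flow above you would call `qa_to_markdown(lemur_qa(t.id, [...]))`.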
Step 6: LeMUR summarization
```python
def lemur_summarize(transcript_id: str, context: str = "") -> str:
    """Generate a concise summary of a transcript."""
    transcript = aai.Transcript.get_by_id(transcript_id)
    result = transcript.lemur.summarize(
        context=context or "This is a podcast episode.",
        answer_format="bullet points",
        final_model=aai.LemurModel.claude3_5_sonnet,
    )
    print(result.response)
    return result.response

summary = lemur_summarize(t.id, context="B2B SaaS podcast discussing AI trends")
```
Step 7: Generate show notes (combined pipeline)
```python
def generate_show_notes(audio_url: str) -> dict:
    """Full podcast processing pipeline."""
    config = aai.TranscriptionConfig(
        speaker_labels=True,
        auto_chapters=True,
        auto_highlights=True,
    )
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_url, config=config)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)

    # Build chapters list
    chapters = [
        {
            "time": f"{ch.start // 60000}:{(ch.start % 60000) // 1000:02d}",
            "title": ch.headline,
            "summary": ch.summary,
        }
        for ch in transcript.chapters
    ]

    # LeMUR for show notes
    show_notes = transcript.lemur.task(
        prompt=(
            "Write podcast show notes in markdown. Include: "
            "1-paragraph episode summary, key takeaways as bullets, "
            "and a list of resources mentioned."
        ),
        final_model=aai.LemurModel.claude3_5_sonnet,
    )

    # Social clips (key quotes)
    social_prompt = transcript.lemur.task(
        prompt=(
            "Extract 3 compelling quotes suitable for social media posts. "
            "Format each as a standalone quote with speaker label."
        ),
        final_model=aai.LemurModel.claude3_5_sonnet,
    )

    return {
        "transcript_id": transcript.id,
        "full_text": transcript.text,
        "chapters": chapters,
        "show_notes": show_notes.response,
        "social_clips": social_prompt.response,
    }

result = generate_show_notes("https://your-podcast.com/episode-42.mp3")
print(result["show_notes"])
```
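The chapter timestamps in the pipeline are formatted inline with integer division on millisecond offsets. Extracted as a standalone helper (an illustration of the same arithmetic, not part of the pipeline):

```python
def ms_to_timestamp(ms: int) -> str:
    """Format a millisecond offset as M:SS, matching the inline chapter formatting."""
    return f"{ms // 60000}:{(ms % 60000) // 1000:02d}"

print(ms_to_timestamp(125_000))  # 2:05
print(ms_to_timestamp(59_999))   # 0:59
```

Chapter `start`/`end` values from the API are in milliseconds, so this applies directly to `ch.start`.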
Audio Intelligence features reference
| Feature | Config param | Description |
|---|---|---|
| Speaker labels | `speaker_labels=True` | Identify and label each speaker |
| Sentiment analysis | `sentiment_analysis=True` | Per-sentence positive/negative/neutral |
| Auto chapters | `auto_chapters=True` | Detect topic segments with summaries |
| Content safety | `content_safety=True` | Flag hate speech, profanity, etc. |
| Entity detection | `entity_detection=True` | Extract names, places, organizations |
| Key phrases | `auto_highlights=True` | Most important topics and phrases |
| Language detection | `language_detection=True` | Auto-detect spoken language |
| PII redaction | `redact_pii=True` | Mask personal information |
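The table rows map one-to-one onto `TranscriptionConfig` keyword arguments, so a config can be built from a feature list. The friendly names below are an invention for illustration; only the values on the right are real config params:

```python
FEATURE_PARAMS = {
    "speaker_labels": "speaker_labels",
    "sentiment": "sentiment_analysis",
    "chapters": "auto_chapters",
    "content_safety": "content_safety",
    "entities": "entity_detection",
    "key_phrases": "auto_highlights",
    "language": "language_detection",
    "pii": "redact_pii",
}

def config_kwargs(features: list) -> dict:
    """Map friendly feature names to TranscriptionConfig keyword arguments."""
    unknown = set(features) - FEATURE_PARAMS.keys()
    if unknown:
        raise ValueError(f"Unknown features: {sorted(unknown)}")
    return {FEATURE_PARAMS[f]: True for f in features}

print(config_kwargs(["chapters", "pii"]))  # {'auto_chapters': True, 'redact_pii': True}
```

Usage: `aai.TranscriptionConfig(**config_kwargs(["chapters", "sentiment"]))`.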
Guidelines
- Audio must be accessible via URL or uploaded; local files can be passed directly to `transcriber.transcribe()`, and the SDK handles uploading.
- Transcription typically completes in 20–50% of audio duration (a 10-min file → ~2–5 min).
- LeMUR runs on top of the completed transcript, adding another few seconds.
- For real-time streaming, use 16kHz mono PCM audio for best accuracy.
- PII redaction (`redact_pii=True`) is useful for compliance when transcribing customer calls.
- Store API keys in environment variables; never hardcode them.
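Following the last guideline, a fail-fast loader that surfaces a clear error when the key is missing can replace the bare `os.environ[...]` lookup in Step 1. A minimal sketch (the SDK itself just reads whatever you assign to `aai.settings.api_key`):

```python
import os

def require_env(name: str = "ASSEMBLYAI_API_KEY") -> str:
    """Return the named environment variable, or raise a clear error if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running.")
    return value

# For illustration only: seed a placeholder so the call below succeeds.
os.environ.setdefault("ASSEMBLYAI_API_KEY", "demo-key-for-illustration")
print(require_env())
```

In real code, drop the `setdefault` line and call `aai.settings.api_key = require_env()`.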