Asi livestream
Warehouse audio pipeline for live capture, transcription, and narration from meeting room mics via Tailscale. Triggers: livestream, warehouse audio, transcription pipeline, meeting capture, whisper.
install
source · Clone the upstream repo

```shell
git clone https://github.com/plurigrid/asi
```

Claude Code · Install into `~/.claude/skills/`

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/livestream" ~/.claude/skills/plurigrid-asi-livestream && rm -rf "$T"
```
manifest:
skills/livestream/SKILL.md
Livestream Skill: Warehouse Audio Pipeline
Live audio capture, transcription, and narration from the meeting room via Tailscale network.
Architecture
```
conversation-logger (10.1.10.107)            Local Mac (fallback: audio-capture-org.py)
  3x EMEET OfficeCore M0 Plus USB mics         mlx-whisper-small, no diarization
  Whisper large-v3-turbo, 6-speaker            audio-capture.org → DuckDB
  PostgreSQL → Flask :5000                           │
        │                                            ▼
        ▼                                live_history_pipeline.sql
  /api/transcripts?limit=N
        │
        └──── sshpass via gx10-acee ────┐
                                        ▼
                           Say MCP (Samantha Enhanced)
```
Access Path
Step 1: SSH to gx10-acee (jump host)
```shell
sshpass -p 'aaaaaa' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no a@100.67.53.87
```

- Host: gx10-acee, Tailscale IP: 100.67.53.87
- User: `a`, Password: `aaaaaa`
- NVIDIA HDA audio card, WiFi on `wlP9s9`
Step 2: SSH to conversation-logger
```shell
sshpass -p 'aaaaaa' ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no alu@10.1.10.107
```

- Host: conversation-logger, LAN IP: 10.1.10.107
- User: `alu`, Password: `aaaaaa`
- 3x EMEET mics on ALSA cards 1, 2, 3
Step 3: Query the API
```shell
curl -s 'http://10.1.10.107:5000/api/transcripts?limit=10'
```
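The response can be parsed and reordered for narration with a few lines of Python. This is a sketch: the field names come from the API table below, and the top-level `transcripts` key (newest first) is inferred from the narration script; the sample values here are invented.

```python
import json

# Invented sample payload shaped like the /api/transcripts response:
# a top-level "transcripts" list, newest first.
sample = json.dumps({
    "transcripts": [
        {"id": 42, "speaker_id": "SPEAKER_01", "transcript": "second fragment",
         "confidence": 0.91, "duration_sec": 1.2},
        {"id": 41, "speaker_id": "SPEAKER_00", "transcript": "first fragment",
         "confidence": 0.87, "duration_sec": 0.9},
    ]
})

def to_chronological(raw: str) -> list[dict]:
    """Parse the JSON body and reverse newest-first into natural speech order."""
    return list(reversed(json.loads(raw)["transcripts"]))

rows = to_chronological(sample)
print([r["id"] for r in rows])  # → [41, 42]
```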
One-liner (from local Mac)
```shell
sshpass -p 'aaaaaa' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no a@100.67.53.87 \
  "curl -s 'http://10.1.10.107:5000/api/transcripts?limit=10'"
```
One-liner (execute command on logger)
```shell
sshpass -p 'aaaaaa' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no a@100.67.53.87 \
  'sshpass -p "aaaaaa" ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no alu@10.1.10.107 "COMMAND"'
```
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/api/transcripts?limit=N` | GET | Recent transcripts (JSON: id, speaker_id, transcript, started_at, ended_at, zone_id, confidence, duration_sec) |
| | GET | Web UI transcript browser |
| | GET | Conversation groupings |
| | GET | Digest summaries |
| | GET | Speaker profiles |
Infrastructure on conversation-logger
Systemd Services
- `warehouse-capture-mic1.service` — Mic 1 capture (device 4, Whisper large-v3-turbo)
- `warehouse-capture-mic2.service` — Mic 2 capture (device 5)
- `warehouse-capture-mic3.service` — Mic 3 capture (device 6)
- `warehouse-autogain.service` — Auto-gain controller
- `warehouse-gui.service` — Flask web dashboard (:5000)
- `postgresql@16-main.service` — PostgreSQL 16
Key Paths
- `/opt/warehouse-logging/scripts/capture_node.py` — Main capture script
- `/opt/warehouse-logging/scripts/auto_gain.py` — Gain controller
- `/opt/warehouse-logging/app.py` — Flask dashboard
- `/opt/warehouse-logging/venv/` — Python virtualenv
Hardware
- 3x EMEET OfficeCore M0 Plus (USB, Bus 001 Devices 3/5/9)
- ALSA cards: 1 (Plus), 2 (Plus_1), 3 (Plus_2)
- NVIDIA HDA on card 0 (not used for capture)
Network
- WiFi only (`wlP9s9`): SSID `TP-Link_A7B3`, 2.4 GHz Ch 2, -47 dBm, 94%
- All ethernet ports DOWN (NO-CARRIER) — single point of failure
- Consider connecting ethernet for reliability
Live Narration Script
Save to `/tmp/live-warehouse-stream.sh`:

```shell
#!/bin/bash
# Speaks ALL new transcripts, batched by speaker, no cutoffs
export PATH="/Users/alice/v/.flox/run/aarch64-darwin.v.dev/bin:$PATH"

ACEE="100.67.53.87"
LOGGER="10.1.10.107"
LAST_ID=""
POLL_INTERVAL=5
LIMIT=50

voice_for_speaker() {
  case "$1" in
    SPEAKER_00|alu) echo "Ava (Premium)" ;;
    SPEAKER_01)     echo "Evan (Enhanced)" ;;
    SPEAKER_02)     echo "Allison (Enhanced)" ;;
    SPEAKER_03)     echo "Nathan (Enhanced)" ;;
    SPEAKER_04)     echo "Noelle (Enhanced)" ;;
    SPEAKER_05)     echo "Nicky (Enhanced)" ;;
    silly-alu)      echo "Samantha (Enhanced)" ;;
    *)              echo "Ava (Premium)" ;;
  esac
}

while true; do
  RESULT=$(sshpass -p 'aaaaaa' ssh -o ConnectTimeout=8 \
    -o StrictHostKeyChecking=no -o BatchMode=no a@$ACEE \
    "curl -s 'http://$LOGGER:5000/api/transcripts?limit=$LIMIT'" 2>/dev/null)
  [ $? -ne 0 ] || [ -z "$RESULT" ] && { sleep $POLL_INTERVAL; continue; }

  # Parse & reverse to chronological order
  PARSED=$(echo "$RESULT" | python3 -c "
import json,sys
try:
    d=json.load(sys.stdin)
    lines = []
    for t in d['transcripts']:
        lines.append(f\"{t['id']}|{t['speaker_id']}|{t['transcript']}\")
    for line in reversed(lines):
        print(line)
except: pass
" 2>/dev/null)
  [ -z "$PARSED" ] && { sleep $POLL_INTERVAL; continue; }

  # First run: initialize without speaking history
  if [ -z "$LAST_ID" ]; then
    LAST_ID=$(echo "$PARSED" | tail -1 | cut -d'|' -f1)
    sleep $POLL_INTERVAL; continue
  fi

  # Collect all new transcripts, batch consecutive same-speaker
  FOUND_LAST=0; CURRENT_SPEAKER=""; CURRENT_TEXT=""; NEW_COUNT=0
  while IFS= read -r line; do
    ID=$(echo "$line" | cut -d'|' -f1)
    SPEAKER=$(echo "$line" | cut -d'|' -f2)
    TEXT=$(echo "$line" | cut -d'|' -f3)
    if [ "$FOUND_LAST" -eq 0 ]; then
      [ "$ID" = "$LAST_ID" ] && FOUND_LAST=1; continue
    fi
    NEW_COUNT=$((NEW_COUNT + 1)); LAST_ID="$ID"
    TRIMMED=$(echo "$TEXT" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
    [ -z "$TRIMMED" ] && continue
    if [ "$SPEAKER" = "$CURRENT_SPEAKER" ]; then
      CURRENT_TEXT="$CURRENT_TEXT $TRIMMED"
    else
      if [ -n "$CURRENT_TEXT" ] && [ -n "$CURRENT_SPEAKER" ]; then
        VOICE=$(voice_for_speaker "$CURRENT_SPEAKER")
        echo "[$(date +%H:%M:%S)] $CURRENT_SPEAKER: $CURRENT_TEXT"
        say -v "$VOICE" -r 210 "$CURRENT_TEXT" 2>/dev/null
      fi
      CURRENT_SPEAKER="$SPEAKER"; CURRENT_TEXT="$TRIMMED"
    fi
  done <<< "$PARSED"

  # Speak last batch
  if [ -n "$CURRENT_TEXT" ] && [ "$NEW_COUNT" -gt 0 ]; then
    VOICE=$(voice_for_speaker "$CURRENT_SPEAKER")
    echo "[$(date +%H:%M:%S)] $CURRENT_SPEAKER: $CURRENT_TEXT"
    say -v "$VOICE" -r 210 "$CURRENT_TEXT" 2>/dev/null
  fi
  sleep $POLL_INTERVAL
done
```
Key design choices
- `limit=50`: catches all transcripts between polls (Whisper produces ~1 fragment/second)
- Chronological reversal: API returns newest-first; we reverse for natural speech order
- Speaker batching: consecutive same-speaker fragments concatenated into one `say` call
- No "SPEAKER says:" prefix: voice identity conveys the speaker; text is spoken naturally
- First-poll skip: initializes at current position without blasting history
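The batching rule above can be sketched in a few lines of Python. The function name `batch_by_speaker` is mine; the logic mirrors the narration script: trim each fragment, drop empties, and merge consecutive fragments from the same speaker into one utterance.

```python
def batch_by_speaker(fragments):
    """Concatenate consecutive same-speaker fragments into one utterance,
    mirroring the batching rule of the narration script above.
    fragments: list of (speaker, text) pairs in chronological order."""
    batches = []
    for speaker, text in fragments:
        text = text.strip()
        if not text:
            continue  # skip empty fragments, as the script does
        if batches and batches[-1][0] == speaker:
            # same speaker as previous batch: extend it
            batches[-1] = (speaker, batches[-1][1] + " " + text)
        else:
            batches.append((speaker, text))
    return batches

print(batch_by_speaker([
    ("SPEAKER_00", "so the"), ("SPEAKER_00", "forklift is"),
    ("SPEAKER_01", "which bay?"), ("SPEAKER_00", "bay three"),
]))
# → [('SPEAKER_00', 'so the forklift is'), ('SPEAKER_01', 'which bay?'), ('SPEAKER_00', 'bay three')]
```

Each resulting batch then gets a single `say` call, which is what prevents mid-sentence voice cutoffs.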
Say MCP Voice Selection
Two MCP servers available for TTS:
| Server | Tool | Voice Param | Rate Param |
|---|---|---|---|
| `say` | `speak` | Name string | WPM (1-500, default 175) |
| `macos-speech-sdk` | `speak` | Name or identifier | 0.0-1.0 mapped to 80-300 WPM, or direct WPM if >1 |
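The rate rule in the second row can be made concrete with a small sketch. This assumes the 0.0-1.0 range maps *linearly* onto 80-300 WPM, which the table does not actually specify:

```python
def rate_to_wpm(rate: float) -> int:
    """Sketch of the macos-speech-sdk rate rule described above.
    Assumption: 0.0-1.0 maps linearly onto 80-300 WPM; values above 1
    are taken as a direct WPM figure."""
    if rate > 1:
        return int(rate)          # direct WPM passthrough
    return int(80 + rate * (300 - 80))

print(rate_to_wpm(0.0), rate_to_wpm(0.5), rate_to_wpm(1.0), rate_to_wpm(210))
# → 80 190 300 210
```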
High-Quality en-US Voices
| Voice | Quality | Identifier | Gender | Trit |
|---|---|---|---|---|
| Ava (Premium) | premium | | F | +1 |
| Ava (Enhanced) | enhanced | | F | +1 |
| Samantha (Enhanced) | enhanced | | F | 0 |
| Allison (Enhanced) | enhanced | | F | -1 |
| Evan (Enhanced) | enhanced | | M | +1 |
| Nathan (Enhanced) | enhanced | | M | 0 |
| Nicky (Enhanced) | enhanced | | F | -1 |
| Noelle (Enhanced) | enhanced | | F | 0 |
Per-Speaker Voice Mapping
- SPEAKER_00/alu → Ava (Premium) — primary speaker, highest quality
- SPEAKER_01 → Evan (Enhanced) — male voice for contrast
- SPEAKER_02 → Allison (Enhanced)
- SPEAKER_03 → Nathan (Enhanced)
- SPEAKER_04 → Noelle (Enhanced)
- SPEAKER_05 → Nicky (Enhanced)
MCP vs CLI Usage
- Background script (`/tmp/live-warehouse-stream.sh`): uses CLI `say -v "Voice Name"` — works headless
- In-session narration: use `mcp__macos-speech-sdk__speak` with voice identifier for full control
- `mcp__say__speak` has `background: true` param for non-blocking speech
Local Fallback (SDF Ch8 Degeneracy)
When remote pipeline is unreachable, use local mic capture:
```shell
/Users/alice/v/.venv-mlx-lm/bin/python /Users/alice/v/scripts/audio-capture-org.py
```
- Captures MacBook Pro Microphone via FFmpeg avfoundation `:1`
- Transcribes with mlx-whisper-small (16 kHz, 8 s chunks)
- Appends to `/Users/alice/v/audio-capture.org`
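As an illustration of the append-only log, here is a hypothetical entry formatter. The actual layout `audio-capture-org.py` writes is not documented here, so both the heading shape and the timestamp style are assumptions:

```python
from datetime import datetime

def org_entry(text: str, ts: datetime) -> str:
    """Hypothetical org-mode entry for audio-capture.org; the real script's
    layout may differ. Uses an inactive org timestamp as the heading."""
    stamp = ts.strftime("[%Y-%m-%d %a %H:%M]")
    return f"* {stamp} transcript\n{text}\n"

entry = org_entry("pallet jack needs charging", datetime(2025, 1, 6, 14, 30))
print(entry, end="")
```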
DuckDB Integration
Ingest history for audio digest
```shell
duckdb -c ".read /Users/alice/v/live_history_pipeline.sql"
```
- Merges claude/preclaude/codex history
- Generates TTS-ready `narration_line` fields
- `audio_digest` view: top 10 sessions formatted for voice
Audio ACSet database
- `/Users/alice/v/audio_acset.duckdb` — Structured audio metadata
- Tables: AudioFile, Transcript, Segment, Speaker, Topic, ACSetSchema
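A query like the following sketches how the ACSet tables might be joined. Only the table names come from the schema above; the column names (`speaker_id`, `id`, `name`) are assumptions:

```sql
-- Hypothetical query against audio_acset.duckdb: fragment count per speaker.
-- Table names are from the ACSet schema; column names are assumed.
SELECT s.name, count(*) AS fragments
FROM Transcript t
JOIN Speaker s ON t.speaker_id = s.id
GROUP BY s.name
ORDER BY fragments DESC;
```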
SDF Analysis
Per Software Design for Flexibility (Hanson & Sussman):
- Ch1 Combinators: Pipeline = compose(ssh_tunnel, api_poll, tts_narrate)
- Ch7 Propagators: Transcripts flow: mic → whisper → postgres → API → say (bidirectional: can query history backwards)
- Ch8 Degeneracy: Remote warehouse (primary) vs local mic (fallback) — same generic interface, different implementations
- Ch9 Generic Dispatch: `narrate(source)` dispatches on source type: warehouse API vs local org file
Dependency Structure
```
[USB Mics]
    │ USB
[conversation-logger]
    │
[ALSA/PulseAudio]
    │
[capture_node.py × 3]
    │
[Whisper large-v3-turbo]
    │
[PostgreSQL 16]
    │
[Flask :5000]
    │
[WiFi: TP-Link_A7B3]  ← SINGLE POINT OF FAILURE
    │
[LAN: 10.1.10.107]
    │
[gx10-acee: 100.67.53.87 via Tailscale]
    │
[Local Mac: sshpass + curl]
    │
[Say MCP / say command]
```
Risk: WiFi is the only network path. All ethernet ports show NO-CARRIER. Mitigation: USB mics and local capture/transcription continue even if WiFi drops — data accumulates locally and can be retrieved when connectivity returns.
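A polling client on the Mac side can tolerate the WiFi drop by backing off and retrying until the path returns. This is a sketch, not the skill's actual retry logic; `fetch` and `sleep` are injected so the loop itself stays network-free:

```python
import time

def poll_with_backoff(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry fetch() with exponential backoff, for riding out a WiFi drop.
    fetch returns a parsed response dict, or None on failure."""
    delay = base_delay
    for _ in range(attempts):
        result = fetch()
        if result is not None:
            return result
        sleep(delay)       # wait before the next reconnection attempt
        delay *= 2         # exponential backoff: 1s, 2s, 4s, ...
    return None

# Simulated outage: first two polls fail, third succeeds.
outcomes = iter([None, None, {"transcripts": []}])
delays = []
got = poll_with_backoff(lambda: next(outcomes), sleep=delays.append)
print(got, delays)  # → {'transcripts': []} [1.0, 2.0]
```

Since transcripts keep accumulating in PostgreSQL during an outage, a larger `limit=N` on the first successful poll after reconnection recovers the backlog.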