Asi livestream

Warehouse audio pipeline for live capture, transcription, and narration from meeting room mics via Tailscale. Triggers: livestream, warehouse audio, transcription pipeline, meeting capture, whisper.

Install

Source · Clone the upstream repo

git clone https://github.com/plurigrid/asi

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/livestream" ~/.claude/skills/plurigrid-asi-livestream && rm -rf "$T"

Manifest: skills/livestream/SKILL.md
Source Content

Livestream Skill: Warehouse Audio Pipeline

Live audio capture, transcription, and narration from the meeting room via Tailscale network.

Architecture

conversation-logger (10.1.10.107)          Local Mac
  3x EMEET OfficeCore M0 Plus USB mics     (fallback: audio-capture-org.py)
  Whisper large-v3-turbo, 6-speaker         mlx-whisper-small, no diarization
  PostgreSQL → Flask :5000                  audio-capture.org → DuckDB
        │                                          │
        ▼                                          ▼
  /api/transcripts?limit=N                  live_history_pipeline.sql
        │
        └──── sshpass via gx10-acee ──────────────┐
                                                   ▼
                                          Say MCP (Samantha Enhanced)

Access Path

Step 1: SSH to gx10-acee (jump host)

sshpass -p 'aaaaaa' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no a@100.67.53.87
  • Host: gx10-acee, Tailscale IP: 100.67.53.87
  • User: a, Password: aaaaaa
  • NVIDIA HDA audio card, WiFi on wlP9s9

Step 2: SSH to conversation-logger

sshpass -p 'aaaaaa' ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no alu@10.1.10.107
  • Host: conversation-logger, LAN IP: 10.1.10.107
  • User: alu, Password: aaaaaa
  • 3x EMEET mics on ALSA cards 1, 2, 3

Step 3: Query the API

curl -s 'http://10.1.10.107:5000/api/transcripts?limit=10'

One-liner (from local Mac)

sshpass -p 'aaaaaa' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no a@100.67.53.87 \
  "curl -s 'http://10.1.10.107:5000/api/transcripts?limit=10'"

One-liner (execute command on logger)

sshpass -p 'aaaaaa' ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no a@100.67.53.87 \
  'sshpass -p "aaaaaa" ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no alu@10.1.10.107 "COMMAND"'

API Endpoints

Endpoint                    Method  Description
/api/transcripts?limit=N    GET     Recent transcripts (JSON: id, speaker_id, transcript, started_at, ended_at, zone_id, confidence, duration_sec)
/transcripts                GET     Web UI transcript browser
/conversations              GET     Conversation groupings
/digests                    GET     Digest summaries
/speakers                   GET     Speaker profiles
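The /api/transcripts payload can be exercised without reaching the warehouse. A minimal Python sketch of parsing it, using a fabricated two-record sample (the field names come from the endpoint description above; the values are invented for illustration):

```python
import json

# Fabricated sample mirroring the documented /api/transcripts fields;
# the values are illustrative only, not real data.
SAMPLE = json.loads("""
{"transcripts": [
  {"id": 42, "speaker_id": "SPEAKER_01", "transcript": "testing one two",
   "started_at": "2025-01-01T10:00:00", "ended_at": "2025-01-01T10:00:02",
   "zone_id": 1, "confidence": 0.91, "duration_sec": 2.0},
  {"id": 41, "speaker_id": "SPEAKER_00", "transcript": "good morning",
   "started_at": "2025-01-01T09:59:58", "ended_at": "2025-01-01T10:00:00",
   "zone_id": 1, "confidence": 0.88, "duration_sec": 2.0}
]}
""")

def to_chronological(payload):
    """The API returns newest-first; reverse for natural reading order."""
    return list(reversed(payload["transcripts"]))

for t in to_chronological(SAMPLE):
    print(f"{t['id']} {t['speaker_id']}: {t['transcript']}")
```

The `to_chronological` helper name is mine; the reversal itself matches what the narration script does before speaking.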

Infrastructure on conversation-logger

Systemd Services

  • warehouse-capture-mic1.service — Mic 1 capture (device 4, Whisper large-v3-turbo)
  • warehouse-capture-mic2.service — Mic 2 capture (device 5)
  • warehouse-capture-mic3.service — Mic 3 capture (device 6)
  • warehouse-autogain.service — Auto-gain controller
  • warehouse-gui.service — Flask web dashboard (:5000)
  • postgresql@16-main.service — PostgreSQL 16

Key Paths

  • /opt/warehouse-logging/scripts/capture_node.py — Main capture script
  • /opt/warehouse-logging/scripts/auto_gain.py — Gain controller
  • /opt/warehouse-logging/app.py — Flask dashboard
  • /opt/warehouse-logging/venv/ — Python virtualenv

Hardware

  • 3x EMEET OfficeCore M0 Plus (USB, Bus 001 Devices 3/5/9)
  • ALSA cards: 1 (Plus), 2 (Plus_1), 3 (Plus_2)
  • NVIDIA HDA on card 0 (not used for capture)

Network

  • WiFi only (wlP9s9): SSID TP-Link_A7B3, 2.4GHz Ch2, -47dBm, 94%
  • All ethernet ports DOWN (NO-CARRIER) — single point of failure
  • Consider connecting ethernet for reliability

Live Narration Script

Save to /tmp/live-warehouse-stream.sh:

#!/bin/bash
# Speaks ALL new transcripts, batched by speaker, no cutoffs
export PATH="/Users/alice/v/.flox/run/aarch64-darwin.v.dev/bin:$PATH"
ACEE="100.67.53.87"
LOGGER="10.1.10.107"
LAST_ID=""
POLL_INTERVAL=5
LIMIT=50

voice_for_speaker() {
    case "$1" in
        SPEAKER_00|alu)    echo "Ava (Premium)" ;;
        SPEAKER_01)        echo "Evan (Enhanced)" ;;
        SPEAKER_02)        echo "Allison (Enhanced)" ;;
        SPEAKER_03)        echo "Nathan (Enhanced)" ;;
        SPEAKER_04)        echo "Noelle (Enhanced)" ;;
        SPEAKER_05)        echo "Nicky (Enhanced)" ;;
        silly-alu)         echo "Samantha (Enhanced)" ;;
        *)                 echo "Ava (Premium)" ;;
    esac
}

while true; do
    RESULT=$(sshpass -p 'aaaaaa' ssh -o ConnectTimeout=8 \
        -o StrictHostKeyChecking=no -o BatchMode=no a@$ACEE \
        "curl -s 'http://$LOGGER:5000/api/transcripts?limit=$LIMIT'" 2>/dev/null)
    [ $? -ne 0 ] || [ -z "$RESULT" ] && { sleep $POLL_INTERVAL; continue; }

    # Parse & reverse to chronological order
    PARSED=$(echo "$RESULT" | python3 -c "
import json,sys
try:
    d=json.load(sys.stdin)
    lines = []
    for t in d['transcripts']:
        lines.append(f\"{t['id']}|{t['speaker_id']}|{t['transcript']}\")
    for line in reversed(lines):
        print(line)
except: pass
" 2>/dev/null)
    [ -z "$PARSED" ] && { sleep $POLL_INTERVAL; continue; }

    # First run: initialize without speaking history
    if [ -z "$LAST_ID" ]; then
        LAST_ID=$(echo "$PARSED" | tail -1 | cut -d'|' -f1)
        sleep $POLL_INTERVAL; continue
    fi

    # Collect all new transcripts, batch consecutive same-speaker
    FOUND_LAST=0; CURRENT_SPEAKER=""; CURRENT_TEXT=""; NEW_COUNT=0
    while IFS= read -r line; do
        ID=$(echo "$line" | cut -d'|' -f1)
        SPEAKER=$(echo "$line" | cut -d'|' -f2)
        TEXT=$(echo "$line" | cut -d'|' -f3-)  # f3- keeps transcripts that contain '|'
        if [ "$FOUND_LAST" -eq 0 ]; then
            [ "$ID" = "$LAST_ID" ] && FOUND_LAST=1; continue
        fi
        NEW_COUNT=$((NEW_COUNT + 1)); LAST_ID="$ID"
        TRIMMED=$(echo "$TEXT" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
        [ -z "$TRIMMED" ] && continue
        if [ "$SPEAKER" = "$CURRENT_SPEAKER" ]; then
            CURRENT_TEXT="$CURRENT_TEXT $TRIMMED"
        else
            if [ -n "$CURRENT_TEXT" ] && [ -n "$CURRENT_SPEAKER" ]; then
                VOICE=$(voice_for_speaker "$CURRENT_SPEAKER")
                echo "[$(date +%H:%M:%S)] $CURRENT_SPEAKER: $CURRENT_TEXT"
                say -v "$VOICE" -r 210 "$CURRENT_TEXT" 2>/dev/null
            fi
            CURRENT_SPEAKER="$SPEAKER"; CURRENT_TEXT="$TRIMMED"
        fi
    done <<< "$PARSED"
    # Speak last batch
    if [ -n "$CURRENT_TEXT" ] && [ "$NEW_COUNT" -gt 0 ]; then
        VOICE=$(voice_for_speaker "$CURRENT_SPEAKER")
        echo "[$(date +%H:%M:%S)] $CURRENT_SPEAKER: $CURRENT_TEXT"
        say -v "$VOICE" -r 210 "$CURRENT_TEXT" 2>/dev/null
    fi
    sleep $POLL_INTERVAL
done

Key design choices

  • limit=50: Catches all transcripts between polls (Whisper produces ~1 fragment/second)
  • Chronological reversal: API returns newest-first; the script reverses for natural speech order
  • Speaker batching: Consecutive same-speaker fragments are concatenated into one say call
  • No "SPEAKER says:" prefix: Voice identity conveys the speaker; text is spoken naturally
  • First-poll skip: Initializes at the current position without blasting history
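The speaker-batching rule in the script can be isolated into a few lines. A Python sketch of the same logic (the function name and (speaker, text) tuple shape are mine, not part of the pipeline):

```python
def batch_by_speaker(rows):
    """Concatenate consecutive same-speaker fragments into one utterance,
    mirroring the batching loop in live-warehouse-stream.sh."""
    batches = []
    for speaker, text in rows:
        text = text.strip()
        if not text:
            continue  # skip whitespace-only fragments, as the script does
        if batches and batches[-1][0] == speaker:
            # Same speaker as the previous fragment: extend the current batch.
            batches[-1] = (speaker, batches[-1][1] + " " + text)
        else:
            batches.append((speaker, text))
    return batches

print(batch_by_speaker([
    ("SPEAKER_00", "hi "),
    ("SPEAKER_00", "there"),
    ("SPEAKER_01", "yo"),
]))
# → [('SPEAKER_00', 'hi there'), ('SPEAKER_01', 'yo')]
```

Each returned batch maps to exactly one say invocation, which is what avoids mid-sentence cutoffs between Whisper fragments.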

Say MCP Voice Selection

Two MCP servers are available for TTS:

Server            Tool                          Voice Param                                                    Rate Param
say               mcp__say__speak               Name string (e.g. "Ava (Premium)")                             WPM (1-500, default 175)
macos-speech-sdk  mcp__macos-speech-sdk__speak  Name or identifier (e.g. "com.apple.voice.premium.en-US.Ava")  0.0-1.0 mapped to 80-300 WPM, or direct WPM if >1
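Assuming the macos-speech-sdk rate mapping is linear between the stated endpoints (the source only gives the 80 and 300 WPM bounds, so linearity is an assumption), the conversion looks like:

```python
def to_wpm(rate):
    """Convert a macos-speech-sdk rate value to words per minute.

    Values above 1 are treated as direct WPM; values in 0.0-1.0 are
    mapped onto 80-300 WPM. Linear interpolation is assumed here.
    """
    if rate > 1:
        return float(rate)
    return 80.0 + rate * (300.0 - 80.0)

print(to_wpm(0.5))  # → 190.0 under the linear assumption
print(to_wpm(210))  # → 210.0, passed through as direct WPM
```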

High-Quality en-US Voices

Voice                Quality   Identifier                                    Gender  Trit
Ava (Premium)        premium   com.apple.voice.premium.en-US.Ava             F       +1
Ava (Enhanced)       enhanced  com.apple.voice.enhanced.en-US.Ava            F       +1
Samantha (Enhanced)  enhanced  com.apple.voice.enhanced.en-US.Samantha       F       0
Allison (Enhanced)   enhanced  com.apple.voice.enhanced.en-US.Allison       F       -1
Evan (Enhanced)      enhanced  com.apple.voice.enhanced.en-US.Evan           M       +1
Nathan (Enhanced)    enhanced  com.apple.voice.enhanced.en-US.Nathan         M       0
Nicky (Enhanced)     enhanced  com.apple.ttsbundle.siri_Nicky_en-US_premium  F       -1
Noelle (Enhanced)    enhanced  com.apple.voice.enhanced.en-US.Noelle         F       0

Per-Speaker Voice Mapping

  • SPEAKER_00/alu → Ava (Premium) — primary speaker, highest quality
  • SPEAKER_01 → Evan (Enhanced) — male voice for contrast
  • SPEAKER_02 → Allison (Enhanced)
  • SPEAKER_03 → Nathan (Enhanced)
  • SPEAKER_04 → Noelle (Enhanced)
  • SPEAKER_05 → Nicky (Enhanced)

MCP vs CLI Usage

  • Background script (/tmp/live-warehouse-stream.sh): Uses CLI say -v "Voice Name" — works headless
  • In-session narration: Use mcp__macos-speech-sdk__speak with voice identifier for full control
  • mcp__say__speak has a background: true param for non-blocking speech

Local Fallback (SDF Ch8 Degeneracy)

When the remote pipeline is unreachable, fall back to local mic capture:

/Users/alice/v/.venv-mlx-lm/bin/python /Users/alice/v/scripts/audio-capture-org.py
  • Captures MacBook Pro Microphone via FFmpeg avfoundation :1
  • Transcribes with mlx-whisper-small (16kHz, 8s chunks)
  • Appends to /Users/alice/v/audio-capture.org

DuckDB Integration

Ingest history for audio digest

duckdb -c ".read /Users/alice/v/live_history_pipeline.sql"
  • Merges claude/preclaude/codex history
  • Generates TTS-ready narration_line fields
  • audio_digest view: top 10 sessions formatted for voice

Audio ACSet database

  • /Users/alice/v/audio_acset.duckdb — Structured audio metadata
  • Tables: AudioFile, Transcript, Segment, Speaker, Topic, ACSetSchema

SDF Analysis

Per Software Design for Flexibility (Hanson & Sussman):

  • Ch1 Combinators: Pipeline = compose(ssh_tunnel, api_poll, tts_narrate)
  • Ch7 Propagators: Transcripts flow: mic → whisper → postgres → API → say (bidirectional: can query history backwards)
  • Ch8 Degeneracy: Remote warehouse (primary) vs local mic (fallback) — same generic interface, different implementations
  • Ch9 Generic Dispatch: narrate(source) dispatches on source type: warehouse API vs local org file
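The Ch9 point can be made concrete with a small sketch. The WarehouseAPI and OrgFile types below are hypothetical stand-ins for the two backends; only the dispatch-on-source-type shape comes from the text:

```python
from dataclasses import dataclass
from functools import singledispatch

@dataclass
class WarehouseAPI:
    url: str  # e.g. the Flask endpoint on the logger

@dataclass
class OrgFile:
    path: str  # e.g. the local audio-capture.org fallback

@singledispatch
def narrate(source):
    raise TypeError(f"no narration backend for {type(source).__name__}")

@narrate.register
def _(source: WarehouseAPI):
    # Remote path: poll the transcripts API and speak new entries.
    return f"poll {source.url} and speak new transcripts"

@narrate.register
def _(source: OrgFile):
    # Degenerate local path: tail the org file instead.
    return f"tail {source.path} and speak appended entries"
```

Both backends satisfy the same generic interface, so the caller never branches on which pipeline is alive — the Ch8 degeneracy falls out of the dispatch table.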

Dependency Structure

[USB Mics] ──USB──→ [conversation-logger]
                         │
                    [ALSA/PulseAudio]
                         │
                    [capture_node.py × 3]
                         │
                    [Whisper large-v3-turbo]
                         │
                    [PostgreSQL 16]
                         │
                    [Flask :5000]
                         │
                    [WiFi: TP-Link_A7B3] ← SINGLE POINT OF FAILURE
                         │
                    [LAN: 10.1.10.107]
                         │
            [gx10-acee: 100.67.53.87 via Tailscale]
                         │
                    [Local Mac: sshpass + curl]
                         │
                    [Say MCP / say command]

Risk: WiFi is the only network path. All ethernet ports show NO-CARRIER. Mitigation: USB mics and local capture/transcription continue even if WiFi drops — data accumulates locally and can be retrieved when connectivity returns.