Customer-service-assistant voice-assistant-platform

Multi-tenant, callable voice assistant platform for company-specific information.

install

source · Clone the upstream repo

git clone https://github.com/papdawin/customer-service-assistant

manifest: skill.md

source content

Voice Assistant Platform

Purpose

Build and operate a callable, company-facing voice assistant that answers questions about a business (location, opening hours, contact methods, reachability, services, and policies). The platform is designed for multi-tenant deployments: each company maintains its own knowledge base, while core speech and language services are shared to keep operations efficient.

What This System Does

Accepts live or recorded audio from callers.
Detects speech segments to avoid sending silence and noise downstream.
Transcribes speech into text with STT.
Retrieves company-specific knowledge with RAG.
Generates a concise, accurate answer with the shared LLM.
Synthesizes spoken replies with TTS.
Returns audio and text results with timing data for monitoring and UX feedback.

Multi-Tenant Model

Each company is a tenant with its own data and retrieval index.
The RAG service is instantiated per tenant and points at tenant data sources.
STT, TTS, VAD, and the LLM are shared services across all tenants.
The backend gateway routes requests to the correct tenant RAG based on deployment config.

Core Services

RAG: Per-tenant retrieval service. Companies can update their own information without changing core services.
STT: Shared speech-to-text service (Whisper). Converts audio to text.
TTS: Shared text-to-speech service (Piper). Converts responses into audio.
VAD: Shared voice activity detection. Identifies speech segments to improve accuracy and efficiency.
Backend: Orchestrates the pipeline and exposes HTTP + WebSocket APIs.
Frontend: Serves the UI for testing or operational use.

Typical Data Sources Per Company

Location and address details
Opening hours and holiday schedules
Contact and reachability information
Services offered and pricing/availability
FAQ and policy documents

Operational Goals

Consistent responses across multiple companies with tenant-specific accuracy.
Low-latency speech pipeline with observable timings.
Easy onboarding of new companies by providing their data to RAG.
Shared infrastructure for compute-heavy services to reduce cost.

Main Components

End-to-end audio pipeline: VAD -> STT -> RAG -> LLM -> TTS.
Tenant-specific indices and retrieval settings.
Standardized APIs for health, config, and inference.
Streaming support for live voice interactions.