Customer-service-assistant voice-assistant-platform
Multi-tenant, callable voice assistant platform for company-specific information.
Install
Clone the upstream repo:
git clone https://github.com/papdawin/customer-service-assistant
Voice Assistant Platform
Purpose
Build and operate a callable, company-facing voice assistant that answers questions about a business (location, opening hours, contact methods, reachability, services, and policies). The platform is designed for multi-tenant deployments: each company maintains its own knowledge base, while core speech and language services are shared to keep operations efficient.
What This System Does
- Accepts live or recorded audio from callers.
- Detects speech segments to avoid sending silence and noise downstream.
- Transcribes speech into text with STT.
- Retrieves company-specific knowledge with RAG.
- Generates a concise, accurate answer with the shared LLM.
- Synthesizes spoken replies with TTS.
- Returns audio and text results with timing data for monitoring and UX feedback.
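The steps above can be sketched as a single function that chains the stages and records how long each one takes. This is a minimal illustration, not the project's actual code: `run_pipeline`, `PipelineResult`, and the stage callables are hypothetical names, and the stages in `demo` are trivial stand-ins for the real VAD, STT, RAG, LLM, and TTS services.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineResult:
    """Text and audio results plus per-stage timings in milliseconds."""
    transcript: str
    answer: str
    audio: bytes
    timings_ms: Dict[str, float] = field(default_factory=dict)


def run_pipeline(
    audio: bytes,
    detect_speech: Callable[[bytes], bytes],
    transcribe: Callable[[bytes], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    synthesize: Callable[[str], bytes],
) -> PipelineResult:
    """Run VAD -> STT -> RAG -> LLM -> TTS and time every stage."""
    timings: Dict[str, float] = {}

    def timed(name, fn, *args):
        # Measure wall-clock time for one stage with a monotonic clock.
        start = time.perf_counter()
        out = fn(*args)
        timings[name] = (time.perf_counter() - start) * 1000.0
        return out

    speech = timed("vad", detect_speech, audio)
    transcript = timed("stt", transcribe, speech)
    context = timed("rag", retrieve, transcript)
    answer = timed("llm", generate, transcript, context)
    reply_audio = timed("tts", synthesize, answer)
    return PipelineResult(transcript, answer, reply_audio, timings)


def demo() -> PipelineResult:
    # Hypothetical stand-in stages; real services would be called over the network.
    return run_pipeline(
        b"\x00\x00hello",
        detect_speech=lambda a: a.lstrip(b"\x00"),   # drop leading "silence"
        transcribe=lambda a: a.decode("ascii"),
        retrieve=lambda q: ["Open 9-17 on weekdays."],
        generate=lambda q, ctx: ctx[0],
        synthesize=lambda text: text.encode("utf-8"),
    )
```

Passing the stages in as callables keeps the orchestration independent of how each service is reached, which also makes the timing logic easy to test in isolation.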
Multi-Tenant Model
- Each company is a tenant with its own data and retrieval index.
- The RAG service is instantiated once per tenant and points at that tenant's data sources.
- STT, TTS, VAD, and the LLM are shared services across all tenants.
- The backend gateway routes requests to the correct tenant RAG based on deployment config.
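One way to picture the routing rule is a lookup table from tenant to RAG endpoint, with the shared services listed once. The sketch below is an assumption about how such a config could look: `TenantConfig`, `TenantRouter`, `SHARED_SERVICES`, and every URL are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    rag_url: str  # hypothetical per-tenant RAG endpoint


# Hypothetical shared endpoints, identical for every tenant.
SHARED_SERVICES: Dict[str, str] = {
    "vad": "http://vad:8000",
    "stt": "http://stt:8000",
    "tts": "http://tts:8000",
    "llm": "http://llm:8000",
}


class TenantRouter:
    """Resolves the RAG endpoint for a tenant; other services are shared."""

    def __init__(self, tenants: Iterable[TenantConfig]) -> None:
        self._tenants = {t.tenant_id: t for t in tenants}

    def rag_url_for(self, tenant_id: str) -> str:
        try:
            return self._tenants[tenant_id].rag_url
        except KeyError:
            # Fail loudly instead of falling back to another tenant's data.
            raise LookupError(f"unknown tenant: {tenant_id!r}") from None

    def endpoints_for(self, tenant_id: str) -> Dict[str, str]:
        # Shared services plus the tenant-specific RAG endpoint.
        return {**SHARED_SERVICES, "rag": self.rag_url_for(tenant_id)}
```

Raising on an unknown tenant, rather than defaulting, avoids answering a caller from the wrong company's knowledge base.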
Core Services
- RAG: Per-tenant retrieval service. Companies can update their own information without changing core services.
- STT: Shared speech-to-text service (Whisper). Converts audio to text.
- TTS: Shared text-to-speech service (Piper). Converts responses into audio.
- VAD: Shared voice activity detection. Identifies speech segments to improve accuracy and efficiency.
- Backend: Orchestrates the pipeline and exposes HTTP + WebSocket APIs.
- Frontend: Serves the UI for testing or operational use.
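To illustrate what "identifying speech segments" means in the VAD step, here is a deliberately simple energy-threshold detector. It is not the method the platform uses (the source does not name one); it only shows the idea of keeping frames whose loudness exceeds a threshold. The function names, frame size, and threshold are assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_segments(
    samples: Sequence[int],
    frame_size: int = 160,      # assumed: 10 ms at 16 kHz
    threshold: float = 500.0,   # assumed RMS level separating speech from silence
) -> List[Tuple[int, int]]:
    """Return (start, end) sample offsets of runs of frames above the threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for offset in range(0, len(samples), frame_size):
        frame = samples[offset:offset + frame_size]
        is_speech = frame_rms(frame) >= threshold
        if is_speech and start is None:
            start = offset                      # a speech run begins
        elif not is_speech and start is not None:
            segments.append((start, offset))    # the run ended at this frame
            start = None
    if start is not None:
        segments.append((start, len(samples)))  # audio ended mid-speech
    return segments
```

Only the returned ranges would be forwarded to STT, so silence and low-level noise never reach the transcription service.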
Typical Data Sources Per Company
- Location and address details
- Opening hours and holiday schedules
- Contact and reachability information
- Services offered, with pricing and availability
- FAQ and policy documents
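As a rough picture of how such data might be stored and looked up for one tenant, the sketch below keeps entries tagged by category and ranks them by word overlap with the question. Word overlap is a toy stand-in for real retrieval (the source does not describe the index type), and `KnowledgeEntry`, `retrieve`, and the sample entries are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass(frozen=True)
class KnowledgeEntry:
    category: str  # e.g. "location", "hours", "contact"
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(
    query: str, entries: Sequence[KnowledgeEntry], top_k: int = 2
) -> List[KnowledgeEntry]:
    """Return up to top_k entries sharing the most words with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(e.category + " " + e.text)), e) for e in entries]
    # Keep only entries with at least one shared word; sorted() is stable,
    # so ties stay in their original order.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [e for _, e in ranked[:top_k]]


# Made-up sample data for a hypothetical tenant.
EXAMPLE_ENTRIES = [
    KnowledgeEntry("location", "The office is at 1 Example Street."),
    KnowledgeEntry("hours", "Opening hours are 09:00 to 17:00 on weekdays."),
    KnowledgeEntry("contact", "Call the front desk or send an email."),
]
```

For example, `retrieve("What are your opening hours?", EXAMPLE_ENTRIES)` selects only the "hours" entry, which would then be passed to the LLM as context.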
Operational Goals
- Consistent response quality across companies, with each answer grounded in that tenant's own data.
- Low-latency speech pipeline with observable timings.
- Easy onboarding of new companies by providing their data to RAG.
- Shared infrastructure for compute-heavy services to reduce cost.
Main Components
- End-to-end audio pipeline: VAD -> STT -> RAG -> LLM -> TTS.
- Tenant-specific indices and retrieval settings.
- Standardized APIs for health, config, and inference.
- Streaming support for live voice interactions.
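A standardized health API could, for instance, have the backend poll each service and report one combined status. The sketch below assumes a uniform `health()` method on every service; the `Service` protocol, `StaticService`, and the response shape are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Protocol


class Service(Protocol):
    """Assumed uniform interface each service client exposes."""
    name: str

    def health(self) -> bool: ...


@dataclass
class StaticService:
    """Stand-in client that reports a fixed health state (for illustration)."""
    name: str
    healthy: bool

    def health(self) -> bool:
        return self.healthy


def aggregate_health(services: Iterable[Service]) -> Dict[str, object]:
    """Combine per-service checks into one response body."""
    statuses: Dict[str, str] = {}
    for svc in services:
        try:
            ok = bool(svc.health())
        except Exception:
            ok = False  # an unreachable service counts as down
        statuses[svc.name] = "ok" if ok else "down"
    overall = "ok" if all(s == "ok" for s in statuses.values()) else "degraded"
    return {"status": overall, "services": statuses}
```

Giving every service the same health shape lets monitoring treat the per-tenant RAG instances and the shared services uniformly.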