A FastAPI-based real-time audio processing system for live customer calls that performs speaker diarization, transcription, and sentiment analysis.
- Real-time microphone input capture
- Speaker diarization (agent vs customer)
- Live speech-to-text transcription
- Sentiment analysis (both text and voice)
- Voice feature extraction (pitch, energy, speaking rate)
- WebSocket-based real-time output streaming
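
As a rough illustration of what voice feature extraction involves, the sketch below computes two of the listed features, energy and pitch, for one frame of audio samples. It is a minimal stand-in rather than this project's actual implementation: the function names `rms_energy` and `estimate_pitch`, the autocorrelation method, and the 75 to 400 Hz search range are all assumptions, and speaking rate is left out because it needs word or syllable timing from the transcription step.

```python
import math
from typing import List, Optional


def rms_energy(frame: List[float]) -> float:
    """Root-mean-square amplitude of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def estimate_pitch(frame: List[float], sample_rate: int,
                   fmin: float = 75.0, fmax: float = 400.0) -> Optional[float]:
    """Estimate the fundamental frequency in Hz by plain autocorrelation.

    Hypothetical helper: returns None for near-silent frames or frames
    too short to contain one full period at fmin.
    """
    min_lag = int(sample_rate / fmax)  # shortest period searched, in samples
    max_lag = int(sample_rate / fmin)  # longest period searched, in samples
    if len(frame) <= max_lag or rms_energy(frame) < 1e-4:
        return None
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else None


# Usage: a synthetic 200 Hz tone sampled at 8 kHz for 0.1 s.
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
energy = rms_energy(tone)
pitch = estimate_pitch(tone, 8000)
```

A pure-Python loop like this is only practical for short frames; a real-time pipeline would more likely use a vectorized or library-based pitch tracker.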
.
├── api/ # FastAPI routes and WebSocket handlers
├── capture/ # Audio capture module
├── diarization/ # Speaker diarization module
├── transcription/ # Speech-to-text module
├── sentiment/ # Sentiment analysis module
├── tests/ # Unit tests
├── main.py # FastAPI application entry point
├── requirements.txt # Project dependencies
└── README.md # This file
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Run the application:
uvicorn main:app --reload

The API exposes the following endpoints:
- POST /api/start: Start the audio processing pipeline
- POST /api/stop: Stop the audio processing pipeline
- GET /api/status: Get current pipeline status
- WS /ws: WebSocket endpoint for real-time output
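
The start, stop, and status endpoints imply some shared pipeline state that route handlers read and update. The sketch below shows one way such state could be held safely across concurrent requests. `PipelineController`, `PipelineStatus`, and the returned dictionary shape are hypothetical names chosen for illustration; the actual handlers in `api/` may be organized quite differently.

```python
import threading
from enum import Enum


class PipelineStatus(str, Enum):
    STOPPED = "stopped"
    RUNNING = "running"


class PipelineController:
    """Hypothetical thread-safe holder for the pipeline's run state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._status = PipelineStatus.STOPPED

    def _set(self, target: PipelineStatus) -> dict:
        # Report whether the call changed anything, so repeated
        # start or stop requests are harmless (idempotent).
        with self._lock:
            changed = self._status is not target
            self._status = target
            return {"status": target.value, "changed": changed}

    def start(self) -> dict:
        """What a POST /api/start handler might delegate to."""
        return self._set(PipelineStatus.RUNNING)

    def stop(self) -> dict:
        """What a POST /api/stop handler might delegate to."""
        return self._set(PipelineStatus.STOPPED)

    def status(self) -> dict:
        """What a GET /api/status handler might return."""
        with self._lock:
            return {"status": self._status.value}


controller = PipelineController()
first = controller.start()   # pipeline was stopped, so this changes state
second = controller.start()  # already running, so nothing changes
```

Keeping the state in a plain class like this makes it testable with pytest without starting the web server.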
Run the test suite:
pytest

The system outputs results in real time in the following format:
12:01:23 | SPEAKER_00 | Hello, how can I help you today?
⚠️ [12:01:23] NATURAL (score=0.92)
Text: NATURAL, Voice: NATURAL
Voice features - Pitch: 185.23, Energy: 0.12, Rate: 0.08
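
A client consuming this output may want it in structured form. The sketch below parses the transcript line and the voice-features line from the sample above using only the standard library. The names `Utterance`, `parse_utterance`, and `parse_voice_features` are hypothetical, and the patterns are inferred from this one sample, so they may need adjusting if the real output varies.

```python
import re
from typing import Dict, NamedTuple, Optional


class Utterance(NamedTuple):
    time: str
    speaker: str
    text: str


_TIME_RE = re.compile(r"\d{2}:\d{2}:\d{2}")
_FEATURE_RE = re.compile(r"(Pitch|Energy|Rate):\s*(-?\d+(?:\.\d+)?)")


def parse_utterance(line: str) -> Optional[Utterance]:
    """Parse 'HH:MM:SS | SPEAKER | text'; return None if the line does not match."""
    # Split at most twice so a '|' inside the spoken text is preserved.
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3 or not _TIME_RE.fullmatch(parts[0]):
        return None
    return Utterance(*parts)


def parse_voice_features(line: str) -> Dict[str, float]:
    """Parse 'Voice features - Pitch: x, Energy: y, Rate: z' into a dict."""
    if not line.strip().startswith("Voice features"):
        return {}
    return {name.lower(): float(value) for name, value in _FEATURE_RE.findall(line)}


utterance = parse_utterance("12:01:23 | SPEAKER_00 | Hello, how can I help you today?")
features = parse_voice_features("Voice features - Pitch: 185.23, Energy: 0.12, Rate: 0.08")
```

Lines that match neither pattern, such as the sentiment summary, are simply ignored by these two helpers.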
- Python 3.8+
- Working microphone
- Sufficient CPU/GPU for real-time processing
- Internet connection for model downloads
Licensed under the MIT License.