
DIZ-admin/LocalScribe


🚀 Real-Time Audio Processing API

A FastAPI-based real-time audio processing system for live customer calls that performs speaker diarization, transcription, and sentiment analysis.

Features

  • Real-time microphone input capture
  • Speaker diarization (agent vs customer)
  • Live speech-to-text transcription
  • Sentiment analysis (both text and voice)
  • Voice feature extraction (pitch, energy, speaking rate)
  • WebSocket-based real-time output streaming
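The README does not say how pitch, energy, and speaking rate are computed. As an illustration only, here is a pure-Python sketch of two common frame-level voice features: RMS energy and an autocorrelation-based pitch estimate. The function names, the 75 to 400 Hz search range, and the silence threshold are assumptions, not values taken from this project. A production pipeline would more likely use an optimized library (for example librosa's `pyin`). Speaking rate is left out because its definition here is not specified.

```python
import math
from typing import Optional, Sequence


def rms_energy(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame of samples in [-1, 1]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def estimate_pitch(
    frame: Sequence[float],
    sample_rate: int,
    fmin: float = 75.0,         # assumed lower bound for speech F0, in Hz
    fmax: float = 400.0,        # assumed upper bound for speech F0, in Hz
    silence_rms: float = 0.01,  # assumed voicing threshold
) -> Optional[float]:
    """Estimate F0 (Hz) as the lag with the largest autocorrelation in [fmin, fmax].

    Returns None for frames quieter than `silence_rms` (treated as unvoiced).
    """
    if rms_energy(frame) < silence_rms:
        return None
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        corr = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else None
```

For a clean 200 Hz tone sampled at 16 kHz, the strongest lag is 80 samples, giving 200 Hz. On real speech, plain autocorrelation is prone to octave errors, which is one reason dedicated pitch trackers exist.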

Project Structure

.
├── api/               # FastAPI routes and WebSocket handlers
├── capture/           # Audio capture module
├── diarization/       # Speaker diarization module
├── transcription/     # Speech-to-text module
├── sentiment/         # Sentiment analysis module
├── tests/             # Unit tests
├── main.py            # FastAPI application entry point
├── requirements.txt   # Project dependencies
└── README.md          # This file

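One plausible way these modules fit together is sketched below: diarization splits a captured chunk into speaker turns, and each turn is then transcribed and scored for sentiment. The `Segment` type, the stage signatures, and `process_chunk` are hypothetical stand-ins, not the actual interfaces of `diarization/`, `transcription/`, or `sentiment/`; the stages are passed in as plain callables so the sketch runs without any models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Segment:
    """One diarized speaker turn, in seconds from the start of the chunk."""
    start: float
    end: float
    speaker: str  # e.g. "SPEAKER_00"


# Hypothetical stage signatures; the real modules may differ.
Diarizer = Callable[[Sequence[float], int], List[Segment]]
Transcriber = Callable[[Sequence[float], int], str]
SentimentScorer = Callable[[str, Sequence[float], int], Dict[str, object]]


def process_chunk(
    audio: Sequence[float],
    sample_rate: int,
    diarize: Diarizer,
    transcribe: Transcriber,
    score_sentiment: SentimentScorer,
) -> List[Dict[str, object]]:
    """Run diarization, then transcription and sentiment on each speaker turn."""
    results: List[Dict[str, object]] = []
    for seg in diarize(audio, sample_rate):
        clip = audio[int(seg.start * sample_rate):int(seg.end * sample_rate)]
        text = transcribe(clip, sample_rate)
        if not text.strip():
            continue  # skip turns with no recognizable speech
        results.append({
            "start": seg.start,
            "speaker": seg.speaker,
            "text": text,
            # Sentiment sees both the text and the raw audio ("text and voice").
            "sentiment": score_sentiment(text, clip, sample_rate),
        })
    return results
```

Injecting the stages as callables also makes each one easy to replace with a stub in unit tests.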
Setup

  1. Create a virtual environment:
     python -m venv venv
     source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install dependencies:
     pip install -r requirements.txt
  3. Run the application:
     uvicorn main:app --reload

API Endpoints

  • POST /api/start: Start the audio processing pipeline
  • POST /api/stop: Stop the audio processing pipeline
  • GET /api/status: Get current pipeline status
  • WS /ws: WebSocket endpoint for real-time output
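The endpoints are listed without their request or response bodies. The following is a minimal, stdlib-only sketch of the state such handlers might share, assuming a single in-process pipeline. `PipelineController`, its methods, and the `{"running": ..., "clients": ...}` status shape are hypothetical, not this project's actual API. Each WebSocket client gets its own unbounded `asyncio.Queue`, so a slow client only grows its own backlog instead of stalling the publisher.

```python
import asyncio
from typing import Any, Dict, List


class PipelineController:
    """Hypothetical shared state behind /api/start, /api/stop, /api/status and /ws."""

    def __init__(self) -> None:
        self.running = False
        self._subscribers: List["asyncio.Queue[Dict[str, Any]]"] = []

    def start(self) -> Dict[str, Any]:
        # POST /api/start would call this (and launch audio capture in a real app).
        self.running = True
        return self.status()

    def stop(self) -> Dict[str, Any]:
        # POST /api/stop would call this.
        self.running = False
        return self.status()

    def status(self) -> Dict[str, Any]:
        # GET /api/status would return this dict as JSON.
        return {"running": self.running, "clients": len(self._subscribers)}

    def subscribe(self) -> "asyncio.Queue[Dict[str, Any]]":
        # Each /ws connection gets its own queue; call from inside the event loop.
        queue: "asyncio.Queue[Dict[str, Any]]" = asyncio.Queue()
        self._subscribers.append(queue)
        return queue

    def unsubscribe(self, queue: "asyncio.Queue[Dict[str, Any]]") -> None:
        self._subscribers.remove(queue)

    def publish(self, message: Dict[str, Any]) -> None:
        # Fan one pipeline result out to every client; results are dropped while stopped.
        if not self.running:
            return
        for queue in self._subscribers:
            queue.put_nowait(message)
```

In a FastAPI app, the `/ws` handler could call `subscribe()`, loop on `await queue.get()` followed by `await websocket.send_json(...)`, and call `unsubscribe()` when the client disconnects.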

Testing

Run the test suite:

pytest
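pytest collects files named `test_*.py` (or `*_test.py`) and runs functions whose names start with `test`. The following illustrative test file shows the pattern. `chunk_audio` is a hypothetical helper defined inline so the example runs on its own; it is not a function from this repository.

```python
# tests/test_chunking.py (hypothetical file name)
from typing import List, Sequence


def chunk_audio(samples: Sequence[float], chunk_size: int) -> List[List[float]]:
    """Split a sample buffer into fixed-size chunks, zero-padding the last one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    for start in range(0, len(samples), chunk_size):
        chunk = list(samples[start:start + chunk_size])
        chunk.extend([0.0] * (chunk_size - len(chunk)))  # pad a short final chunk
        chunks.append(chunk)
    return chunks


def test_chunks_cover_all_samples():
    chunks = chunk_audio([1.0] * 10, 4)
    assert len(chunks) == 3
    assert chunks[-1] == [1.0, 1.0, 0.0, 0.0]


def test_rejects_non_positive_chunk_size():
    try:
        chunk_audio([1.0], 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

Inside a pytest suite, `with pytest.raises(ValueError):` is the more idiomatic way to write the second test.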

Real-Time Output Format

The system outputs results in real time in the following format:

12:01:23 | SPEAKER_00 | Hello, how can I help you today?
⚠️ [12:01:23] NATURAL (score=0.92)
   Text: NATURAL, Voice: NATURAL
   Voice features - Pitch: 185.23, Energy: 0.12, Rate: 0.08
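A sketch of a formatter that reproduces the four-line layout above follows. It assumes each result carries a timestamp, a speaker label, the transcript, an overall label and score, per-modality labels, and a feature dict with `pitch`, `energy`, and `rate` keys. `format_result` and its parameters are hypothetical; the project's own output code may be structured differently.

```python
from datetime import datetime
from typing import Dict

WARNING_SIGN = "\u26a0\ufe0f"  # the warning emoji shown in the sample output


def format_result(
    ts: datetime,
    speaker: str,
    text: str,
    label: str,
    score: float,
    text_label: str,
    voice_label: str,
    features: Dict[str, float],  # expects "pitch", "energy" and "rate" keys
) -> str:
    """Render one result in the four-line layout shown in this README."""
    clock = ts.strftime("%H:%M:%S")
    return "\n".join([
        f"{clock} | {speaker} | {text}",
        f"{WARNING_SIGN} [{clock}] {label} (score={score:.2f})",
        f"   Text: {text_label}, Voice: {voice_label}",
        "   Voice features - "
        f"Pitch: {features['pitch']:.2f}, "
        f"Energy: {features['energy']:.2f}, "
        f"Rate: {features['rate']:.2f}",
    ])
```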

Requirements

  • Python 3.8+
  • Working microphone
  • Sufficient CPU/GPU for real-time processing
  • Internet connection for model downloads

License

MIT

About

🎀 Real-Time Audio Intelligence | Live speaker identification, speech-to-text, and sentiment analysis pipeline with Whisper & Wav2Vec2. Perfect for meetings, support calls, and voice apps. ⚑ Features: Multi-speaker diarization β€’ Emotion detection β€’ Low-latency processing β€’ Python/FastAPI backend
