# Voicebot Quick Starter

A minimal, production-ready voice AI agent built with Pipecat that wires Google Cloud STT, Gemini LLM, and Google Cloud TTS into a conversational voice pipeline, over both WebRTC (browser) and Exotel telephony (phone calls).
## Features

- Speech-to-Text: Google Cloud STT (Chirp 3) for accurate real-time transcription
- LLM: Google Gemini 2.5 Flash Lite via Vertex AI for fast, conversational responses
- Text-to-Speech: Google Cloud TTS (Chirp3-HD) for natural-sounding voice output
- WebRTC: Talk to the agent directly from your browser
- Exotel Telephony: Receive phone calls and converse via Exotel's WebSocket streaming
- Voice Activity Detection: Silero VAD for natural turn-taking (200ms stop threshold)
- Metrics: Built-in pipeline and usage metrics
## Architecture

```
┌──────────┐     ┌─────────┐     ┌───────────┐     ┌─────────┐     ┌──────────┐
│  Audio   │────>│ Google  │────>│  Gemini   │────>│ Google  │────>│  Audio   │
│  Input   │     │   STT   │     │    LLM    │     │   TTS   │     │  Output  │
└──────────┘     └─────────┘     └───────────┘     └─────────┘     └──────────┘
   WebRTC         Chirp 3       2.5 Flash Lite     Chirp3-HD        WebRTC
  or Exotel                      (Vertex AI)                       or Exotel
```
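The diagram above can be read as a chain of frame processors, each consuming the previous stage's output. The sketch below illustrates only that shape; the classes are simplified stand-ins, not Pipecat's actual API:

```python
# Illustrative sketch of the STT -> LLM -> TTS frame flow.
# These classes are stand-ins, NOT Pipecat's real service classes.

class Processor:
    def process(self, frame: str) -> str:
        raise NotImplementedError

class STT(Processor):
    def process(self, frame: str) -> str:
        # Real pipeline: Google Cloud STT (Chirp 3) turns audio into text.
        return f"transcript({frame})"

class LLM(Processor):
    def process(self, frame: str) -> str:
        # Real pipeline: Gemini 2.5 Flash Lite generates the reply.
        return f"reply({frame})"

class TTS(Processor):
    def process(self, frame: str) -> str:
        # Real pipeline: Google Cloud TTS (Chirp3-HD) synthesizes speech.
        return f"speech({frame})"

def run_pipeline(audio_in: str) -> str:
    """Push one frame through every stage in order."""
    frame = audio_in
    for stage in (STT(), LLM(), TTS()):
        frame = stage.process(frame)
    return frame

print(run_pipeline("mic_audio"))  # speech(reply(transcript(mic_audio)))
```

In the real project this wiring lives in `bot.py`, with VAD and transport processors on either end of the chain.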
## Project Structure

```
voicebot-quick-starter/
├── bot.py              # Core pipeline: STT → LLM → TTS
├── default_runner.py   # WebRTC runner — browser-based voice chat
├── exotel_runner.py    # Exotel runner — telephony via WebSocket
├── requirements.txt    # Python dependencies
├── .env                # Environment variables (not committed)
├── .gitignore          # Git ignore rules
└── README.md           # This file
```
## Prerequisites

- Python 3.10+
- A Google Cloud project with the following APIs enabled:
- Cloud Speech-to-Text API
- Vertex AI API
- Cloud Text-to-Speech API
- A GCP service account JSON key file
## Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/exotel/voicebot-quick-starter.git
   cd voicebot-quick-starter
   ```

2. Set up the Python environment with micromamba.

   Install micromamba (one-time setup):

   ```bash
   "${SHELL}" <(curl -L micro.mamba.pm/install.sh)
   ```

   The installer asks a few questions; accept the defaults:

   | Prompt | What to enter |
   |---|---|
   | Micromamba binary folder? `[~/.local/bin]` | Press Enter |
   | Init shell (bash)? `[Y/n]` | Type `Y`, press Enter |
   | Configure conda-forge? `[Y/n]` | Type `Y`, press Enter |
   | Prefix location? `[~/micromamba]` | Press Enter |

   Then reload your shell and create the environment:

   ```bash
   source ~/.bashrc
   micromamba create -n dev_conf python=3.12 -c conda-forge
   micromamba activate dev_conf
   ```

3. Install dependencies. Run from the `voicebot-quick-starter/` directory:

   ```bash
   pip install -r requirements.txt
   ```

4. Configure environment variables. Create a `.env` file inside the `voicebot-quick-starter/` directory:

   ```bash
   # Google Cloud credentials (STT, LLM, TTS)
   GOOGLE_APPLICATION_CREDENTIALS=./your-service-account.json
   ```

   Place your GCP service account JSON file in the same `voicebot-quick-starter/` directory and update the path accordingly.
## Run the Agent over WebRTC (Browser)

Run from the `voicebot-quick-starter/` directory:

```bash
python default_runner.py
```

Open http://localhost:7860/client in your browser and click Connect.
## Run the Agent over Exotel Telephony

Running the Exotel runner locally requires a public URL so that Exotel can reach your machine. Set up ngrok first (see ngrok Setup below).

Once ngrok is running, start the agent from the `voicebot-quick-starter/` directory:

```bash
python exotel_runner.py
```

The agent starts a WebSocket server ready to accept Exotel voice streams at 8 kHz.
To place a call that connects your phone to the agent, use the Exotel Connect API:
```bash
curl -k -X POST \
  'https://<API_KEY>:<API_TOKEN>@api-stream.exotel.com/v1/Accounts/<ACCOUNT_SID>/Calls/connect.json' \
  -F 'StreamType=bidirectional' \
  -F 'StreamUrl=<NGROK_PUBLIC_URL>/ws' \
  -F 'From=<YOUR_PHONE_NUMBER>' \
  -F 'CallerId=<YOUR_EXOPHONE>' \
  -F 'Record=true'
```

Replace the placeholders:
| Placeholder | Description |
|---|---|
| `<API_KEY>:<API_TOKEN>` | Your Exotel API credentials (found in the Exotel dashboard) |
| `<ACCOUNT_SID>` | Your Exotel account SID |
| `<NGROK_PUBLIC_URL>` | The public URL from ngrok (e.g., `wss://abcd1234.ngrok-free.app`) |
| `<YOUR_PHONE_NUMBER>` | The phone number that will receive the call |
| `<YOUR_EXOPHONE>` | The Exophone (virtual number) you purchased from Exotel |
You'll receive a call on your phone — once you pick up, you'll be talking to the agent.
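The same request can be assembled from Python. The sketch below only builds the URL and form fields (stdlib only, nothing is sent); field names mirror the curl command above, and all argument values are hypothetical placeholders:

```python
# Assemble the pieces of an Exotel Connect API request.
# Nothing is sent; this mirrors the curl command's fields.
from urllib.parse import quote

def connect_call_request(api_key: str, api_token: str, account_sid: str,
                         ngrok_url: str, from_number: str, exophone: str):
    """Return (url, form_fields) for the Connect API call."""
    url = (f"https://{quote(api_key)}:{quote(api_token)}"
           f"@api-stream.exotel.com/v1/Accounts/{account_sid}/Calls/connect.json")
    fields = {
        "StreamType": "bidirectional",
        "StreamUrl": f"{ngrok_url}/ws",   # bidirectional stream endpoint
        "From": from_number,              # the phone that receives the call
        "CallerId": exophone,             # your Exophone virtual number
        "Record": "true",
    }
    return url, fields

# All values below are hypothetical placeholders.
url, fields = connect_call_request("KEY", "TOKEN", "SID123",
                                   "wss://abcd1234.ngrok-free.app",
                                   "+911234567890", "+918000000000")
print(fields["StreamUrl"])  # wss://abcd1234.ngrok-free.app/ws
```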
## ngrok Setup

ngrok exposes your local server to the internet, which is required for the Exotel runner to receive incoming voice streams.
1. Install ngrok by following the steps at https://ngrok.com/download

2. Add your authtoken (sign up for a free account at ngrok.com if you don't have one):

   ```bash
   ngrok config add-authtoken <your-authtoken>
   ```

3. Expose your local server:

   ```bash
   ngrok http 7860
   ```

4. Copy the public `https://` URL from the ngrok output. Use it as the `<NGROK_PUBLIC_URL>` in the Exotel curl command above (replace `https://` with `wss://` and append `/ws`).
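The URL rewrite in the last step is mechanical; a tiny helper (hypothetical, not part of this repo) makes it explicit:

```python
# Turn an ngrok https:// URL into the wss:// StreamUrl the curl command needs.
# Hypothetical helper for illustration; not part of this repository.
def to_stream_url(ngrok_https_url: str) -> str:
    if not ngrok_https_url.startswith("https://"):
        raise ValueError("expected an https:// ngrok URL")
    # Swap the scheme to wss:// and append the /ws WebSocket path.
    return "wss://" + ngrok_https_url[len("https://"):].rstrip("/") + "/ws"

print(to_stream_url("https://abcd1234.ngrok-free.app"))
# wss://abcd1234.ngrok-free.app/ws
```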
## Google Cloud Setup

- Create a GCP project and enable the required APIs:
  - Cloud Speech-to-Text API
  - Vertex AI API
  - Cloud Text-to-Speech API
- Create a service account with the following roles:
  - `roles/speech.client` (Speech-to-Text)
  - `roles/aiplatform.user` (Vertex AI / Gemini LLM)
- Download the service account JSON key
- Place the JSON file inside the `voicebot-quick-starter/` directory
- Set `GOOGLE_APPLICATION_CREDENTIALS` in `voicebot-quick-starter/.env` to point to it
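A downloaded key file is plain JSON with a fixed set of fields. A quick sanity check before wiring it into `.env` can save a confusing startup error; the field names below are the standard GCP service-account key format, and the checker itself is a hypothetical helper:

```python
# Sanity-check a GCP service-account key JSON before pointing .env at it.
# Hypothetical helper; field names follow the standard GCP key format.
import json

REQUIRED = {"type", "project_id", "private_key", "client_email"}

def check_service_account(raw_json: str) -> list[str]:
    """Return a sorted list of problems (missing fields), empty if OK."""
    info = json.loads(raw_json)
    missing = sorted(REQUIRED - info.keys())
    if info.get("type") != "service_account":
        missing.append("type=service_account")
    return missing

good = json.dumps({"type": "service_account", "project_id": "p",
                   "private_key": "k", "client_email": "e"})
print(check_service_account(good))  # []
```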
## Customization

Edit `voicebot-quick-starter/bot.py` to customize:

- System prompt: Change the agent's personality and behavior
- STT region: Currently set to `asia-south1` (Mumbai)
- STT model: Currently using `chirp_3`
- LLM model: Currently using `gemini-2.5-flash-lite`; swap in other Gemini models as needed
- TTS voice: Currently using `en-US-Chirp3-HD-Charon`; see available voices
- VAD sensitivity: Adjust `stop_secs` in `VADParams` for turn-taking timing
## Swapping Providers

Pipecat supports a wide range of providers. You can replace any component in the pipeline by swapping the service class in `bot.py`. Refer to the examples in the Pipecat repo for guidance:
| Component | Supported Providers | Examples |
|---|---|---|
| STT | Deepgram, AssemblyAI, Whisper, Azure, Google | STT examples |
| LLM | OpenAI, Anthropic, Google Gemini, Azure, Groq | LLM examples |
| TTS | Cartesia, ElevenLabs, PlayHT, Deepgram, Google, Azure | TTS examples |
For the full list of supported services, see the Pipecat services documentation.
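The swap boils down to constructing a different service class that plays the same role in the pipeline. The registry below is a generic illustration of that pattern; the class names are made up, not Pipecat's:

```python
# Generic "swap the provider by name" pattern.
# These classes are hypothetical stand-ins, NOT Pipecat service classes.

class GoogleTTS:
    name = "google"

class ElevenLabsTTS:
    name = "elevenlabs"

# Map a provider name to the class that fills the TTS slot in the pipeline.
TTS_PROVIDERS = {cls.name: cls for cls in (GoogleTTS, ElevenLabsTTS)}

def make_tts(provider: str):
    """Construct the TTS service for the requested provider."""
    try:
        return TTS_PROVIDERS[provider]()
    except KeyError:
        raise ValueError(f"unsupported TTS provider: {provider}") from None

print(type(make_tts("elevenlabs")).__name__)  # ElevenLabsTTS
```

In `bot.py` the change is even simpler: import the other service class and pass it the matching API credentials where the Google one is constructed today.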
## Deployment

Deploy to any cloud provider (GCP, AWS, etc.) and ensure:
- WebSocket connections are supported by your load balancer
- The GCP service account JSON is securely mounted (not baked into the image)
- Environment variables are injected via a secrets manager
## Tech Stack

| Component | Technology |
|---|---|
| Framework | Pipecat v0.0.101 |
| STT | Google Cloud Speech-to-Text (Chirp 3) |
| LLM | Google Gemini 2.5 Flash Lite (Vertex AI) |
| TTS | Google Cloud Text-to-Speech (Chirp3-HD) |
| VAD | Silero VAD |
| WebRTC | Pipecat SmallWebRTC |
| Telephony | Exotel WebSocket Streaming |
## Troubleshooting

- Ensure your browser allows microphone access (WebRTC mode)
- Check that `GOOGLE_APPLICATION_CREDENTIALS` in `voicebot-quick-starter/.env` points to a valid service account JSON
- Review the terminal logs for `ErrorFrame` messages
- Adjust `stop_secs` in VAD params: lower values mean faster turn-taking but may cut off speech
- Verify ngrok is running and the WebSocket URL is publicly accessible (see ngrok Setup)
## License

MIT
## Acknowledgements

- Pipecat: open-source framework for voice and multimodal AI agents
- Exotel: cloud telephony and voice streaming
- Google Cloud: STT, LLM, and TTS services