ADK Gemini Live API Toolkit Demo

A working demonstration of real-time bidirectional streaming with Google's Agent Development Kit (ADK). This FastAPI application showcases WebSocket-based communication with Gemini models, supporting multimodal requests (text, audio, and image/video input) and flexible responses (text or audio output).

$bidi-demo-screen$

Overview

This demo implements the complete ADK bidirectional streaming lifecycle:

Application Initialization: Creates Agent, SessionService, and Runner at startup
Session Initialization: Establishes Session, RunConfig, and LiveRequestQueue per connection
Bidirectional Streaming: Concurrent upstream (client → queue) and downstream (events → client) tasks
Graceful Termination: Proper cleanup of LiveRequestQueue and WebSocket connections

Features

WebSocket Communication: Real-time bidirectional streaming via /ws/{user_id}/{session_id}
Multimodal Requests: Text, audio, and image/video input with automatic audio transcription
Flexible Responses: Text or audio output, automatically determined based on model architecture
Session Resumption: Reconnection support configured via RunConfig
Concurrent Tasks: Separate upstream/downstream async tasks for optimal performance
Interactive UI: Web interface with event console for monitoring Live API events
Math tutor agent: System instructions and prompts for tutoring; optional Share Screen with intent-based screen capture

Architecture

The application follows ADK's recommended concurrent task pattern:

┌─────────────┐         ┌──────────────────┐         ┌─────────────┐
│             │         │                  │         │             │
│  WebSocket  │────────▶│ LiveRequestQueue │────────▶│  Live API   │
│   Client    │         │                  │         │   Session   │
│             │◀────────│   run_live()     │◀────────│             │
└─────────────┘         └──────────────────┘         └─────────────┘
  Upstream Task              Queue              Downstream Task

Upstream Task: Receives WebSocket messages and forwards to LiveRequestQueue
Downstream Task: Processes run_live() events and sends to WebSocket client

Prerequisites

Python 3.10 or higher
uv (recommended) or pip
Google API key (for Gemini Live API) or Google Cloud project (for Vertex AI Live API)

Installing uv (if not already installed):

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Installation

1. Navigate to Demo Directory

cd src/bidi-demo

2. Install Dependencies

Using uv (recommended):

uv sync

This automatically creates a virtual environment, installs all dependencies, and generates a lock file for reproducible builds.

Using pip (alternative):

python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .

3. Configure Environment Variables

Create or edit app/.env with your credentials:

# Choose your Live API platform
GOOGLE_GENAI_USE_VERTEXAI=FALSE

# For Gemini Live API (when GOOGLE_GENAI_USE_VERTEXAI=FALSE)
GOOGLE_API_KEY=your_api_key_here

# For Vertex AI Live API (when GOOGLE_GENAI_USE_VERTEXAI=TRUE)
# GOOGLE_CLOUD_PROJECT=your_project_id
# GOOGLE_CLOUD_LOCATION=us-central1

# Model selection (optional, defaults to native audio model)
# See "Supported Models" section below for available model names
DEMO_AGENT_MODEL=gemini-2.5-flash-native-audio-preview-12-2025

Getting API Credentials

Gemini Live API:

Visit Google AI Studio
Create an API key
Set GOOGLE_API_KEY in .env

Vertex AI Live API:

Enable Vertex AI API in Google Cloud Console
Set up authentication: gcloud auth application-default login
In app/.env: set GOOGLE_GENAI_USE_VERTEXAI=TRUE, GOOGLE_CLOUD_PROJECT, and GOOGLE_CLOUD_LOCATION
Do not set GOOGLE_API_KEY when using Vertex (the app ignores it so the SDK uses Application Default Credentials)

4. Set SSL Certificate Path

Set the SSL certificate file path for secure connections:

# If using uv
export SSL_CERT_FILE=$(uv run python -m certifi)

# If using pip with activated venv
export SSL_CERT_FILE=$(python -m certifi)

Running the Demo

Start the Server

From the src/bidi-demo directory, first change to the app subdirectory:

cd app

Note: You must run from inside the app directory so Python can find the math_tutor_agent module. Running from the parent directory will fail with ModuleNotFoundError: No module named 'math_tutor_agent'.

Using uv (recommended):

uv run --project .. uvicorn main:app --reload --host 0.0.0.0 --port 8000

Using pip (with activated venv):

uvicorn main:app --reload --host 0.0.0.0 --port 8000

The --reload flag enables auto-restart on code changes during development.

Background Mode (Testing/Production)

To run in background with log output:

# Using uv (from app directory)
uv run --project .. uvicorn main:app --host 0.0.0.0 --port 8000 > server.log 2>&1 &

# Using pip (from app directory)
uvicorn main:app --host 0.0.0.0 --port 8000 > server.log 2>&1 &

To check the server log:

tail -f server.log  # Follow log in real-time

To stop the background server:

kill $(lsof -ti:8000)

Access the Application

Open your browser and navigate to:

http://localhost:8000

Usage

Text Mode

Type your message in the input field
Click "Send" or press Enter
Watch the event console for Live API events
Receive streamed responses in real-time

Audio Mode

Click "Start Audio" to begin voice interaction
Speak into your microphone
Receive audio responses with real-time transcription
Click "Stop Audio" to end the audio session

Share Screen

Click "Share Screen" and choose a window or screen to share
When you speak and the model detects that you want it to see your screen (e.g. "look at my screen", "what do you see?"), a screenshot is captured and sent with your transcript
The math tutor can then use the visual context to help you (e.g. with a problem on screen)
Click "Stop Share" to stop sharing

WebSocket API

Endpoint

ws://localhost:8000/ws/{user_id}/{session_id}

Path Parameters:

user_id: Unique identifier for the user
session_id: Unique identifier for the session

Response Modality:

Automatically determined based on model architecture
Native audio models use AUDIO response modality
Half-cascade models use TEXT response modality

Message Format

Client → Server (Text):

{
  "type": "text",
  "text": "Your message here"
}

Client → Server (Image):

{
  "type": "image",
  "data": "base64_encoded_image_data",
  "mimeType": "image/jpeg"
}

Client → Server (Screen capture, with optional transcript):

{
  "type": "screen_capture",
  "data": "base64_encoded_image_data",
  "mimeType": "image/jpeg",
  "text": "User transcript (optional)"
}

Client → Server (Audio):

Send raw binary frames (PCM audio, 16kHz, 16-bit)

Server → Client:

JSON-encoded ADK Event objects
See ADK Events Documentation for event schemas

Project Structure

bidi-demo/
├── app/
│   ├── math_tutor_agent/        # Agent definition module
│   │   ├── __init__.py          # Package exports
│   │   ├── agent.py             # Agent configuration
│   │   ├── prompts.py            # System instructions for math tutor
│   │   └── tools.py             # capture_visual_context, cleanup_visual_context
│   ├── main.py                  # FastAPI application and WebSocket endpoint
│   ├── .env                     # Environment configuration (not in git)
│   └── static/                  # Frontend files
│       ├── index.html           # Main UI
│       ├── css/
│       │   └── style.css        # Styling
│       └── js/
│           ├── app.js                   # Main application logic
│           ├── audio-player.js          # Audio playback
│           ├── audio-recorder.js        # Audio recording
│           ├── pcm-player-processor.js  # Audio processing
│           └── pcm-recorder-processor.js # Audio processing
├── tests/                       # E2E tests and test logs
├── pyproject.toml               # Python project configuration
└── README.md                    # This file

Code Overview

Agent Definition (app/math_tutor_agent/agent.py)

The agent is defined in a separate module following ADK best practices, with system instructions in prompts.py:

from .prompts import INSTRUCTION

agent = Agent(
    name="math_tutor_agent",
    model=os.getenv("DEMO_AGENT_MODEL", "gemini-2.5-flash-native-audio-preview-12-2025"),
    tools=[],
    instruction=INSTRUCTION,
)

Application Initialization (app/main.py)

from math_tutor_agent.agent import agent

app = FastAPI()
session_service = InMemorySessionService()
runner = Runner(app_name="bidi-demo", agent=agent, session_service=session_service)

WebSocket Handler (app/main.py:65-209)

The WebSocket endpoint implements the complete bidirectional streaming pattern:

Accept Connection: Establish WebSocket connection
Configure Session: Create RunConfig with automatic modality detection
Initialize Queue: Create LiveRequestQueue for message passing
Start Concurrent Tasks: Launch upstream and downstream tasks
Handle Cleanup: Close queue in finally block

Concurrent Tasks

Upstream Task (app/main.py:125-172):

Receives WebSocket messages (text, image, or audio binary)
Converts to ADK format (Content or Blob)
Sends to LiveRequestQueue via send_content() or send_realtime()

Downstream Task (app/main.py:174-187):

Calls runner.run_live() with queue and config
Receives Event stream from Live API
Serializes events to JSON and sends to WebSocket

Configuration

Supported Models

The demo supports any Gemini model compatible with Live API:

Native Audio Models (recommended for voice):

gemini-2.5-flash-native-audio-preview-12-2025 (Gemini Live API)
gemini-live-2.5-flash-native-audio (Vertex AI)

Set the model via DEMO_AGENT_MODEL in .env or modify app/math_tutor_agent/agent.py.

For the latest model availability and features:

Gemini Live API: Check the official Gemini API models documentation
Vertex AI Live API: Check the official Vertex AI models documentation

RunConfig Options

The demo automatically configures bidirectional streaming based on model architecture (app/main.py:76-104):

For Native Audio Models (containing "native-audio" in model name):

run_config = RunConfig(
    streaming_mode=StreamingMode.BIDI,
    response_modalities=["AUDIO"],
    input_audio_transcription=types.AudioTranscriptionConfig(),
    output_audio_transcription=types.AudioTranscriptionConfig(),
    session_resumption=types.SessionResumptionConfig()
)

For Half-Cascade Models (other models):

run_config = RunConfig(
    streaming_mode=StreamingMode.BIDI,
    response_modalities=["TEXT"],
    input_audio_transcription=None,
    output_audio_transcription=None,
    session_resumption=types.SessionResumptionConfig()
)

The modality detection is automatic based on the model name. Native audio models use AUDIO response modality with transcription enabled, while half-cascade models use TEXT response modality for better performance.

Troubleshooting

Connection Issues

Problem: WebSocket fails to connect

Solutions:

Verify API credentials in app/.env
Check console for error messages
Ensure uvicorn is running on correct port

Audio Not Working

Problem: Audio input/output not functioning

Solutions:

Grant microphone permissions in browser
Verify browser supports Web Audio API
Check that audio model is configured (native audio model required)
Review browser console for errors

Model Errors

Problem: "Model not found" or quota errors

Solutions:

Verify model name matches your platform (Gemini vs Vertex AI)
Check API quota limits in console
Ensure billing is enabled (for Vertex AI)

Development

Code Formatting

This project uses black, isort, and flake8 for code formatting and linting. Configuration is inherited from the repository root.

Using uv:

uv run black .
uv run isort .
uv run flake8 .

Using pip (with activated venv):

black .
isort .
flake8 .

To check formatting without making changes:

# Using uv
uv run black --check .
uv run isort --check .

# Using pip
black --check .
isort --check .

Additional Resources

ADK Documentation: https://google.github.io/adk-docs/
Gemini Live API: https://ai.google.dev/gemini-api/docs/live
Vertex AI Live API: https://cloud.google.com/vertex-ai/generative-ai/docs/live-api
ADK GitHub Repository: https://github.com/google/adk-python

License

Apache 2.0 - See repository LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
app		app
assets		assets
excalidraw_api		excalidraw_api
tests		tests
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

ADK Gemini Live API Toolkit Demo

Overview

Features

Architecture

Prerequisites

Installation

1. Navigate to Demo Directory

2. Install Dependencies

3. Configure Environment Variables

Getting API Credentials

4. Set SSL Certificate Path

Running the Demo

Start the Server

Background Mode (Testing/Production)

Access the Application

Usage

Text Mode

Audio Mode

Share Screen

WebSocket API

Endpoint

Message Format

Project Structure

Code Overview

Agent Definition (app/math_tutor_agent/agent.py)

Application Initialization (app/main.py)

WebSocket Handler (app/main.py:65-209)

Concurrent Tasks

Configuration

Supported Models

RunConfig Options

Troubleshooting

Connection Issues

Audio Not Working

Model Errors

Development

Code Formatting

Additional Resources

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages