Grok Voice

A real-time voice assistant application powered by xAI's Grok Realtime API. It features a polished iPhone-inspired interface for natural voice conversations with Grok AI.

Overview

Grok Voice lets you hold a spoken conversation with xAI's Grok model. The browser opens a WebSocket connection directly to the Grok Realtime API, authenticated with an ephemeral token from /api/session, and streams audio in both directions with low latency. The interface mimics an iPhone call screen, complete with a Dynamic Island indicator, call controls, and live transcription.

Key Features

Real-Time Voice Streaming — Bidirectional audio via WebSocket with server-side VAD (Voice Activity Detection)
Live Transcription — Real-time speech-to-text for both user and assistant utterances
iPhone-Style UI — Authentic iOS call interface with Dynamic Island, status bar, and glassmorphism effects
Web & X Search — Grok can search the web and X (Twitter) for up-to-date information
Call Controls — Mute microphone and toggle speaker output
Customizable Personality — Configure Grok's voice and behavior via environment variables
Responsive Design — Optimized for desktop and mobile viewports

Prerequisites

Node.js 20.9 or later (the minimum supported by Next.js 16.x, which this project uses)
xAI API Key — Obtain from x.ai
Modern browser (Chrome, Firefox, Safari, or Edge) with getUserMedia and AudioWorklet support

Installation

1. Clone the Repository

git clone https://github.com/your-username/iphone-voice-site.git
cd iphone-voice-site

2. Install Dependencies

npm install

3. Configure Environment Variables

Create a .env.local file in the project root:

# Required
XAI_API_KEY=your_xai_api_key_here

# Optional: Voice selection (default: ara)
XAI_VOICE=ara

# Optional: Custom system prompt
XAI_INSTRUCTIONS=You are a helpful voice assistant named Grok. Keep your responses concise and conversational since this is a voice call. Be friendly and engaging.

4. Start the Development Server

npm run dev

5. Open the Application

Navigate to http://localhost:3000 and grant microphone access when prompted.

Project Structure

iphone-voice-site/
├── src/
│   └── app/
│       ├── api/
│       │   ├── session/
│       │   │   └── route.ts      # Ephemeral token generation
│       │   └── voice/
│       │       └── route.ts      # Voice configuration endpoint
│       ├── globals.css           # Global styles & animations
│       ├── layout.tsx            # Root layout with fonts
│       └── page.tsx              # Main voice interface
├── public/                       # Static assets
├── .env.local                    # Environment variables (create this)
├── eslint.config.mjs
├── next.config.ts
├── package.json
├── postcss.config.mjs
└── tsconfig.json

Architecture

Client-Side Audio Pipeline

Microphone Capture — Web Audio API with AudioWorklet for low-latency PCM capture
Audio Processing — Real-time conversion to PCM16 format at native sample rate
WebSocket Transport — Base64-encoded audio chunks sent to Grok Realtime API
Playback — Incoming audio queued and played via AudioBufferSourceNode
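
The conversion steps in this pipeline can be sketched as follows. This is a minimal illustration, not the code in page.tsx: the function names `floatToPcm16Bytes`, `toBase64`, and `pcm16BytesToFloat` are hypothetical, and it assumes little-endian PCM16, which is the usual convention for raw PCM over the wire. A hand-written Base64 encoder is used so the sketch runs the same way in a browser or in Node.js.

```typescript
// Convert Float32 samples in [-1, 1] (as delivered by the Web Audio API)
// to little-endian signed 16-bit PCM bytes.
function floatToPcm16Bytes(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  samples.forEach((sample, i) => {
    const s = Math.max(-1, Math.min(1, sample)); // clamp out-of-range input
    view.setInt16(i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  });
  return bytes;
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Base64-encode raw bytes so they can travel inside a JSON WebSocket message.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64.charAt((triple >> 18) & 63) + B64.charAt((triple >> 12) & 63);
    out += i + 1 < bytes.length ? B64.charAt((triple >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? B64.charAt(triple & 63) : "=";
  }
  return out;
}

// Inverse step for playback: PCM16 bytes back to Float32 samples that can be
// copied into an AudioBuffer.
function pcm16BytesToFloat(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return out;
}
```

In a capture callback, each block of samples would pass through `floatToPcm16Bytes` and then `toBase64` before being sent. On the playback side, decoded chunks would go through `pcm16BytesToFloat` before being queued.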

Server-Side Components

Endpoint	Method	Description
`/api/session`	POST	Generates ephemeral client secrets for secure WebSocket authentication
`/api/voice`	GET	Returns service configuration and status
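
The idea behind `/api/session` is that the long-lived `XAI_API_KEY` stays on the server and the browser only ever receives a short-lived secret. A rough sketch of that exchange, under stated assumptions: `buildTokenRequest` and `mintEphemeralToken` are hypothetical names, the upstream token URL is passed in by the caller rather than hard-coded, and the empty request body is a placeholder. Check the xAI documentation for the real endpoint and payload fields.

```typescript
interface TokenRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

interface UpstreamResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Minimal fetch-shaped function so the sketch can run without a network.
type FetchLike = (url: string, init: TokenRequestInit) => Promise<UpstreamResponse>;

// Build the server-to-xAI request. The API key is only ever used here.
function buildTokenRequest(apiKey: string): TokenRequestInit {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({}), // placeholder: real payload fields are an assumption
  };
}

// Exchange the API key for an ephemeral secret and shape the route's reply.
async function mintEphemeralToken(
  fetchImpl: FetchLike,
  apiKey: string,
  tokenUrl: string,
): Promise<{ status: number; body: unknown }> {
  const upstream = await fetchImpl(tokenUrl, buildTokenRequest(apiKey));
  if (!upstream.ok) {
    // Hide upstream details from the browser; report a generic gateway error.
    return { status: 502, body: { error: `token request failed (${upstream.status})` } };
  }
  return { status: 200, body: await upstream.json() };
}
```

A route handler built this way would return the resulting status and body to the browser, which then uses the ephemeral secret to authenticate its own WebSocket connection.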

WebSocket Events

The application handles the following Grok Realtime API events:

Event	Description
`conversation.created`	Session initialized; the client sends its session configuration
`session.updated`	Configuration confirmed; the client starts audio capture
`response.output_audio.delta`	Incoming audio chunk from Grok
`response.output_audio_transcript.delta`	Assistant transcription update
`conversation.item.input_audio_transcription.completed`	User transcription complete
`input_audio_buffer.speech_started`	VAD detected speech start
`input_audio_buffer.speech_stopped`	VAD detected speech end
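
One way to organize handling for these events is a small state reducer keyed on the event `type`. The sketch below is illustrative only: `CallState`, `RealtimeEvent`, and `handleEvent` are hypothetical names, and the payload fields `delta` and `transcript` are assumptions about the message shape rather than something this README specifies.

```typescript
interface RealtimeEvent {
  type: string;
  delta?: string;      // assumed: Base64 audio or transcript text fragment
  transcript?: string; // assumed: completed user transcription
}

interface CallState {
  configSent: boolean;
  capturing: boolean;
  userSpeaking: boolean;
  assistantTranscript: string;
  userTranscripts: string[];
  audioQueue: string[]; // Base64 chunks waiting for playback
}

const initialState: CallState = {
  configSent: false,
  capturing: false,
  userSpeaking: false,
  assistantTranscript: "",
  userTranscripts: [],
  audioQueue: [],
};

// Pure reducer: returns the next state and never mutates the previous one.
function handleEvent(state: CallState, event: RealtimeEvent): CallState {
  switch (event.type) {
    case "conversation.created":
      // The client would send its session configuration at this point.
      return { ...state, configSent: true };
    case "session.updated":
      return { ...state, capturing: true };
    case "response.output_audio.delta":
      return event.delta
        ? { ...state, audioQueue: [...state.audioQueue, event.delta] }
        : state;
    case "response.output_audio_transcript.delta":
      return { ...state, assistantTranscript: state.assistantTranscript + (event.delta ?? "") };
    case "conversation.item.input_audio_transcription.completed":
      return { ...state, userTranscripts: [...state.userTranscripts, event.transcript ?? ""] };
    case "input_audio_buffer.speech_started":
      return { ...state, userSpeaking: true };
    case "input_audio_buffer.speech_stopped":
      return { ...state, userSpeaking: false };
    default:
      return state; // ignore events this sketch does not model
  }
}
```

A WebSocket `onmessage` callback could parse each message with `JSON.parse` and feed it through `handleEvent`, keeping side effects such as starting capture or scheduling playback separate from the state transitions.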

Environment Variables

Variable	Required	Default	Description
`XAI_API_KEY`	✅ Yes	—	Your xAI API key for authentication
`XAI_VOICE`	No	`ara`	Voice model to use for responses
`XAI_INSTRUCTIONS`	No	(see below)	System prompt defining Grok's personality

Default Instructions:

You are a helpful voice assistant named Grok. Keep your responses concise and conversational since this is a voice call. Be friendly and engaging.
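
For illustration, the variables in the table above could be resolved on the server roughly like this. `loadVoiceConfig` and `VoiceConfig` are hypothetical names, not identifiers from the repository. The defaults are the ones documented in this section.

```typescript
interface VoiceConfig {
  apiKey: string;
  voice: string;
  instructions: string;
}

const DEFAULT_VOICE = "ara";
const DEFAULT_INSTRUCTIONS =
  "You are a helpful voice assistant named Grok. Keep your responses concise and conversational since this is a voice call. Be friendly and engaging.";

// Resolve configuration from an environment-like object, failing fast when
// the required key is missing and falling back to defaults otherwise.
function loadVoiceConfig(env: Record<string, string | undefined>): VoiceConfig {
  const apiKey = (env["XAI_API_KEY"] ?? "").trim();
  if (!apiKey) {
    throw new Error("XAI_API_KEY is required");
  }
  return {
    apiKey,
    voice: (env["XAI_VOICE"] ?? "").trim() || DEFAULT_VOICE,
    instructions: (env["XAI_INSTRUCTIONS"] ?? "").trim() || DEFAULT_INSTRUCTIONS,
  };
}
```

On the server this would be called as `loadVoiceConfig(process.env)`. Empty or whitespace-only values are treated as unset, so a blank `XAI_VOICE=` line in .env.local still yields `ara`.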

Tech Stack

Technology	Version	Purpose
Next.js	16.x	React framework with App Router
React	19.x	UI library
Tailwind CSS	4.x	Utility-first styling
TypeScript	5.x	Type safety
xAI Realtime API	—	Voice AI via WebSocket
Web Audio API	—	Audio capture & playback

Deployment

Vercel (Recommended for UI)

npm run build
vercel deploy

Note: Vercel Serverless Functions do not support persistent WebSocket connections. This does not affect the current implementation: the browser connects to the xAI API over WebSocket directly, using an ephemeral token issued by /api/session, so the server only handles short-lived HTTP requests.

Other Platforms

The application can be deployed to any Node.js hosting platform:

Railway
Render
Fly.io
AWS / GCP / Azure

Scripts

Command	Description
`npm run dev`	Start development server with hot reload
`npm run build`	Build production bundle
`npm run start`	Start production server
`npm run lint`	Run ESLint

Browser Support

Browser	Support
Chrome 90+	✅ Full
Firefox 90+	✅ Full
Safari 15+	✅ Full
Edge 90+	✅ Full

Requires getUserMedia and AudioWorklet support.
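
A page can check for these two capabilities before starting a call. The following is a small sketch, with `missingFeatures` and `BrowserLike` as hypothetical names. It takes the global object as a parameter so it can be exercised outside a browser.

```typescript
// Only the parts of the browser global that the check looks at.
interface BrowserLike {
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
  AudioWorkletNode?: unknown;
}

// Return the names of required features that are not available.
function missingFeatures(env: BrowserLike): string[] {
  const missing: string[] = [];
  // navigator.mediaDevices is undefined in insecure (non-HTTPS) contexts.
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") {
    missing.push("getUserMedia");
  }
  if (typeof env.AudioWorkletNode !== "function") {
    missing.push("AudioWorklet");
  }
  return missing;
}
```

In the browser this would be called with `window`, and a non-empty result could be shown to the user instead of attempting to start the call.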

License

This project is licensed under the MIT License.

Acknowledgments

xAI for the Grok Realtime API
Vercel for Next.js and hosting infrastructure
