A real-time voice assistant application powered by xAI's Grok Realtime API. Features a polished iPhone-inspired interface for natural voice conversations with Grok AI.
Grok Voice provides a seamless voice conversation experience with xAI's Grok model. The application establishes a WebSocket connection to the Grok Realtime API, enabling bidirectional audio streaming with low latency. The interface mimics an iPhone call screen, complete with a Dynamic Island indicator, call controls, and live transcription.
- Real-Time Voice Streaming — Bidirectional audio via WebSocket with server-side VAD (Voice Activity Detection)
- Live Transcription — Real-time speech-to-text for both user and assistant utterances
- iPhone-Style UI — Authentic iOS call interface with Dynamic Island, status bar, and glassmorphism effects
- Web & X Search — Grok can search the web and X (Twitter) for up-to-date information
- Call Controls — Mute microphone and toggle speaker output
- Customizable Personality — Configure Grok's voice and behavior via environment variables
- Responsive Design — Optimized for desktop and mobile viewports
- Node.js 18.x or later
- xAI API Key — Obtain from x.ai
- Modern Browser (Chrome, Firefox, Safari, or Edge) with getUserMedia and AudioWorklet support
git clone https://github.com/your-username/iphone-voice-site.git
cd iphone-voice-site
npm install

Create a .env.local file in the project root:

# Required
XAI_API_KEY=your_xai_api_key_here
# Optional: Voice selection (default: ara)
XAI_VOICE=ara
# Optional: Custom system prompt
XAI_INSTRUCTIONS=You are a helpful voice assistant named Grok. Keep your responses concise and conversational since this is a voice call. Be friendly and engaging.

npm run dev

Navigate to http://localhost:3000 and grant microphone access when prompted.

iphone-voice-site/
├── src/
│   └── app/
│       ├── api/
│       │   ├── session/
│       │   │   └── route.ts      # Ephemeral token generation
│       │   └── voice/
│       │       └── route.ts      # Voice configuration endpoint
│       ├── globals.css           # Global styles & animations
│       ├── layout.tsx            # Root layout with fonts
│       └── page.tsx              # Main voice interface
├── public/                       # Static assets
├── .env.local                    # Environment variables (create this)
├── package.json
├── tailwind.config.ts
└── tsconfig.json

- Microphone Capture: Web Audio API with `AudioWorklet` for low-latency PCM capture
- Audio Processing: Real-time conversion to PCM16 format at native sample rate
- WebSocket Transport: Base64-encoded audio chunks sent to Grok Realtime API
- Playback: Incoming audio queued and played via `AudioBufferSourceNode`
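
The processing and playback steps above can be sketched as a pair of conversion helpers. This is a minimal illustration, not the project's actual code: the function names `encodePcm16Chunk` and `decodePcm16Chunk` are hypothetical, and the sketch assumes little-endian PCM16 and the global `btoa`/`atob` functions (available in browsers and recent Node.js versions).

```typescript
// Hypothetical helper: convert Float32 samples in [-1, 1] (the format the
// Web Audio API produces) into base64-encoded little-endian PCM16.
function encodePcm16Chunk(samples: Float32Array): string {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768 and positive values by 32767,
    // so the result always fits in a signed 16-bit integer.
    const value =
      clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
    view.setInt16(i * 2, value, true); // true = little-endian (assumed)
  }
  // btoa expects a binary string, so map each byte to one character.
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Hypothetical helper: decode a base64 PCM16 chunk back into Float32
// samples that could be copied into an AudioBuffer for playback.
function decodePcm16Chunk(base64: string): Float32Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  const view = new DataView(bytes.buffer);
  const sampleCount = Math.floor(bytes.length / 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return samples;
}
```

Using `DataView` with an explicit endianness flag keeps the byte order independent of the host platform, at the cost of being slightly slower than writing through an `Int16Array`.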
| Endpoint | Method | Description |
|---|---|---|
| `/api/session` | POST | Generates ephemeral client secrets for secure WebSocket authentication |
| `/api/voice` | GET | Returns service configuration and status |
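
As a rough sketch of how a client might call the session endpoint before opening the WebSocket: the helper name `requestSessionSecret` and the `FetchLike` type are hypothetical, and the response body is returned untyped because its exact shape is not described here.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings.
// The browser's global fetch is expected to satisfy it.
type FetchLike = (
  url: string,
  init: { method: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: ask the app's own backend for a short-lived secret.
// The caller decides how to read the returned JSON.
async function requestSessionSecret(fetchImpl: FetchLike): Promise<unknown> {
  const res = await fetchImpl("/api/session", { method: "POST" });
  if (!res.ok) {
    throw new Error(`/api/session failed with status ${res.status}`);
  }
  return res.json();
}
```

In the browser this would be called as `requestSessionSecret(fetch)`. Passing the fetch implementation as a parameter is a design choice that makes the helper easy to test with a stub.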
The application handles the following Grok Realtime API events:
| Event | Description |
|---|---|
| `conversation.created` | Session initialized, sends configuration |
| `session.updated` | Configuration confirmed, starts audio capture |
| `response.output_audio.delta` | Incoming audio chunk from Grok |
| `response.output_audio_transcript.delta` | Assistant transcription update |
| `conversation.item.input_audio_transcription.completed` | User transcription complete |
| `input_audio_buffer.speech_started` | VAD detected speech start |
| `input_audio_buffer.speech_stopped` | VAD detected speech end |
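
One way to route the events in the table above is a single switch that maps each event type to a UI action. The sketch below is illustrative only: the event type strings come from the table, but the payload field names `delta` and `transcript`, the `Action` type, and the function name `routeRealtimeEvent` are assumptions.

```typescript
// Loose event shape. Only `type` is taken from the table above;
// `delta` and `transcript` are assumed field names.
interface RealtimeEvent {
  type: string;
  delta?: string;
  transcript?: string;
}

// Hypothetical set of actions the UI layer could react to.
type Action =
  | { kind: "send-config" }
  | { kind: "start-capture" }
  | { kind: "play-audio"; base64: string }
  | { kind: "assistant-text"; text: string }
  | { kind: "user-text"; text: string }
  | { kind: "user-speaking"; speaking: boolean }
  | { kind: "ignore" };

function routeRealtimeEvent(event: RealtimeEvent): Action {
  switch (event.type) {
    case "conversation.created":
      return { kind: "send-config" };
    case "session.updated":
      return { kind: "start-capture" };
    case "response.output_audio.delta":
      return event.delta ? { kind: "play-audio", base64: event.delta } : { kind: "ignore" };
    case "response.output_audio_transcript.delta":
      return { kind: "assistant-text", text: event.delta ?? "" };
    case "conversation.item.input_audio_transcription.completed":
      return { kind: "user-text", text: event.transcript ?? "" };
    case "input_audio_buffer.speech_started":
      return { kind: "user-speaking", speaking: true };
    case "input_audio_buffer.speech_stopped":
      return { kind: "user-speaking", speaking: false };
    default:
      // Unknown events are ignored rather than treated as errors.
      return { kind: "ignore" };
  }
}
```

A WebSocket `message` handler could parse each frame with `JSON.parse` and pass the result to this function, keeping protocol handling separate from rendering.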
| Variable | Required | Default | Description |
|---|---|---|---|
| `XAI_API_KEY` | ✅ Yes | (none) | Your xAI API key for authentication |
| `XAI_VOICE` | No | `ara` | Voice model to use for responses |
| `XAI_INSTRUCTIONS` | No | (see below) | System prompt defining Grok's personality |

Default Instructions:
You are a helpful voice assistant named Grok. Keep your responses concise and conversational since this is a voice call. Be friendly and engaging.
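
The defaults described above could be applied on the server with a small loader like the following. This is a sketch under assumptions, not the project's actual code: the `VoiceConfig` type and `loadVoiceConfig` function are hypothetical names, and treating blank values as unset is a choice made here for illustration.

```typescript
// Hypothetical shape for the resolved configuration.
interface VoiceConfig {
  apiKey: string;
  voice: string;
  instructions: string;
}

const DEFAULT_VOICE = "ara";
const DEFAULT_INSTRUCTIONS =
  "You are a helpful voice assistant named Grok. Keep your responses concise " +
  "and conversational since this is a voice call. Be friendly and engaging.";

// Read the three documented variables, failing fast if the key is missing.
function loadVoiceConfig(env: Record<string, string | undefined>): VoiceConfig {
  const apiKey = env.XAI_API_KEY?.trim();
  if (!apiKey) {
    throw new Error("XAI_API_KEY is required");
  }
  return {
    apiKey,
    // Empty or whitespace-only values fall back to the defaults (assumption).
    voice: env.XAI_VOICE?.trim() || DEFAULT_VOICE,
    instructions: env.XAI_INSTRUCTIONS?.trim() || DEFAULT_INSTRUCTIONS,
  };
}
```

In a Next.js route handler this would typically be called as `loadVoiceConfig(process.env)`, so the API key is only ever read on the server.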
| Technology | Version | Purpose |
|---|---|---|
| Next.js | 16.x | React framework with App Router |
| React | 19.x | UI library |
| Tailwind CSS | 4.x | Utility-first styling |
| TypeScript | 5.x | Type safety |
| xAI Realtime API | — | Voice AI via WebSocket |
| Web Audio API | — | Audio capture & playback |
npm run build
vercel deploy

Note: Vercel Serverless Functions do not support persistent WebSocket connections. This does not affect the current implementation, because the browser opens the WebSocket directly to the xAI API using an ephemeral token, so the app works on Vercel.

The application can be deployed to any Node.js hosting platform:
- Railway
- Render
- Fly.io
- AWS / GCP / Azure
| Command | Description |
|---|---|
| `npm run dev` | Start development server with hot reload |
| `npm run build` | Build production bundle |
| `npm run start` | Start production server |
| `npm run lint` | Run ESLint |
| Browser | Support |
|---|---|
| Chrome 90+ | ✅ Full |
| Firefox 90+ | ✅ Full |
| Safari 15+ | ✅ Full |
| Edge 90+ | ✅ Full |
Requires getUserMedia and AudioWorklet support.
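
A feature check for these two requirements might look like the sketch below. The `BrowserLike` type and `missingAudioFeatures` function are hypothetical names. The check takes the global object as a parameter so it can be exercised outside a browser.

```typescript
// Structural view of the browser globals this check reads.
interface BrowserLike {
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
  AudioWorkletNode?: unknown;
}

// Return the names of required features that are not available.
// An empty array means the environment looks usable.
function missingAudioFeatures(env: BrowserLike): string[] {
  const missing: string[] = [];
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") {
    missing.push("getUserMedia");
  }
  // AudioWorkletNode is a constructor, so typeof reports "function".
  if (typeof env.AudioWorkletNode !== "function") {
    missing.push("AudioWorklet");
  }
  return missing;
}
```

In the page component this could be called with `window` before starting a call, showing a message that lists the missing features instead of failing silently. Note that `navigator.mediaDevices` is only exposed in secure contexts (HTTPS or localhost).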
This project is licensed under the MIT License.