An AI-powered video processing platform that lets users upload instructional videos and ask questions about them in natural language. Features multimodal search across video content, frame analysis, and an intelligent chat interface.
- Video Upload: Drag-and-drop interface with real-time processing status
- AI Chat Interface: Ask questions about uploaded videos with streaming responses
- Multimodal Search: Search across transcripts, frames, and uploaded images
- Frame Analysis: Automatic scene detection and frame extraction with AI vision
- Authentication: Simple demo authentication system
- Real-time Processing: Live status updates during video processing
```
┌─────────────┐      ┌──────────────┐      ┌─────────────┐
│   Browser   │─────▶│ Next.js App  │─────▶│ PostgreSQL  │
│             │      │  (video-qa)  │      │ + pgvector  │
└─────────────┘      └──────┬───────┘      └──────┬──────┘
                            │                     │
                            │ writes              │ polls
                            │ video +             │ jobs
                            │ job                 │
                            ▼                     ▼
                     ┌─────────────┐      ┌──────────────┐
                     │    data/    │      │    Worker    │
                     │  uploads/   │◀─────│   (video-    │
                     │  processed/ │      │    worker)   │
                     │   frames/   │      └──────────────┘
                     └─────────────┘
```
- Node.js 18+ and pnpm
- Docker and Docker Compose
- OpenAI API key
```bash
# Clone both repositories
git clone <video-qa-repo>
git clone <video-worker-repo>

# Create environment file
cd video-qa
cat > .env.local << EOF
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/videoqa
OPENAI_KEY=your_openai_api_key_here
EOF

# Build worker image
cd ../video-qa-worker
docker build -t videoqa-worker:0.0.19 .

# Start database and worker
cd ../video-qa
docker-compose up -d

# Start Next.js app
pnpm install
pnpm dev
```

- Open http://localhost:3000
- Log in with the demo credentials: `demo` / `demo123`
- Upload a video file (max 500 MB)
- Monitor processing status
- Ask questions about your video
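The "monitor processing status" step can also be scripted. Below is a hedged sketch of a client-side polling helper; the status values (`queued`, `processing`, `ready`, `error`) and the injected fetcher are assumptions for illustration, not the app's actual API types.

```typescript
// Assumed status values; the real endpoint is GET /api/videos/[id]/status.
type Status = "queued" | "processing" | "ready" | "error";

// Polls until the video reaches a terminal state or we give up.
// The fetcher is injected so the sketch stays self-contained.
async function waitForReady(
  getStatus: () => Promise<Status>,
  intervalMs = 2000,
  maxAttempts = 150,
): Promise<Status> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await getStatus();
    if (status === "ready" || status === "error") return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("timed out waiting for video processing");
}
```

Against the running app this might be called as `waitForReady(() => fetch(`/api/videos/${id}/status`).then((r) => r.json()).then((j) => j.status))`, assuming the endpoint returns JSON with a `status` field.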
- File Upload: Video saved to `data/uploads/{id}_{name}.mp4`
- Database: Metadata stored in the `videos` table with `original_path`
- Job Queue: Processing job created in the `jobs` table
- Worker: Polls for jobs using `FOR UPDATE SKIP LOCKED`
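The `FOR UPDATE SKIP LOCKED` pattern lets several workers poll the same `jobs` table without ever claiming the same row twice. A sketch of what such a claim might look like; the column names (`status`, `created_at`, `video_id`) are assumptions, and the actual query lives in video-worker:

```typescript
// Assumed claim query: locks and updates one queued job; rows already
// locked by another worker are skipped rather than waited on.
const CLAIM_JOB_SQL = `
  UPDATE jobs
  SET status = 'processing'
  WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'queued'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
  )
  RETURNING id, video_id;
`;

// In-memory illustration of the same semantics: each call claims the
// oldest queued job, so two pollers never receive the same one.
type Job = { id: number; status: "queued" | "processing" };

function claimNextJob(jobs: Job[]): Job | undefined {
  const job = jobs.find((j) => j.status === "queued");
  if (job) job.status = "processing";
  return job;
}
```

Without `SKIP LOCKED`, a second worker would block on the row the first worker has locked instead of moving on to the next queued job.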
The worker processes videos through 6 stages:
```
Input: uploads/{id}_{name}.mp4
  │
  ├──▶ [1. NORMALIZE]  → processed/{id}/normalized.mp4
  │                    → processed/{id}/audio.wav
  │
  ├──▶ [2. TRANSCRIBE] → transcript_segments table
  │                    → subs/{id}.srt
  │
  ├──▶ [3. SCENES]     → scenes table (t_start, t_end)
  │
  ├──▶ [4. FRAMES]     → frames/{id}/scene_*.jpg
  │                    → frames table (phash, path)
  │
  ├──▶ [5. VISION]     → frame_captions table (caption, entities)
  │
  └──▶ [6. EMBEDDINGS] → UPDATE embeddings (1536-dim vectors)

Output: video.status = 'ready'
```
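The stages above run sequentially for each video. A minimal sketch of such a driver, with the real stage implementations (which live in video-worker and are not shown here) stubbed out behind a callback:

```typescript
// Stage names mirror the diagram above, not actual worker code.
const STAGES = [
  "normalize",
  "transcribe",
  "scenes",
  "frames",
  "vision",
  "embeddings",
] as const;

// Runs every stage in order and reports the final video status.
async function processVideo(
  videoId: string,
  runStage: (stage: string, videoId: string) => Promise<void>,
): Promise<string> {
  for (const stage of STAGES) {
    await runStage(stage, videoId); // each stage is safe to re-run (idempotent)
  }
  return "ready"; // video.status after the final stage
}
```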
- `videos`: Video metadata (original_path, normalized_path, status, duration)
- `jobs`: Processing queue with status tracking
- `scenes`: Scene boundaries detected in videos
- `frames`: Extracted frames with perceptual hashes
- `transcript_segments`: Audio transcription with embeddings
- `frame_captions`: Vision analysis with embeddings
```
videos (1) ─── (many) jobs
videos (1) ─── (many) scenes
scenes (1) ─── (many) frames
frames (1) ─── (1) frame_captions
videos (1) ─── (many) transcript_segments
```
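With pgvector, searching these tables reduces to nearest-neighbor queries on the embedding columns. A hedged sketch of what such queries might look like; the column names beyond those mentioned above (`embedding`, `text`, `frame_id`) are assumptions, and the actual retrieval code is in lib/rag.ts:

```typescript
// Assumed pgvector similarity queries; <=> is pgvector's cosine-distance
// operator, and $1 stands for the 1536-dim query embedding.
const TRANSCRIPT_SEARCH_SQL = `
  SELECT video_id, text, t_start, embedding <=> $1::vector AS distance
  FROM transcript_segments
  ORDER BY distance
  LIMIT 5;
`;

// frame_captions is joined back to frames via an assumed frame_id column,
// matching the 1-to-1 relationship shown above.
const CAPTION_SEARCH_SQL = `
  SELECT f.video_id, c.caption, c.embedding <=> $1::vector AS distance
  FROM frame_captions c
  JOIN frames f ON f.id = c.frame_id
  ORDER BY distance
  LIMIT 5;
`;
```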
- `POST /login` - Log in with demo credentials. Response: redirect to the upload page
- `POST /api/upload` - Upload a video file. Response: `{ id: string }`
- `GET /api/videos` - List all videos
- `GET /api/videos/[id]/status` - Get processing status
- `GET /api/videos/[id]/summary` - Get processing results
- `POST /api/ask` - Ask questions about videos. Response: streaming text
- `POST /api/ask/upload-image` - Upload an image for multimodal search
- `GET /api/frames/[videoId]/[frameNum]` - Serve frame images. Response: JPEG image with caching headers
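Because `/api/ask` streams its response, a client should consume the body incrementally rather than awaiting the full text. A sketch using the standard web-streams API (the endpoint's behavior beyond "streaming text" is an assumption):

```typescript
// Reads a streaming text body chunk by chunk, decoding as bytes arrive.
// Works with any Response.body from fetch (Node 18+ or the browser).
async function readTextStream(body: ReadableStream<Uint8Array>): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true }); // keep partial UTF-8 buffered
  }
  return text + decoder.decode(); // flush any trailing bytes
}
```

A caller might do `const res = await fetch("/api/ask", { method: "POST", body: form }); const answer = await readTextStream(res.body!);`, appending to the chat UI inside the loop instead of accumulating a string.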
| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABASE_URL` | ✅ | - | PostgreSQL connection string |
| `OPENAI_KEY` | ✅ | - | OpenAI API key for AI processing |
| `NODE_ENV` | ❌ | `development` | Environment mode |
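A minimal startup guard matching the table: fail fast when a required variable is missing. This is a sketch only; the app's actual configuration handling may differ.

```typescript
// Returns the names of required variables (per the table above) that are
// unset or empty. NODE_ENV is optional and defaults to "development".
function missingEnv(env: Record<string, string | undefined>): string[] {
  return ["DATABASE_URL", "OPENAI_KEY"].filter((key) => !env[key]);
}
```

At startup one might call `missingEnv(process.env)` and exit with a clear message listing any missing names, rather than failing later with an opaque connection or API error.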
- Stored in DB: Relative paths like `uploads/{id}_{name}.mp4`
- Resolved by the worker: `{DATA_DIR}/{relative_path}` → absolute path
- Benefits: Portable across environments, easy to move data
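The convention above fits in a tiny helper. A sketch (the worker's actual resolution code is not shown here), guarding against absolute paths sneaking into the DB:

```typescript
// Joins a DB-stored relative path onto the data root. Rejects absolute
// paths, since the DB is only supposed to hold DATA_DIR-relative paths.
function resolveMediaPath(dataDir: string, relativePath: string): string {
  if (relativePath.startsWith("/")) {
    throw new Error("DB paths must be relative to DATA_DIR");
  }
  return dataDir.replace(/\/+$/, "") + "/" + relativePath;
}

console.log(resolveMediaPath("/srv/data", "uploads/42_demo.mp4"));
// → /srv/data/uploads/42_demo.mp4
```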
```
data/
├── uploads/       # Original uploaded videos
├── processed/     # Normalized videos and audio
│   └── {video_id}/
├── frames/        # Extracted frame images
│   └── {video_id}/
├── subs/          # SRT subtitle files
├── ask-uploads/   # User-uploaded images for chat
└── worker/        # Worker logs
    └── log.log
```
```
video-qa/
├── src/app/api/         # API routes
│   ├── upload/          # Upload endpoint
│   ├── ask/             # Chat interface
│   ├── videos/          # Video management
│   └── frames/          # Frame image serving
├── src/app/(app)/       # Protected pages
│   ├── upload/          # Upload UI
│   └── ask/             # Chat interface
├── src/app/(auth)/      # Authentication
│   └── login/           # Login page
├── src/components/      # React components
│   ├── DashboardLayout  # Main layout
│   ├── ChatMessage      # Message rendering
│   └── ThemeProvider    # MUI theming
├── lib/                 # Shared utilities
│   ├── db.ts            # Database functions
│   ├── rag.ts           # RAG system
│   ├── vision.ts        # Vision analysis
│   └── file.ts          # File operations
└── postgres/            # Database schema
    └── initdb/
```
- Authentication: Demo login system with cookie-based sessions
- Multimodal Search: RAG system with vector embeddings and image analysis
- Real-time Chat: Streaming AI responses with frame and timestamp references
- Material-UI: Modern, responsive interface with custom theming
- Idempotent Operations: Safe to re-run processing
- Error Handling: Comprehensive error logging and user feedback
- **Worker can't find video files**
  - Check that `DATA_DIR` is correctly mounted in docker-compose
  - Verify the file exists at the resolved path
- **Database connection errors**
  - Ensure PostgreSQL is running: `docker-compose ps`
  - Check the connection string in `.env.local`
- **OpenAI API errors**
  - Verify `OPENAI_KEY` is set correctly
  - Check that the API key has sufficient credits
- **Path resolution issues**
  - Ensure uploads use relative paths (`uploads/...`)
  - Check the `DATA_DIR` environment variable
- Worker logs: `data/worker/log.log`
- Database logs: `docker-compose logs postgres`
- Next.js logs: terminal output
```bash
docker-compose down
rm -rf data/postgres
docker-compose up -d
```

- ARCHITECTURE.md - Detailed system design
- QUICKSTART.md - 5-minute setup guide
- ../video-qa-worker/README.md - Worker documentation