A Retrieval-Augmented Generation (RAG) system for document-based question answering.
- Python 3.12+
- uv package manager
- Google API key (set in .env)
    # Set up environment
    export GOOGLE_API_KEY=your_api_key_here
    uv run main_api.py

Then open your browser and go to:
- Web Frontend: http://localhost:8000
- Swagger API Docs: http://localhost:8000/docs
The system includes a modern, professional web interface for testing:
- Upload Documents - Drag & drop or click to upload PDFs
- View Documents - See all uploaded documents with metadata
- Ask Questions - Query your documents in natural language
- View Results - Get answers with source documents highlighted
- Manage Documents - Delete documents as needed
- System Status - Monitor server health and database status
- 📤 Easy Upload - Drag-and-drop PDF upload with progress tracking
- ❓ Smart Queries - Natural language question answering
- 📚 Source Citations - See which documents your answers came from
- 🎯 Status Monitoring - Real-time system health and document count
- 🎨 Modern UI - Professional, responsive design
- ⚡ Instant Feedback - Real-time status updates
- Ingestion (ingestion_pipeline/)
  - Document loading and chunking
  - Vector database management
- Retrieval (retriever/)
  - Multi-query expansion
  - Hybrid search (vector + BM25)
  - Cross-encoder reranking
- Generation (generation/)
  - LLM answer generation
  - RAG pipeline coordination
- Backend (backend/)
  - FastAPI routes and schemas
  - Document upload management
- Frontend (frontend/)
  - Single-page HTML5 application
  - Real-time WebSocket-like updates
  - Responsive design with embedded CSS and JavaScript
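The retrieval stage listed above (multi-query expansion, search, reranking) can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: `retrieve_and_rerank` and its three callables (`expand`, `search`, `score`) are hypothetical names standing in for the LLM-based query rewriter, the hybrid searcher, and the cross-encoder.

```python
from collections.abc import Callable, Sequence


def retrieve_and_rerank(
    question: str,
    expand: Callable[[str], Sequence[str]],
    search: Callable[[str], Sequence[str]],
    score: Callable[[str, str], float],
    top_k: int = 3,
) -> list[str]:
    """Expand the question, pool the search hits, then rerank the pool."""
    # Search the original question as well as its rewrites.
    queries = [question, *expand(question)]

    # Pool hits across all queries, dropping duplicates but keeping first-seen order.
    pooled: dict[str, None] = {}
    for query in queries:
        for chunk in search(query):
            pooled.setdefault(chunk, None)

    # Score every pooled chunk against the original question (cross-encoder style).
    ranked = sorted(pooled, key=lambda chunk: score(question, chunk), reverse=True)
    return ranked[:top_k]
```

Passing the stages in as callables keeps the function testable with plain lambdas, which fits the "modular, testable functions" goal; the real modules may be wired together differently.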
Edit config.py to adjust:
- Model settings
- Chunk sizes
- Retrieval parameters
- Storage locations
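As an illustration of what such a centralized configuration might look like, here is a hedged sketch. The field names, numeric values, and paths are all assumptions made for the example; only the two model names are taken from this README. Check the real config.py for the actual settings.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    # Model settings (the two model names appear elsewhere in this README).
    embedding_model: str = "BAAI/bge-small-en-v1.5"
    reranker_model: str = "BAAI/bge-reranker-base"
    # Chunk sizes, in characters (illustrative values, not the project's).
    chunk_size: int = 1000
    chunk_overlap: int = 150
    # Retrieval parameters (illustrative values).
    candidates_per_query: int = 20
    top_k: int = 5
    # Storage locations (hypothetical paths).
    upload_dir: Path = Path("data/uploads")
    vector_db_dir: Path = Path("data/chroma")

    def __post_init__(self) -> None:
        # Fail early on a combination that would make chunking loop or skip text.
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")


settings = Settings()
```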
- POST /chat - Query the knowledge base
- GET /health - System health check
- POST /upload - Upload a PDF
- GET /documents - List all documents
- DELETE /documents/{name} - Delete a document
- POST /rebuild-index - Rebuild the vector index
- GET / - Serve the web frontend
- GET /docs - Interactive API documentation
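The endpoints above can also be called from a script instead of the web frontend. The sketch below uses only the standard library. The JSON field name `question` is an assumption, not something this README specifies; the Swagger docs at http://localhost:8000/docs show the real request schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_chat_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /chat request. The "question" field name is an assumption."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a request to a running server and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(urllib.request.Request(f"{BASE_URL}/health"))` would check server health and `send(build_chat_request("What is this document about?"))` would query the knowledge base.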
- Start Server → uv run main_api.py
- Open Frontend → http://localhost:8000
- Upload PDFs → Use the upload area (drag & drop supported)
- Ask Questions → Type your question and get instant answers
- Manage Documents → Delete documents as needed
Code organization:
- Configuration centralized in config.py
- Shared LLM instance via get_llm()
- Type hints throughout
- Modular, testable functions
- Frontend is fully client-side (no Node.js needed)
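One common way to provide a shared instance through a function like get_llm() is to cache the constructor call. The sketch below shows that pattern only; it may not match how the project implements it. `StubLLM` and the model name `placeholder-model` are hypothetical stand-ins so the example runs without an API key.

```python
from functools import lru_cache


class StubLLM:
    """Stand-in for the real Gemini client so the sketch runs offline."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def invoke(self, prompt: str) -> str:
        return f"[{self.model_name}] {prompt}"


@lru_cache(maxsize=1)
def get_llm() -> StubLLM:
    """Build the client on the first call; later calls return the same object."""
    return StubLLM(model_name="placeholder-model")
```

Because the function takes no arguments, every caller gets the same cached object, so the retrieval and generation modules would not each construct their own client.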
- Backend: FastAPI
- LLM: Google Generative AI (Gemini)
- Embeddings: Hugging Face (BAAI/bge-small-en-v1.5)
- Reranking: Cross-encoders (BAAI/bge-reranker-base)
- Vector DB: Chroma
- Search: Hybrid (vector + BM25)
- Frontend: HTML5, CSS3, Vanilla JavaScript
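This README does not say how the vector and BM25 results are merged in the hybrid search. One widely used option is reciprocal rank fusion (RRF), sketched here as an assumption rather than as the project's method. The function name and the constant `k = 60` (a conventional default for RRF) are illustrative.

```python
from collections import defaultdict
from collections.abc import Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank), so top-ranked items count most.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]  # e.g. nearest neighbours from the vector store
bm25_hits = ["b", "d", "a"]    # e.g. keyword matches from BM25
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

RRF uses only rank positions, so it avoids having to normalize cosine similarities and BM25 scores onto a common scale.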
- The system starts with an empty knowledge base
- Upload at least one document before asking questions
- The frontend automatically detects when the database is empty
- All timestamps are displayed in local time
- The API automatically handles document chunking and embedding
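To give a sense of what the automatic chunking step involves, here is a minimal fixed-size chunker with overlap. It is a sketch under assumptions: the project may split on sentences or tokens instead, and the function name and default sizes are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 150) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.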