A local, offline chatbot application for querying PDF content using Retrieval-Augmented Generation (RAG) with Ollama.
- Local & Offline: Complete privacy - no data leaves your machine
- PDF Processing: Upload and process PDF documents
- RAG Pipeline: Advanced retrieval-augmented generation
- Vector Search: Semantic search using ChromaDB
- Modern UI: Clean React + Tailwind interface
- Flexible Models: Support for various Ollama models
```
chatbot-app/
├── backend/              # FastAPI Python backend
│   ├── main.py           # API endpoints
│   ├── rag.py            # RAG pipeline
│   ├── embed.py          # Embedding service
│   └── vector_store.py   # Vector database
├── frontend/             # React + Tailwind UI
│   ├── src/components/   # Reusable components
│   └── src/pages/        # Application pages
└── docker-compose.yml    # Development setup
```
- Python 3.8+
- Node.js 16+
- Ollama installed and running
```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull required models
ollama pull mistral
ollama pull nomic-embed-text
```

```bash
# Navigate to backend
cd chatbot-app/backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

```bash
# Navigate to frontend
cd ../frontend

# Install dependencies
npm install
```

```bash
# Start Ollama service
ollama serve

# In another terminal, ensure models are available
ollama list
```

```bash
# Start the backend
cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

```bash
# Start the frontend
cd frontend
npm start
```

- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Navigate to the "Upload" tab
- Drag & drop PDF files or click to select
- Click "Process Documents" to ingest into vector database
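If you prefer to script uploads instead of using the UI, a PDF can be sent to the `/upload` endpoint as `multipart/form-data`. The sketch below builds such a body with only the standard library; the form field name `"file"` is an assumption — check the live schema at http://localhost:8000/docs for the actual field name.

```python
import uuid

def build_pdf_multipart(filename: str, content: bytes, field: str = "file"):
    """Build a multipart/form-data body and headers for a single PDF upload.

    NOTE: the field name "file" is an assumption about the backend's
    /upload schema; verify it against the FastAPI docs page.
    """
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8") + content + f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return body, headers
```

The returned `body` and `headers` can be passed to `urllib.request.Request` (or any HTTP client) to POST the file to `http://localhost:8000/upload`.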
- Navigate to the "Chat" tab
- Ask questions about your uploaded documents
- The system will provide answers with source references
Edit the `.env` file to customize:
```
# Ollama Configuration
OLLAMA_URL=http://localhost:11434
LLM_MODEL=mistral
EMBEDDING_MODEL=nomic-embed-text

# RAG Settings
CHUNK_SIZE=1000
CHUNK_OVERLAP=200

# API Configuration
REACT_APP_API_URL=http://localhost:8000
```

You can use any Ollama model:
- `mistral` (recommended)
- `llama3`
- `codellama`
- `neural-chat`
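For example, to switch the generation model to `llama3` (assuming the backend reads `LLM_MODEL` from `.env` at startup), pull the model locally and update the config:

```shell
# Pull the model so Ollama can serve it locally
ollama pull llama3

# Then set in .env and restart the backend:
# LLM_MODEL=llama3
```

The embedding model (`EMBEDDING_MODEL`) can be changed the same way, but re-ingesting documents is then required since stored vectors from different embedding models are not comparable.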
```bash
# Start everything with Docker
docker-compose up --build

# Access:
# Frontend: http://localhost:3000
# Backend: http://localhost:8000
```

| Endpoint | Method | Description |
|---|---|---|
| `/chat` | POST | Send chat query |
| `/upload` | POST | Upload PDF file |
| `/ingest` | POST | Process uploaded files |
| `/health` | GET | Health check |
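A minimal API client using only the standard library might look like the sketch below. The `{"query": ...}` request-body shape is an assumption — the exact field name should be confirmed against the interactive docs at http://localhost:8000/docs.

```python
import json
from urllib import request

API_URL = "http://localhost:8000"  # default backend address from the config

def build_chat_payload(question: str) -> bytes:
    # The {"query": ...} shape is an assumption about the /chat schema;
    # check the FastAPI docs page for the real field name.
    return json.dumps({"query": question}).encode("utf-8")

def ask(question: str, url: str = API_URL) -> dict:
    """POST a question to the /chat endpoint and return the decoded JSON reply."""
    req = request.Request(
        url + "/chat",
        data=build_chat_payload(question),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With the backend running, `ask("What does chapter 2 cover?")` would return the model's answer along with whatever source-reference fields the API includes.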
- Document Upload: PDFs are uploaded and stored locally
- Text Extraction: PyPDF2 extracts text from documents
- Chunking: Text is split into overlapping chunks
- Embedding: Chunks are embedded using Ollama/SentenceTransformers
- Vector Storage: Embeddings stored in ChromaDB
- Query Processing: User queries are embedded and searched
- Response Generation: Relevant chunks sent to LLM for response
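The chunking step above, driven by `CHUNK_SIZE` and `CHUNK_OVERLAP`, can be sketched as a simple sliding window. This is an illustration of the technique, not the backend's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters.

    Defaults mirror CHUNK_SIZE and CHUNK_OVERLAP from the .env config.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances per chunk
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighboring chunk, which keeps retrieval from losing context at the seams.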
Ollama Connection Error

```bash
# Check if Ollama is running
curl http://localhost:11434/api/version

# Start Ollama if not running
ollama serve
```

Model Not Found

```bash
# Pull required models
ollama pull mistral
ollama pull nomic-embed-text
```

Port Already in Use

```bash
# Change ports in .env file
REACT_APP_API_URL=http://localhost:8001

# Start backend on different port
uvicorn main:app --port 8001
```

- RAM: Ensure sufficient RAM for embeddings (4GB+ recommended)
- Storage: Vector database grows with document count
- Models: Smaller models run faster but may be less accurate
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Commit changes: `git commit -am 'Add feature'`
- Push to the branch: `git push origin feature-name`
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Ollama for local LLM inference
- ChromaDB for vector storage
- FastAPI for the backend framework
- React + Tailwind for the frontend