A self-hosted web service for ingesting thousands of technical documents and interacting with them through natural language chat powered by RAG (Retrieval-Augmented Generation).
KnowledgeVault enables you to:
- Upload and index documents in many formats: docx, xlsx, pptx, pdf, csv, sql, txt, and code files
- Chat with your documents using AI-powered RAG responses
- Store and retrieve memories for persistent knowledge across sessions
- Search your knowledge base with semantic similarity
- Self-host everything on your own infrastructure with local LLMs
| Feature | Description |
|---|---|
| Multi-Format Support | Process Word, Excel, PowerPoint, PDF, CSV, SQL, and text documents |
| Semantic Chunking | Structure-aware document processing preserves tables and code blocks |
| Vector Search | LanceDB-powered semantic search with relevance scoring |
| Memory System | SQLite FTS5-backed memory storage with natural language retrieval |
| Streaming Chat | Real-time AI responses with source citations |
| File Watcher | Automatic detection and processing of new documents |
| Email Ingestion | Ingest documents via email with IMAP polling and vault routing |
| Web UI | Modern React interface with Material 3 design |
| API Access | Full REST API with OpenAPI documentation |
```
+------------------+     +------------------+     +------------------+
|  React Frontend  |---->|  FastAPI Backend |---->|  LanceDB Vector  |
|  (Port 5173*)    |     |  (Port 8080)     |     |  Store           |
+------------------+     +------------------+     +------------------+
                                |        |
                                |   +----v--------+
                                |   |   SQLite    |
                                |   |  Memories   |
                                |   +-------------+
                                |
                +---------------v------------------+
                |        Ollama (External)         |
                |  - Embeddings (bge-m3)           |
                |  - Chat (your choice of model)   |
                +----------------------------------+
```

\*Port 5173 is for development only. Production access is via port 8080.

In production, the frontend is built as static files and served directly by the backend container, so both the React UI and the FastAPI API are reachable through a single port (8080).
| Component | Technology |
|---|---|
| Frontend | React 18, TypeScript, Vite, shadcn/ui, Tailwind CSS, assistant-ui |
| Backend | Python 3.11, FastAPI, Pydantic |
| Vector DB | LanceDB (embedded) |
| Memory DB | SQLite with FTS5 |
| Document Processing | Unstructured.io |
| LLM Integration | Ollama API (OpenAI-compatible) |
| Deployment | Docker Compose |
- Docker and Docker Compose installed
- Ollama installed and running (see Ollama Setup below)
- At least 8GB RAM (16GB+ recommended)
```bash
git clone <repository-url>
cd RAGAPPv2
cp .env.example .env
```

Edit `.env` to match your setup:

```bash
# Required: Set your data directory
HOST_DATA_DIR=/path/to/your/data

# Optional: Change default models
CHAT_MODEL=llama3.2:latest
```

Ensure Ollama is running on your host machine:

```bash
# macOS/Linux
ollama serve

# Windows (Ollama runs as a service by default)
# Verify with:
ollama list
```

Pull the required models:

```bash
# Required: Embedding model
ollama pull bge-m3

# Required: Chat model (choose one)
ollama pull qwen2.5:32b      # Recommended for technical content
ollama pull llama3.2:latest  # Lighter alternative
```

Start the stack:

```bash
docker compose up -d
```

Open your browser to: http://localhost:8080
| Variable | Default | Description |
|---|---|---|
| `PORT` | `8080` | Web server port |
| `HOST_DATA_DIR` | `./data` | Host path for data persistence |
| `DATA_DIR` | `/app/data` | Container data path |
| `OLLAMA_EMBEDDING_URL` | `http://host.docker.internal:11434` | Ollama embedding endpoint |
| `OLLAMA_CHAT_URL` | `http://host.docker.internal:11434` | Ollama chat endpoint |
| `EMBEDDING_MODEL` | `bge-m3` | Embedding model name |
| `CHAT_MODEL` | `qwen2.5:32b` | Chat model name |
| `CHUNK_SIZE` | `512` | Document chunk size (tokens) |
| `CHUNK_OVERLAP` | `50` | Chunk overlap (tokens) |
| `MAX_CONTEXT_CHUNKS` | `10` | Max chunks in RAG context |
| `RAG_RELEVANCE_THRESHOLD` | `0.1` | Minimum relevance score (0.0-1.0) |
| `LOG_LEVEL` | `INFO` | Logging level |
| `AUTO_SCAN_ENABLED` | `true` | Enable auto-scanning |
| `AUTO_SCAN_INTERVAL_MINUTES` | `60` | Scan interval (minutes) |
| `IMAP_ENABLED` | `false` | Enable email ingestion |
| `IMAP_HOST` | - | IMAP server hostname |
| `IMAP_PORT` | `993` | IMAP server port (993 for SSL, 143 for non-SSL) |
| `IMAP_USE_SSL` | `true` | Use SSL/TLS for the IMAP connection |
| `IMAP_USERNAME` | - | IMAP account username |
| `IMAP_PASSWORD` | - | IMAP account password |
| `IMAP_POLL_INTERVAL` | `60` | Email poll interval (seconds) |
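To make the chunking settings concrete, here is a minimal sketch of overlapping token chunking driven by `CHUNK_SIZE` and `CHUNK_OVERLAP`. This is an illustration only: tokens are represented as plain strings, whereas the real pipeline uses structure-aware chunking via Unstructured.io.

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Split a token list into windows of `size` tokens, where each
    window shares `overlap` tokens with the one before it."""
    if size <= overlap:
        raise ValueError("CHUNK_SIZE must be larger than CHUNK_OVERLAP")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the final window already reaches the end
    return chunks

# A 1200-token document yields three overlapping chunks
tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(tokens, size=512, overlap=50)
```

Larger overlap reduces the chance that an answer straddles a chunk boundary, at the cost of more stored vectors.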
```
data/
├── knowledgevault/          # Root data directory
│   ├── uploads/             # [LEGACY] Legacy flat uploads directory (deprecated)
│   ├── vaults/              # Vault-specific data directories
│   │   ├── 1/               # Vault 1 (default/orphan vault)
│   │   │   └── uploads/     # Uploads for vault 1
│   │   ├── 2/               # Vault 2
│   │   │   └── uploads/     # Uploads for vault 2
│   │   └── ...              # Additional vaults
│   ├── documents/           # Documents (legacy, kept for compatibility)
│   ├── library/             # Library files
│   ├── lancedb/             # Vector database
│   │   └── chunks.lance/
│   ├── app.db               # SQLite database
│   └── logs/
│       └── app.log
```
Note: The system now stores uploads in vault-specific directories (`/data/knowledgevault/vaults/{vault_id}/uploads/`). On first startup, the system automatically migrates files from the legacy flat `uploads/` directory to the appropriate vault-specific directories. Migrated originals are renamed with a `.migrated` suffix to leave a safe backup. If a file cannot be associated with a specific vault, it defaults to the orphan vault (vault 1).
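The vault routing described above can be sketched as a small path helper. This is a simplified illustration of the layout, not the backend's actual migration code; the function name and constants are hypothetical.

```python
from pathlib import PurePosixPath

DATA_ROOT = PurePosixPath("/app/data/knowledgevault")  # DATA_DIR inside the container
ORPHAN_VAULT_ID = 1  # fallback vault for files with no known owner

def upload_dir(vault_id=None):
    """Resolve the upload directory for a vault, falling back to the
    orphan vault (vault 1) when no vault can be determined."""
    vid = vault_id if vault_id is not None else ORPHAN_VAULT_ID
    return DATA_ROOT / "vaults" / str(vid) / "uploads"
```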
bge-m3 (Required)
- 1024 dimensions
- 8192 token context
- ~0.5GB VRAM
- Excellent for technical content
```bash
ollama pull bge-m3
```

| Model | Size | RAM | Speed | Best For |
|---|---|---|---|---|
| qwen2.5:32b | 32B | ~22GB | ~15 tok/s | Technical reasoning |
| qwen2.5:72b | 72B | ~45GB | ~10 tok/s | Complex analysis |
| llama3.2:latest | 3B | ~4GB | ~30 tok/s | General use, fast |
| mistral:latest | 7B | ~8GB | ~25 tok/s | Balanced performance |
```bash
# Pull your preferred chat model
ollama pull qwen2.5:32b
```

Verify your Ollama setup:

```bash
# Test Ollama is running
curl http://localhost:11434/api/tags

# Test embedding model
curl http://localhost:11434/api/embeddings -d '{
  "model": "bge-m3",
  "prompt": "test"
}'
```

Problem: `docker compose up` fails
Solutions:

```bash
# Check Docker is running
docker info

# Check port availability
lsof -i :8080                   # macOS/Linux
netstat -ano | findstr :8080    # Windows

# View logs
docker compose logs knowledgevault
```

Problem: Health check shows "LLM unavailable"
Solutions:
- Verify Ollama is running: `ollama list`
- Check that the Ollama URL in `.env` matches your setup
- For Linux, use the host IP instead of `host.docker.internal`:

```
OLLAMA_CHAT_URL=http://192.168.1.100:11434
```
Problem: Uploaded files stay in "pending" status
Solutions:
- Check logs: `docker compose logs -f knowledgevault`
- Verify the file format is supported
- Check disk space in the data directory
- Restart the container: `docker compose restart`
Problem: Container crashes during document processing
Solutions:
- Reduce `CHUNK_SIZE` in `.env` (e.g., 256)
- Process fewer files at once
- Increase the Docker memory limit
- Use a smaller chat model
Problem: Chat responses are very slow
Solutions:
- Use a smaller/faster chat model
- Reduce `MAX_CONTEXT_CHUNKS` in `.env`
- Increase `RAG_RELEVANCE_THRESHOLD` to filter out low-relevance chunks
- Ensure Ollama has GPU access if available
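The two RAG tuning knobs interact as follows: chunks scoring below `RAG_RELEVANCE_THRESHOLD` are dropped first, then at most `MAX_CONTEXT_CHUNKS` of the remainder are passed to the model. A minimal sketch of that selection (the scores here are made up; real scores come from LanceDB's relevance ranking):

```python
def select_context(scored_chunks, threshold=0.1, max_chunks=10):
    """Keep chunks scoring at or above `threshold`, best-first,
    capped at `max_chunks` entries."""
    kept = [c for c in scored_chunks if c[1] >= threshold]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:max_chunks]

# (text, relevance) pairs; "b" falls below the threshold
chunks = [("a", 0.9), ("b", 0.05), ("c", 0.4), ("d", 0.2)]
context = select_context(chunks, threshold=0.1, max_chunks=2)
```

Raising the threshold trims marginal context (faster, more focused answers); lowering it casts a wider net at the cost of prompt size.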
| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Service health status |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/auth/register` | Register new user (first = superadmin) |
| POST | `/api/v1/auth/login` | Login with username/password |
| POST | `/api/v1/auth/refresh` | Refresh access token |
| POST | `/api/v1/auth/logout` | Logout and revoke refresh token |
| GET | `/api/v1/auth/me` | Get current user profile (includes `must_change_password` flag) |
| PATCH | `/api/v1/auth/me` | Update current user profile |
| POST | `/api/v1/auth/change-password` | Change user password (validates strength policy) |
| GET | `/api/v1/auth/sessions` | List active sessions for current user |
| DELETE | `/api/v1/auth/sessions/{id}` | Revoke a specific session |
| DELETE | `/api/v1/auth/sessions` | Revoke all other sessions |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/vaults` | List all vaults with counts |
| GET | `/api/v1/vaults/accessible` | List vault IDs user has access to |
| GET | `/api/v1/vaults/{id}` | Get vault details |
| POST | `/api/v1/vaults` | Create new vault |
| PUT | `/api/v1/vaults/{id}` | Update vault |
| DELETE | `/api/v1/vaults/{id}` | Delete vault |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/chat` | Non-streaming chat (requires `vault_id` with read access) |
| POST | `/api/v1/chat/stream` | Streaming chat (SSE, requires `vault_id` with read access) |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/documents` | List all documents |
| GET | `/api/v1/documents/stats` | Document statistics |
| POST | `/api/v1/documents/upload` | Upload file(s) |
| POST | `/api/v1/documents/scan` | Trigger directory scan |
| DELETE | `/api/v1/documents/{id}` | Delete document |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/search` | Semantic search |
| POST | `/api/v1/search/chunks` | Search document chunks |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/memories` | List all memories |
| GET | `/api/v1/memories/search` | Search memories |
| POST | `/api/v1/memories` | Create memory |
| PUT | `/api/v1/memories/{id}` | Update memory |
| DELETE | `/api/v1/memories/{id}` | Delete memory |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/settings` | Get settings |
| PUT | `/api/v1/settings` | Update settings |
Interactive API docs available at: http://localhost:8080/docs
OpenAPI schema: http://localhost:8080/openapi.json
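As a quick illustration, an authenticated call to the non-streaming chat endpoint could be built like this. The request body shape (`message` plus `vault_id`) is an assumption for illustration; consult the interactive docs at `/docs` for the authoritative schema.

```python
import json
from urllib import request

def build_chat_request(base_url, token, message, vault_id):
    """Build an authenticated POST to the non-streaming chat endpoint."""
    body = json.dumps({"message": message, "vault_id": vault_id}).encode()
    return request.Request(
        f"{base_url}/api/v1/chat",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # JWT from /api/v1/auth/login
        },
        method="POST",
    )

req = build_chat_request("http://localhost:8080", "TOKEN",
                         "What does the design doc say about caching?", 1)
# resp = request.urlopen(req)  # uncomment against a running instance
```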
The web interface uses a navigation rail with five sections:
- Chat - Ask questions about your documents using the streaming AI interface
- Search - Find specific content in your knowledge base
- Documents - Upload and manage documents
- Memory - View and manage stored memories
- Settings - Configure application settings (includes password change for forced updates)
The frontend implements secure route protection with the following features:
- JWT Authentication: Access tokens with automatic refresh on expiration
- Password Policy Enforcement: Users with `must_change_password=true` are automatically redirected to change their password
- Role-Based Access: Admin-only routes are protected via the `AdminRoute` component
- Session Management: Users can view and revoke active sessions from Settings
- Type your question in the input field
- Press Enter or click Send
- Watch the AI response stream in real-time (powered by SSE)
- Click "Sources" to see which documents were referenced
- Say "Remember that..." to save information to memory
Streaming: The chat interface uses Server-Sent Events (SSE) for real-time response streaming with automatic token refresh on 401 errors.
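Server-Sent Events arrive as a plain-text stream of `data:` lines with a blank line between events. A minimal parser sketch for such a stream (the exact event payloads KnowledgeVault emits are an assumption here; the wire format itself is standard SSE):

```python
def parse_sse(raw):
    """Return the data payload of each SSE event in `raw`.
    Events are separated by a blank line; multi-line events have
    several `data:` lines, which are joined with newlines."""
    events = []
    for block in raw.split("\n\n"):
        data_lines = [line[5:].lstrip() for line in block.splitlines()
                      if line.startswith("data:")]
        if data_lines:
            events.append("\n".join(data_lines))
    return events

# A toy stream: two tokens followed by a terminator event
stream = "data: Hello\n\ndata: world\n\ndata: [DONE]\n\n"
tokens = parse_sse(stream)
```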
Method 1: Web Upload
- Go to Documents page
- Click "Upload" or drag files onto the drop zone
- Files are automatically processed and indexed
Method 2: Direct File Placement
- Place files in `data/knowledgevault/vaults/{vault_id}/uploads/` (e.g., `data/knowledgevault/vaults/1/uploads/`)
- Click "Scan Directory" on the Documents page
- Or wait for auto-scan (if enabled)
- Go to Search page
- Enter search query
- Use filters to narrow results:
- File type
- Date range
- Relevance threshold
- Click results to view source context
- Go to Memory page to view all memories
- Use search to find specific memories
- Click edit icon to modify
- Click delete icon to remove
- Memories are automatically used in chat context
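The FTS5-backed retrieval behind memory search can be illustrated with an in-memory SQLite table. This is a simplified sketch; the real schema and ranking used by the backend are internal and may differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# FTS5 virtual table: content is indexed for full-text MATCH queries
con.execute("CREATE VIRTUAL TABLE memories USING fts5(content)")
con.executemany(
    "INSERT INTO memories (content) VALUES (?)",
    [("The staging server IP is 10.0.0.5",),
     ("Quarterly reports are due every March",)],
)

def search_memories(query):
    """Full-text match against stored memories, best match first."""
    rows = con.execute(
        "SELECT content FROM memories WHERE memories MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [r[0] for r in rows]

hits = search_memories("staging server")
```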
```bash
# Run with hot-reload (includes frontend dev service)
docker compose -f docker-compose.yml -f docker-compose.override.yml up -d

# View logs
docker compose logs -f backend

# Run tests
docker compose exec backend pytest tests/
```

For frontend development:

```bash
cd frontend
npm install
npm run dev
```

The frontend includes a streaming-capable API client (`frontend/src/lib/api.ts`):
- `apiRequest<T>(method, path, body?)` - Standard REST requests with JWT auth
- `apiStream(path, body, callbacks)` - SSE streaming with automatic token refresh
The auth store (`frontend/src/stores/authStore.ts`) tracks:
- User session state
- The `must_change_password` flag from the `/api/v1/auth/me` endpoint
- Password change enforcement via the `useRequirePasswordChange` hook
```bash
docker compose -f docker-compose.yml build
docker compose -f docker-compose.yml up -d
```

- Email Ingestion - Ingest documents via email with IMAP polling and automatic vault routing
- Admin Guide - Administrative tasks and configuration
- Release Process - Deployment and release procedures
- Non-Technical Setup - Setup guide for non-technical users
No license file present. Add LICENSE file or update this section as needed.
- Documentation: See the `docs/` directory
- Issues: Create an issue in the repository
- Admin Guide: See `docs/admin-guide.md`
- Non-Technical Setup: See `docs/non-technical-setup.md`