- Introduction
- Installation
- First-Time Setup
- Basic Usage
- Working with Documents
- Advanced Features
- Troubleshooting
- Command Reference
- Best Practices
Vector Bot is a powerful tool that lets you ask questions about your documents using AI, all while keeping everything on your computer. No internet connection required, no data sent to the cloud - everything runs locally.
- Query your documents: Ask natural language questions about PDFs, text files, markdown files
- Build a knowledge base: Index technical documentation, research papers, notes
- Private AI assistant: Get AI-powered answers without sharing sensitive data
- Offline operation: Works completely offline once set up
If you received vector-bot.exe (Windows) or vector-bot (Mac/Linux):
- Place the executable in a convenient location (e.g., `C:\Tools\` on Windows or `/usr/local/bin/` on Mac/Linux)
- Add to PATH (optional) for easier access
- Test it works:
vector-bot --version
- Install Python 3.10+ from python.org
- Clone or download the project
- Install dependencies:
pip install -e .
- Download Ollama from ollama.ai
- Install and start Ollama:
ollama serve
- Install a chat model:
# Install a model (only needed once)
ollama pull llama3.1

# List available models
ollama list
# Check if everything is working
vector-bot doctor
# Expected output:
# ✓ Ollama server is running
# ✓ Chat model: llama3.1
# ✓ Embedding model: nomic-embed-text

If the doctor command shows the embedding model is missing:

ollama pull nomic-embed-text

Create a .env file in your working directory:
# Copy the example configuration
cp .env.example .env
# Edit with your preferences
notepad .env # Windows
nano .env     # Mac/Linux

Common settings to adjust:

- DOCS_DIR: Where to look for documents (default: ./docs)
- OLLAMA_CHAT_MODEL: Which AI model to use (default: auto-detect)
- SIMILARITY_TOP_K: How many relevant chunks to retrieve (default: 4)
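To make the precedence of these settings concrete, here is a minimal sketch of how a `.env` file might be resolved against process environment variables and built-in defaults. This is illustrative only, not Vector Bot's actual configuration loader; the `load_env_file` and `get_setting` names are hypothetical.

```python
import os

# Built-in defaults (assumed): environment variables win,
# then .env entries, then these fallbacks.
DEFAULTS = {
    "DOCS_DIR": "./docs",
    "OLLAMA_CHAT_MODEL": "",   # empty means auto-detect
    "SIMILARITY_TOP_K": "4",
}

def load_env_file(path=".env"):
    # Parse simple KEY=VALUE lines, skipping blanks and comments
    settings = {}
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    settings[key.strip()] = value.strip()
    return settings

def get_setting(name, env_file_settings):
    # Precedence: process environment > .env file > default
    return os.environ.get(name) or env_file_settings.get(name) or DEFAULTS.get(name)
```

For example, `get_setting("SIMILARITY_TOP_K", load_env_file())` would return "4" unless the variable is overridden in the shell or in .env.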
Create a docs folder and add your files:
mkdir docs
# Copy your PDFs, text files, markdown files into docs/
cp ~/Documents/*.pdf docs/

Supported formats:

- .txt - Plain text files
- .md - Markdown files
- .pdf - PDF documents
- .json - JSON files
- .csv - CSV files
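A short sketch of how the supported-format filter above could work: walk the docs directory recursively and keep only files with a supported extension. This is an assumption about the selection logic, not the tool's actual code.

```python
from pathlib import Path

# Extensions listed as supported in the guide
SUPPORTED_EXTENSIONS = {".txt", ".md", ".pdf", ".json", ".csv"}

def find_supported_files(docs_dir):
    # Recursively collect files whose extension is supported,
    # sorted for deterministic ordering
    return sorted(
        p for p in Path(docs_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Anything else in the folder (executables, images, and so on) would simply be ignored at ingestion time.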
Build the searchable index:
vector-bot ingest
# Output:
# Loading 15 documents...
# Building vector index...
# ✓ Index saved to ./index_storage
# ✓ Indexed 127 chunks

Note: This step only needs to be done once. Re-run when you add new documents.
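The chunk count in the output above is larger than the document count because each document is split into smaller pieces before embedding. A simplified sketch of fixed-size chunking with overlap (real indexers typically split on sentence or token boundaries; the sizes here are assumptions for illustration):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    # Split a document into overlapping fixed-size character chunks.
    # Overlap keeps context that straddles a chunk boundary retrievable.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks
```

A 1,200-character document with these settings yields three chunks, which is why 15 documents can produce 127 chunks in the index.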
Query your documents:
# Simple question
vector-bot query "What are the project requirements?"
# Question with more context chunks
vector-bot query "Explain the authentication flow" --k 6
# Show which documents were used
vector-bot query "What is the deployment process?" --show-sources

An example layout for the docs/ folder:

docs/
├── projects/
│ ├── project-a-spec.pdf
│ └── project-b-notes.md
├── guides/
│ ├── user-manual.pdf
│ └── admin-guide.md
└── references/
├── api-docs.md
└── troubleshooting.txt
- Documents over 20MB are automatically skipped
- Break large documents into smaller parts if needed
- Use markdown format when possible (faster processing)
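The 20MB skip rule above can be expressed as a simple size check. This is a sketch of the assumed behavior, not the tool's actual implementation; `should_skip` is a hypothetical helper name.

```python
import os

# 20MB limit described in the guide (assumed to be binary megabytes)
MAX_SIZE_BYTES = 20 * 1024 * 1024

def should_skip(path):
    # Files over the size limit are skipped rather than indexed
    return os.path.getsize(path) > MAX_SIZE_BYTES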
When you add new documents:
# Add new files to docs/
cp new-document.pdf docs/
# Re-run ingestion (safe - won't duplicate)
vector-bot ingest

To completely rebuild the index:
# Remove old index
rm -rf index_storage/ # Unix
rmdir /s index_storage # Windows
# Rebuild
vector-bot ingest

Use different settings for different scenarios:
# Development mode (verbose output)
vector-bot --env development doctor
# Production mode (optimized)
vector-bot --env production ingest
# Check current configuration
vector-bot --config-info --env production

Override the default model:
# Set via environment variable
export OLLAMA_CHAT_MODEL=llama3.3
vector-bot query "What is the summary?"
# Or in .env file
OLLAMA_CHAT_MODEL=mistral

Process multiple questions from a file:
# Create questions file
echo "What is the main purpose?" > questions.txt
echo "List all requirements" >> questions.txt
echo "Explain the architecture" >> questions.txt
# Process all questions
while IFS= read -r question; do
echo "Q: $question"
vector-bot query "$question"
echo "---"
done < questions.txt

# Python script example
import subprocess

def query_rag(question):
    # Run the CLI and capture the answer text it prints
    result = subprocess.run(
        ["vector-bot", "query", question],
        capture_output=True,
        text=True,
    )
    return result.stdout

answer = query_rag("What are the key features?")
print(answer)

If you see "Ollama server not running":

# Start Ollama
ollama serve
# Check if it's running
curl http://localhost:11434/api/tags

If no models are installed:

# Install a model
ollama pull llama3.1
# List available models
ollama list
# Set in configuration
export OLLAMA_CHAT_MODEL=llama3.1

If no documents are found:

# Check documents directory
ls docs/
# Verify path in config
vector-bot --config-info
# Use absolute path if needed
export DOCS_DIR=/full/path/to/documents

If the index is not found:

# Build the index first
vector-bot ingest
# Check index location
ls index_storage/

If queries are slow:

- Reduce similarity chunks: vector-bot query "question" --k 2
- Use a faster model: OLLAMA_CHAT_MODEL=llama3.1
- Check available RAM and close other applications
If Ollama is on a different port:
# Set custom URL
export OLLAMA_BASE_URL=http://localhost:8080
vector-bot doctor

General usage:

vector-bot [--env ENV] [--config-info] COMMAND

- --env ENV: Use specific environment (development, production, docker)
- --config-info: Show configuration and exit
- --version: Show version
- --help: Show help
vector-bot doctor [--verbose]

Checks:
- Ollama server connectivity
- Available models
- Configuration validity
vector-bot ingest [--verbose]

Options:
--verbose: Show detailed progress
vector-bot query "your question" [OPTIONS]

Options:

- --k N: Number of similar chunks to retrieve (default: 4)
- --show-sources: Display source documents used
- --verbose: Show detailed processing
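To show what the --k option controls, here is a minimal sketch of top-k retrieval: rank every indexed chunk by cosine similarity to the query embedding and keep the k best. The function names are hypothetical and the real tool works on stored embeddings, but the ranking idea is the same.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_chunks(query_vec, chunk_vecs, k=4):
    # Rank chunks by similarity to the query and return the
    # indices of the k best matches (what --k / SIMILARITY_TOP_K sets)
    scored = sorted(
        enumerate(chunk_vecs),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [idx for idx, _ in scored[:k]]
```

A larger k feeds more context to the chat model (better coverage, slower answers); a smaller k is faster but may miss relevant passages.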
| Variable | Description | Default |
|---|---|---|
| DOCS_DIR | Documents directory | ./docs |
| INDEX_DIR | Index storage location | ./index_storage |
| OLLAMA_BASE_URL | Ollama server URL | http://localhost:11434 |
| OLLAMA_CHAT_MODEL | Chat model to use | Auto-detect |
| OLLAMA_EMBED_MODEL | Embedding model | nomic-embed-text |
| SIMILARITY_TOP_K | Retrieval chunks | 4 |
| RAG_ENV | Environment profile | development |
- Use descriptive filenames: project-spec-v2.pdf instead of doc1.pdf
- Break up large documents: Split >20MB files into chapters
- Prefer text formats: Markdown and text process faster than PDFs
- Include context: Add summary sections to technical documents
- Be specific: "What is the login process?" vs "How does it work?"
- Use keywords: Include important terms from your documents
- Iterate: If first answer isn't complete, refine your question
- Adjust retrieval: Use --k to get more or fewer context chunks
- Index once: Don't rebuild unless adding documents
- Choose appropriate models:
  - Fast: llama3.1 (8B parameters)
  - Balanced: llama3.3 (70B parameters)
  - Best: mixtral (8x7B parameters)
- Limit document size: Break PDFs >10MB into sections
- Use SSD storage: Place index on fast storage
- Keep documents local: Never put sensitive docs in cloud folders
- Control access: Restrict docs/ and index_storage/ permissions
- No telemetry: This tool never sends data externally
- Audit models: Verify Ollama models are from trusted sources
# Index research papers
cp ~/Research/Papers/*.pdf docs/
vector-bot ingest
# Find relevant studies
vector-bot query "What studies discuss quantum computing applications?"
vector-bot query "Summarize findings about machine learning in healthcare" --k 8

# Index API docs and guides
cp -r ~/project/docs/* docs/
vector-bot ingest
# Quick lookups
vector-bot query "How do I authenticate API requests?"
vector-bot query "What are the rate limits?"
vector-bot query "Show example of webhook implementation" --show-sources

# Index notes and articles
cp ~/Notes/*.md docs/
vector-bot ingest
# Search your notes
vector-bot query "What did I learn about Docker networking?"
vector-bot query "Find my notes about Python decorators"

# Index manuals and guides
cp ~/Manuals/*.pdf docs/
vector-bot ingest
# Troubleshooting
vector-bot query "How to reset the printer?"
vector-bot query "What does error code E45 mean?"

- README.md: Technical documentation and setup
- DEPLOYMENT.md: Multi-environment deployment guide
- CLAUDE.md: Development guidelines
# Check system status
vector-bot doctor --verbose
# Show configuration
vector-bot --config-info
# Command help
vector-bot --help
vector-bot query --help
# Version information
vector-bot --version

| Error | Meaning | Solution |
|---|---|---|
| "Ollama server not running" | Can't connect to Ollama | Start with ollama serve |
| "No models installed" | No AI models available | Run ollama pull llama3.1 |
| "No documents found" | Empty docs directory | Add files to docs/ folder |
| "Index not found" | Haven't built index | Run vector-bot ingest first |
If you're contributing to the project or want to run tests:
# Install development dependencies
pip install -e ".[dev]"
# Run all tests
pytest tests/ -v
# Run unit tests only
pytest tests/unit/ -v
# Run with coverage report
pytest tests/ --cov=vector_bot --cov-report=html
# Use the test runner script
python run_tests.py# Type checking
mypy src/
# Linting
ruff check src/
# Security scanning
safety check
bandit -r src/

See CONTRIBUTING.md for detailed contribution guidelines.
# Initial setup
ollama pull llama3.1 # Install AI model
ollama pull nomic-embed-text # Install embedding model
vector-bot doctor # Verify setup
# Daily workflow
cp document.pdf docs/ # Add document
vector-bot ingest # Build/update index
vector-bot query "question?" # Ask question
# Useful options
vector-bot query "question?" --k 6 --show-sources
vector-bot --env production ingest
vector-bot --config-info
# Troubleshooting
ollama serve # Start Ollama
ollama list # List models
ls docs/                       # Check documents

This guide covers Vector Bot version 1.0.0.