A multimodal RAG API for document Q&A: upload docs, ask questions, get grounded answers with citations.
Features • Quick Start • API • Eval • Tech Stack
API Docs: mugnihidayah-synapse-rag-api.hf.space/docs
Frontend Repo: mugnihidayah/synapse-frontend
- Multimodal ingestion: PDF, DOCX, TXT, and image files (PNG/JPG/JPEG/WEBP) with OCR.
- Async ingestion pipeline: upload queue with status (`queued`, `processing`, `ready`, `ready_with_warnings`, `failed`).
- Retrieval upgrades: hybrid search (vector + keyword), reranking, dynamic `top_k`, MMR diversification.
- Query quality: contextualization, query rewrite, strict grounding guardrail, richer citations.
- Agentic RAG: ReAct-style agent with multi-step reasoning and dynamic tool use (retrieve, compare, summarize, refine).
- Metadata filters at query time: by source, page range, chunk type, content origin.
- Session tools: session status, paginated chunk listing, session export (markdown/json).
- Product signals: feedback endpoint and usage analytics with daily query quota.
- Secure API key auth, rate limiting, structured JSON logging.
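As a rough illustration of the hybrid search above, the sketch below fuses per-chunk vector and keyword scores with the default 0.5/0.5 weights from the configuration. It assumes scores are already normalized to [0, 1]; `fuse_scores` is an illustrative name, not the project's actual API.

```python
def fuse_scores(vector_scores: dict, keyword_scores: dict,
                vector_weight: float = 0.5, keyword_weight: float = 0.5) -> list:
    """Blend per-chunk vector and keyword scores into a single ranking."""
    chunk_ids = set(vector_scores) | set(keyword_scores)
    fused = {
        cid: vector_weight * vector_scores.get(cid, 0.0)
        + keyword_weight * keyword_scores.get(cid, 0.0)
        for cid in chunk_ids
    }
    # Highest fused score first; chunks found by only one retriever still compete.
    return sorted(fused, key=fused.get, reverse=True)

# Chunk "b" ranks first: moderately strong in both retrievers beats strong in one.
ranking = fuse_scores({"a": 0.9, "b": 0.2}, {"b": 0.8, "c": 0.5})
```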
git clone https://github.com/mugnihidayah/synapse-instant-document-insight.git
cd synapse-instant-document-insight
# set required env vars
echo "GROQ_API_KEY=gsk_your_key_here" > .env
docker compose up -d
# open http://localhost:8000/docsWe use uv for lightning-fast Python dependency management.
git clone https://github.com/mugnihidayah/synapse-instant-document-insight.git
cd synapse-instant-document-insight
# Install dependencies and create venv automatically
uv sync --all-extras
# Linux/macOS:
source .venv/bin/activate
# Windows PowerShell:
# .venv\Scripts\Activate.ps1
docker compose up db -d
# initialize/update schema (run this on your DB, including Neon)
psql "$DATABASE_URL" -f scripts/init.sql
uvicorn src.api.main:app --reloadBase URL (local): http://localhost:8000/api/v1
POST /keys/does not require auth.- All other endpoints require header:
X-API-Key: sk-...
Create an API key:

```bash
curl -X POST localhost:8000/api/v1/keys/ \
  -H "Content-Type: application/json" \
  -d '{"name":"my-app"}'
```

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | `/keys/` | Create API key | No |
| GET | `/keys/` | Get current key metadata | Yes |
| DELETE | `/keys/{key_id}` | Revoke current key | Yes |
| POST | `/documents/sessions` | Create session | Yes |
| GET | `/documents/sessions/{session_id}` | Get session info + ingestion status | Yes |
| GET | `/documents/sessions/{session_id}/documents` | Paginated chunk list | Yes |
| DELETE | `/documents/sessions/{session_id}` | Delete session | Yes |
| POST | `/documents/upload/{session_id}` | Upload documents (async by default) | Yes |
| GET | `/documents/{document_id}/file` | Stream original uploaded file (inline) | Yes |
| GET | `/documents/supported-formats` | Supported upload formats | No |
| POST | `/query/{session_id}` | Non-streaming query | Yes |
| POST | `/query/stream/{session_id}` | Streaming query (SSE) | Yes |
| POST | `/insights/feedback/{session_id}` | Submit answer feedback | Yes |
| GET | `/insights/usage` | Usage + quota summary | Yes |
| GET | `/insights/export/{session_id}` | Export chat history | Yes |
```bash
# 1) Create key
API_KEY=$(curl -s -X POST localhost:8000/api/v1/keys/ \
  -H "Content-Type: application/json" \
  -d '{"name":"demo"}' | jq -r '.api_key')

# 2) Create session
SESSION=$(curl -s -X POST localhost:8000/api/v1/documents/sessions \
  -H "X-API-Key: $API_KEY" | jq -r '.session_id')

# 3) Upload (async_mode=true by default)
curl -X POST "localhost:8000/api/v1/documents/upload/$SESSION" \
  -H "X-API-Key: $API_KEY" \
  -F "files=@report.pdf"

# 4) Poll session status until ingestion_status=ready
curl -H "X-API-Key: $API_KEY" \
  "localhost:8000/api/v1/documents/sessions/$SESSION"

# 5) Query
curl -X POST "localhost:8000/api/v1/query/$SESSION" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"question":"What is the revenue growth?","language":"en"}'
```

Example payload with filters/debug:
```json
{
  "question": "Summarize key risks and compare with Q1",
  "language": "en",
  "agent_mode": true,
  "max_agent_steps": 5,
  "top_k": 8,
  "rerank_top_k": 3,
  "include_debug": true,
  "strict_grounding": true,
  "enable_query_rewrite": true,
  "filters": {
    "sources": ["risk-report.pdf"],
    "page_from": 2,
    "page_to": 12,
    "chunk_types": ["content"],
    "content_origin": "text+ocr"
  }
}
```

Source items in the query response now include an optional `document_id` (used by the frontend to request the original file):
```json
{
  "chunk_id": "chunk_abc",
  "document_id": "doc456",
  "source": "report.pdf",
  "page": 3
}
```

Upload query parameters:

- `async_mode` (default `true`)
- `enable_ocr` (default from config)
- `extract_tables` (default from config)
Example:

```bash
curl -X POST "localhost:8000/api/v1/documents/upload/$SESSION?async_mode=false&enable_ocr=true&extract_tables=true" \
  -H "X-API-Key: $API_KEY" \
  -F "files=@scanned.pdf"
```

Ingestion now returns structured, per-file outcomes so clients can handle warnings and errors without parsing raw error strings.
- Upload (`POST /documents/upload/{session_id}`) includes `summary` and `file_results[]`.
- Session info (`GET /documents/sessions/{id}`) includes `ingestion_summary` and `ingestion_warnings[]`.
- The existing `ingestion_error` is preserved for backward compatibility.
- Supported ingestion statuses: `queued`, `processing`, `ready`, `ready_with_warnings`, `failed`.
- Full schema details are available in `/docs` (OpenAPI).
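The status lifecycle above implies clients should poll session info until a terminal status. A minimal sketch of such a loop, with the status fetch injected as a callable so it runs without a server; `wait_for_ingestion` is an illustrative helper, not part of the API:

```python
import time

# Terminal ingestion statuses, per the list above.
TERMINAL_STATUSES = {"ready", "ready_with_warnings", "failed"}

def wait_for_ingestion(fetch_status, poll_interval: float = 1.0,
                       max_attempts: int = 60) -> str:
    """Poll `fetch_status()` until ingestion reaches a terminal status."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_attempts - 1:
            time.sleep(poll_interval)
    raise TimeoutError("ingestion did not reach a terminal status in time")
```

In practice `fetch_status` would GET `/documents/sessions/{session_id}` with the `X-API-Key` header and read `ingestion_status` from the JSON body.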
Per API key:

- Query: 50/minute
- Upload: 5/minute
- Session operations: 20/minute

Daily soft quota:

- Query quota is also tracked per day (`USAGE_DAILY_QUERY_QUOTA`, default `1000`).
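The per-minute limits above behave like a fixed-window counter per API key. The project's actual limiter implementation is not specified here; this is a minimal sketch of the policy:

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_s`-second window per key."""

    def __init__(self, limit: int, window_s: int = 60):
        self.limit = limit
        self.window_s = window_s
        self._counts: dict = {}

    def allow(self, api_key: str, now: float) -> bool:
        window = int(now // self.window_s)  # which window this request falls in
        count = self._counts.get((api_key, window), 0)
        if count >= self.limit:
            return False  # over the limit for this window
        self._counts[(api_key, window)] = count + 1
        return True
```

The same counter keyed by day instead of minute gives the soft daily query quota.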
Lightweight eval harness:

```bash
python scripts/eval/eval_harness.py --input scripts/eval/sample_predictions.jsonl
```

Metrics:

- `exact_match`
- `token_f1`
- `grounding_score`
- `source_recall`
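Of these, `token_f1` is the standard token-overlap F1 between predicted and reference answers; a sketch of how such a metric is typically computed (the harness's exact normalization may differ):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return float(pred == ref)  # both empty -> 1.0, one empty -> 0.0
    overlap = sum((Counter(pred) & Counter(ref)).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```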
- Backend: FastAPI, SQLAlchemy, Pydantic, Uvicorn
- Database: PostgreSQL + pgvector
- AI/ML: LangChain, Groq LLM, HuggingFace embeddings, Cohere/local reranker
- OCR: `rapidocr_onnxruntime` (RapidOCR)
- Search: hybrid retrieval + reranking + MMR
- DevOps: Docker, GitHub Actions, Hugging Face Spaces
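The MMR step in the stack above trades relevance against redundancy: each pick maximizes λ·(similarity to query) − (1 − λ)·(max similarity to already-picked chunks), where λ corresponds to `MMR_LAMBDA`. A small self-contained sketch over precomputed similarities, illustrative rather than the project's implementation:

```python
def mmr_select(query_sim: dict, pair_sim: dict, k: int, lam: float = 0.7) -> list:
    """Greedy maximal marginal relevance over precomputed similarities.

    query_sim: doc id -> similarity to the query
    pair_sim:  (doc_a, doc_b) -> similarity between two docs
    """
    selected: list = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def mmr_score(d):
            # Penalize similarity to anything already selected.
            redundancy = max(
                (pair_sim.get((d, s), pair_sim.get((s, d), 0.0)) for s in selected),
                default=0.0,
            )
            return lam * query_sim[d] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With λ = 0.7, a chunk nearly identical to one already selected loses to a less relevant but more diverse chunk.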
- Async ingestion uses in-process background tasks (no distributed worker yet).
- Upload still buffers files in memory before processing.
- OCR quality depends on scan quality and OCR runtime availability.
- Sessions expire automatically after 24 hours.
- Free-tier hosting can have cold starts.
```bash
pytest tests/ -v --cov=src
ruff check src/
mypy src/
```

Environment variables:

```env
# Required
GROQ_API_KEY=gsk_your_key
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/synapse_db

# Optional
COHERE_API_KEY=
HUGGINGFACE_TOKEN=
RERANKER_PROVIDER=cohere
LLM_MODEL=llama-3.3-70b-versatile
EMBEDDING_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
RERANKER_MODEL=ms-marco-MiniLM-L-12-v2
LOG_LEVEL=info
PORT=8000
DEBUG=false
CACHE_DIR=./opt
CHUNK_SIZE=1000
CHUNK_OVERLAP=200

# Retrieval / quality
RETRIEVAL_TOP_K=10
RETRIEVAL_FETCH_K=20
DYNAMIC_TOP_K_MIN=4
DYNAMIC_TOP_K_MAX=10
RERANK_TOP_K=3
USE_HYBRID_SEARCH=true
HYBRID_VECTOR_WEIGHT=0.5
HYBRID_KEYWORD_WEIGHT=0.5
USE_MMR=true
MMR_LAMBDA=0.7
GROUNDEDNESS_THRESHOLD=0.15
QUERY_REWRITE_ENABLED=true

# Ingestion
INGESTION_ASYNC_DEFAULT=true
UPLOAD_DIR=./uploads
# On Hugging Face Spaces set UPLOAD_DIR=/data/uploads for persistent storage
ENABLE_OCR=true
ENABLE_TABLE_EXTRACTION=true
MAX_UPLOAD_FILE_SIZE_MB=50

# Analytics / export
USAGE_DAILY_QUERY_QUOTA=1000
EXPORT_MAX_MESSAGES=200

# Agent settings
AGENT_ENABLED=true
AGENT_MAX_ITERATIONS=5
AGENT_TEMPERATURE=0.1
```

License: MIT