SQL Query Generator with Self-Improving AI

Author: Markus van Kempen (markus.van.kempen@gmail.com)
Last Updated: January 24, 2026
Version: 2.0

BeeAI Agent Demo

A Streamlit application that converts natural language queries into SQL using IBM WatsonX LLMs with self-improving capabilities powered by embeddings.

Features

🧠 Self-Improving System

  • Embedding-based learning: Uses IBM Slate embeddings to find semantically similar past queries
  • Few-shot prompting: Automatically includes successful examples in prompts
  • Error avoidance: Learns from common mistakes to improve future queries
  • User feedback: 👍/👎 buttons help the system learn what works
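The learning loop can be sketched as a small pattern store: embed each successful query, retrieve the nearest past patterns by cosine similarity, and splice them into the prompt as few-shot examples. The names below (`LearningStore`, `build_prompt`) are hypothetical stand-ins for the repo's `learning_store.py`, not its actual API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class LearningStore:
    """Keeps (question, sql, embedding) patterns from successful queries."""
    def __init__(self):
        self.patterns = []

    def add(self, question, sql, embedding):
        self.patterns.append({"q": question, "sql": sql, "emb": embedding})

    def top_k(self, query_emb, k=3):
        """Most semantically similar past patterns, best first."""
        ranked = sorted(self.patterns,
                        key=lambda p: cosine(query_emb, p["emb"]),
                        reverse=True)
        return ranked[:k]

def build_prompt(question, examples):
    """Few-shot prompt: similar solved queries first, new question last."""
    shots = "\n".join(f"Q: {p['q']}\nSQL: {p['sql']}" for p in examples)
    return f"{shots}\nQ: {question}\nSQL:"
```

User feedback (👍/👎) would then adjust a rating on each stored pattern so poor examples fall out of the top-k results.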

🤖 Four Query Generation Modes

| Mode | Description | Best For |
|------|-------------|----------|
| Direct SQL | Prompt engineering + SQLGlot validation | Speed-critical queries |
| LangChain Agent | LangChain SQL chain with schema injection | Complex queries |
| BeeAI Agent | IBM BeeAI Framework with native WatsonX | Highest reliability |
| Self-Improving | Full learning system with pattern storage | Continuous improvement |

All modes share:

| Component | Technology | Benefit |
|-----------|------------|---------|
| SQL Validation | SQLGlot (AST parsing) | Robustness & safety |
| Agent Frameworks | LangChain, BeeAI Framework | Flexibility & extensibility |

💬 Hybrid Query Classification

Not all questions need SQL! The system uses a 4-layer hybrid approach:

| Layer | Description |
|-------|-------------|
| 1. Semantic | IBM Slate embeddings (79 examples); understands "per" = "by" |
| 2. Keywords | Database/general/help keyword matching |
| 3. Schema-Aware | Validates against actual DB content |
| 4. Empty Analysis | Explains why queries return no results |

Query Types Handled:

  • Database queries → Generate SQL (e.g., "show all products")
  • General questions → Direct response (e.g., "what time is it")
  • Help questions → System guidance (e.g., "what can you do")
  • Unknown entities → Helpful feedback (e.g., "show BMW sales" → "BMW not in database")

The semantic classifier understands that "show total revenue per country" = "revenue by country" = "sales grouped by country" without keyword rules!
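The keyword and schema-aware layers of such a cascade can be sketched as below. This is a hypothetical simplification: the real classifier runs an embedding-based semantic layer first, and the keyword sets and `KNOWN_ENTITIES` sample here are invented for illustration:

```python
# Hypothetical keyword + schema-aware layers; the real system checks an
# embedding-based semantic layer before falling through to these.
DB_KEYWORDS = {"show", "list", "total", "revenue", "sales", "top", "count",
               "how", "many"}
HELP_PHRASES = ("what can you do", "help")
KNOWN_ENTITIES = {"laptop", "smartphone", "usa", "germany", "europe"}  # sampled from the DB

def classify(query: str) -> str:
    q = query.lower()
    if any(p in q for p in HELP_PHRASES):   # help keywords
        return "help"
    words = set(q.replace("?", "").split())
    if words & DB_KEYWORDS:                 # database keywords
        # schema-aware layer: flag capitalized entities the DB has never seen
        capitalized = {w.lower() for w in query.split() if w[:1].isupper()}
        unknown = capitalized - KNOWN_ENTITIES
        if unknown:
            return "unknown_entity"         # e.g. "show BMW sales"
        return "database"
    return "general"                        # e.g. "what time is it"
```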

See COMPARISON_RESULTS.md for full implementation details and known gaps.

🔍 Product Disambiguation

When a search matches multiple products (e.g., "Wireless Mouse" matches both "Wireless Mouse" and "Wireless Mouse Pro"), the system shows a helpful disambiguation message.
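A minimal sketch of that behavior, with a hypothetical `disambiguate` helper (the repo's `product_mapper.py` is the real implementation):

```python
def disambiguate(search_term, product_names):
    """Return (product, message): an unambiguous match, or a message
    listing the candidates for the user to choose from."""
    term = search_term.lower()
    matches = [p for p in product_names if term in p.lower()]
    if len(matches) == 1:
        return matches[0], None
    if matches:
        return None, (f"'{search_term}' matches {len(matches)} products: "
                      + ", ".join(matches) + ". Which one did you mean?")
    return None, f"No product matching '{search_term}' found."
```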

⚡ Semantic Caching (NEW)

  • Memory + disk caching of embedding vectors
  • First query: API call (~450ms) → Cached for reuse (<1ms)
  • Reduces WatsonX API calls by 80%+ for repeated patterns
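A two-tier cache of this shape can be built with a dict in front of JSON files keyed by a hash of the text. The class below is a hedged sketch, not the repo's actual cache code:

```python
import hashlib
import json
import os

class EmbeddingCache:
    """Memory-first, disk-second cache for embedding vectors (sketch)."""
    def __init__(self, cache_dir):
        self.mem = {}
        self.dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _key(self, text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text):
        k = self._key(text)
        if k in self.mem:                       # hot path: in-memory hit
            return self.mem[k]
        path = os.path.join(self.dir, k + ".json")
        if os.path.exists(path):                # warm path: disk hit
            with open(path) as f:
                vec = json.load(f)
            self.mem[k] = vec
            return vec
        return None                             # miss: caller hits the API

    def put(self, text, vec):
        k = self._key(text)
        self.mem[k] = vec
        with open(os.path.join(self.dir, k + ".json"), "w") as f:
            json.dump(vec, f)
```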

💬 Context Awareness (NEW)

Understands follow-up queries in conversation:

  • "show more" → expands previous results
  • "same for USA" → applies new filter to previous query
  • "top 5" → limits previous results
  • "those products" → resolves references to previous data

🛡️ Robust Validation (NEW)

  • SQLGlot-powered: AST-based parsing ensures only valid SQL is executed
  • Safety checks: Blocks SELECT * with no FROM clause, implicit cross-joins (ON TRUE), and joins missing their conditions
  • Fallback mechanism: Gracefully degrades if strict validation fails

📊 Schema Enrichment (NEW)

  • Semantic Context: Columns have descriptions (e.g., "status: 1=New")
  • Dynamic Sampling: Injects real valid values into prompt (e.g., country: 'USA', 'Germany')
  • Reduced Hallucinations: LLM knows exactly which values exist in the DB
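Dynamic sampling can be as simple as pulling a few DISTINCT values per column and formatting them into the prompt. The helpers below are a hypothetical sketch (table/column names come from a trusted schema spec, never from user input, since identifiers can't be parameterized):

```python
import sqlite3

def sample_column_values(conn, table, column, limit=5):
    """Pull a few real values so the LLM only sees values that exist."""
    rows = conn.execute(
        f"SELECT DISTINCT {column} FROM {table} LIMIT {limit}"
    ).fetchall()
    return [r[0] for r in rows]

def enrich_schema_prompt(conn, spec):
    """spec: {table: [columns]} -> text block to inject into the prompt."""
    lines = []
    for table, cols in spec.items():
        for col in cols:
            vals = sample_column_values(conn, table, col)
            lines.append(f"{table}.{col}: e.g. {', '.join(map(repr, vals))}")
    return "\n".join(lines)
```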

Quick Start

1. Setup Environment

```shell
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

2. Configure Credentials

Create a .env file:

```
WATSONX_API_KEY=your_api_key
WATSONX_URL=https://us-south.ml.cloud.ibm.com
WATSONX_PROJECT_ID=your_project_id
DATABASE_PATH=./data/database.db
```

3. Initialize Database

```shell
python scripts/init_database.py
```

4. Run the App

```shell
streamlit run app.py
```

Project Structure

```
├── app.py                    # Main Streamlit UI (entry point)
├── watsonx_client.py         # Direct SQL mode with learning
├── langchain_agent.py        # LangChain mode with learning
├── beeai_agent.py            # BeeAI mode with learning
├── self_improving_agent.py   # Full self-improving agent
├── learning_store.py         # Embedding-based pattern storage
├── query_classifier.py       # Hybrid query classification
├── semantic_classifier.py    # Embedding-based intent detection + caching
├── context_manager.py        # Conversation context awareness
├── product_mapper.py         # Product disambiguation
├── schema_loader.py          # Database schema utilities
├── logging_config.py         # Centralized logging
│
├── scripts/                  # Utility scripts
│   └── init_database.py      # Database initialization
│
├── tests/                    # Test suite
│   ├── test_comparison.py    # SQLGlot comparison tests
│   ├── test_self_improving.py    # 100-query test suite
│   ├── test_model_comparison.py  # LLM model benchmark
│   ├── test_mode_comparison.py   # Mode comparison tests
│   └── test_model_mode_comparison.py  # Combined comparison
│
├── docs/                     # Documentation
│   ├── COMPARISON_RESULTS.md # Detailed mode & model comparison
│   └── test_results_summary.md   # Test findings
│
├── data/                     # Data storage
│   ├── database.db           # SQLite database
│   ├── learning.db           # Learning store (patterns, errors)
│   ├── cache/                # Embedding cache
│   └── results/              # Test comparison results (JSON)
│
└── .cursor/rules/            # Cursor AI rules
```

Database Schema

Tables

  • customers: customer_id, customer_name, email, city, country, region
  • products: product_id, product_code, product_name, category, price
  • orders: order_id, customer_id, product_id, order_date, quantity, total_amount

Views

  • sales: Pre-joined view combining all tables for simplified queries
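The tables and view can be sketched in a few lines of Python + SQLite. Column lists here are trimmed for illustration; the repo's `scripts/init_database.py` defines the full schema:

```python
import sqlite3

# Simplified sketch of the schema, not the repo's exact DDL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT, country TEXT);
CREATE TABLE products  (product_id INTEGER PRIMARY KEY, product_name TEXT, price REAL);
CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                        product_id INTEGER, quantity INTEGER, total_amount REAL);

-- the pre-joined view lets generated SQL skip the joins entirely
CREATE VIEW sales AS
SELECT o.order_id, c.customer_name, c.country,
       p.product_name, o.quantity, o.total_amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN products  p ON p.product_id  = o.product_id;
""")
```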

Self-Improving Architecture

```
User Query
    ↓
🔍 Query Classifier ──────────────────┐
    │                                  │ (general/help)
    │ (database)                       ↓
    ↓                              📝 Direct Response
🧠 Embedding Search
    │
    ↓
📚 Find Similar Patterns (top 3)
    │
    ↓
⚠️ Find Common Errors to Avoid
    │
    ↓
📝 Build Enhanced Prompt
    │ (includes few-shot examples + error avoidance)
    ↓
🤖 Generate SQL → Execute → Display Results
    │                           │
    ↓                           ↓
💾 Store Success Pattern    ⚠️ Store Error Pattern
    │
    ↓
👍/👎 User Feedback → Updates Pattern Rating
```
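One pass through this loop might look like the sketch below. Every keyword argument is a hypothetical callable standing in for the corresponding module (classifier, embedder, learning store, LLM client, SQL executor); none of these signatures come from the repo:

```python
def answer(query, *, classify, embed, store, llm, execute):
    """Classify, retrieve similar patterns, generate SQL, learn from the result."""
    if classify(query) != "database":
        return {"type": "direct"}                       # general/help path
    emb = embed(query)
    shots = "\n".join(f"Q: {p['q']}\nSQL: {p['sql']}"   # few-shot examples
                      for p in store.top_k(emb, k=3))
    sql = llm(f"{shots}\nQ: {query}\nSQL:")
    try:
        rows = execute(sql)
    except Exception as err:
        store.add_error(query, sql, str(err))           # learn from the failure
        raise
    store.add_success(query, sql, emb)                  # learn from the success
    return {"type": "sql", "sql": sql, "rows": rows}
```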

Test Results (100 Queries)

Results using Direct SQL Mode with Self-Improving Learning enabled.

| Metric | Value |
|--------|-------|
| Success Rate | 81% |
| Patterns Learned | 81 |
| Errors Logged | 20 |
| Embedding Utilization | 99% |

By Category

| Category | Success Rate |
|----------|--------------|
| Product Filtering | 100% |
| Basic Sales | 100% |
| Basic Products | 95% |
| Aggregation | 90% |
| Complex Queries | 80% |

Model + Mode Comparison

Best Combinations (After Prompt Optimization)

| Rank | Combination | Accuracy | Speed | Notes |
|------|-------------|----------|-------|-------|
| 🥇 | Granite 4 Small + LangChain | 100% | 604ms | Best overall |
| 🥇 | Granite 4 Small + BeeAI | 100% | ~650ms | After prompt fix |
| 🥇 | Mistral Small + BeeAI | 100% | 512ms | Fastest 100% combo |
| 🥈 | Llama 70B + Any Mode | 100% | 1200ms+ | Most accurate model |

Accuracy by Mode (7 test queries)

| Mode | Accuracy | Notes |
|------|----------|-------|
| LangChain | 100% | Retry logic compensates for errors |
| BeeAI | 100% | After prompt optimization |
| Direct SQL | 86% | Fast but struggles with complex |

LLM Model Benchmark (6 queries across modes)

| Model | Direct SQL | LangChain | BeeAI |
|-------|------------|-----------|-------|
| Llama 3.3 70B | 100% | 100% | 100% |
| Mistral Small 24B | 67% | 100% | 100% |
| Granite 4 Small | 83% | 100% | 100% |
| Granite 3.3 8B | 50% | 100% | 83% |

Key Finding

LangChain achieves 100% accuracy with ALL models because it handles retries and schema injection automatically.

Success by Query Type

| Model | Simple | Filter | Aggregation | Complex |
|-------|--------|--------|-------------|---------|
| Granite 4 Small | 100% | 100% | 100% | 75% |
| Mistral Small 24B | 100% | 100% | 75% | 75% |
| Llama 3.3 70B | 100% | 100% | 75% | 50% |

All Supported Models

  • ibm/granite-4-h-small - Recommended - Best accuracy
  • ibm/granite-3-3-8b-instruct - Fast but struggles with complex
  • meta-llama/llama-3-3-70b-instruct - Powerful but slower
  • meta-llama/llama-3-405b-instruct - Largest available
  • mistralai/mistral-small-3-1-24b-instruct-2503 - Fastest
  • mistralai/mistral-medium-2505 - Balanced

API Keys & Services

  • IBM WatsonX: LLM inference (Granite, Llama, Mistral models)
  • IBM Slate Embeddings: ibm/slate-125m-english-rtrvr-v2 for semantic search

Example Queries

Product Queries:

  • "show all laptop products"
  • "list products under $100"
  • "what wireless products do we have"

Sales Queries:

  • "show sales by country"
  • "total revenue from laptops in USA"
  • "top 5 products by sales"

Customer Queries:

  • "how many customers are in Europe"
  • "list customers who ordered smartphones"

General Questions (no SQL):

  • "what time is it"
  • "hello"
  • "what can you do"

Running Tests

```shell
# Run all tests with pytest
pytest

# Run specific test file
pytest tests/test_model_comparison.py -v
```

LLM Model Comparison

```shell
# Quick test (5 queries, 3 models)
python tests/test_model_comparison.py --quick

# Full benchmark (12 queries, 4 models)
python tests/test_model_comparison.py

# Specific models
python tests/test_model_comparison.py --models ibm/granite-4-h-small meta-llama/llama-3-3-70b-instruct
```

Self-Improving Test

```shell
# Run 100-query test
python tests/test_self_improving.py --mode direct_sql --queries 100

# Quick test (10 queries)
python tests/test_self_improving.py --quick
```

Documentation

| Document | Description |
|----------|-------------|
| ARCHITECTURE.md | System architecture, data flow, components |
| API.md | API reference for all modules |
| USAGE.md | Usage guide with examples |
| COMPARISON_RESULTS.md | Detailed mode & model comparison |
| test_results_summary.md | Test findings |
| CONTRIBUTING.md | Contribution guidelines |
| SHOWCASE.md | Self-Improving Agent Showcase |

License

MIT License
