ProxySQL · renecannao · Jan 16, 2026 · Jan 13, 2026 · Jan 14, 2026 · Jan 14, 2026
diff --git a/doc/multi_agent_database_discovery.md b/doc/multi_agent_database_discovery.md
@@ -0,0 +1,246 @@
+# Multi-Agent Database Discovery System
+
+## Overview
+
+This document describes a multi-agent database discovery system implemented using Claude Code's autonomous agent capabilities. The system uses 4 specialized subagents that collaborate via the MCP (Model Context Protocol) catalog to perform comprehensive database analysis.
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                     Main Agent (Orchestrator)                       │
+│  - Launches 4 specialized subagents in parallel                     │
+│  - Coordinates via MCP catalog                                      │
+│  - Synthesizes final report                                         │
+└────────────────┬────────────────────────────────────────────────────┘
+                 │
+    ┌────────────┼────────────┬────────────┬────────────┐
+    │            │            │            │            │
+    ▼            ▼            ▼            ▼            ▼
+┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
+│Struct. │  │Statist.│  │Semantic│  │Query   │  │  MCP   │
+│ Agent  │  │ Agent  │  │ Agent  │  │ Agent  │  │Catalog │
+└────────┘  └────────┘  └────────┘  └────────┘  └────────┘
+     │            │            │            │            │
+     └────────────┴────────────┴────────────┴────────────┘
+                          │
+                   ▼              ▼
+              ┌─────────┐  ┌─────────────┐
+              │ Database│  │   Catalog   │
+              │ (testdb)│  │ (Shared Mem)│
+              └─────────┘  └─────────────┘
+```
+
+## The Four Discovery Agents
+
+### 1. Structural Agent
+**Mission**: Map tables, relationships, indexes, and constraints
+
+**Responsibilities**:
+- Complete ERD documentation
+- Table schema analysis (columns, types, constraints)
+- Foreign key relationship mapping
+- Index inventory and assessment
+- Architectural pattern identification
+
+**Catalog Entries**: `structural_discovery`
+
+**Key Deliverables**:
+- Entity Relationship Diagram
+- Complete table definitions
+- Index inventory with recommendations
+- Relationship cardinality mapping
+
+### 2. Statistical Agent
+**Mission**: Profile data distributions, patterns, and anomalies
+
+**Responsibilities**:
+- Table row counts and cardinality analysis
+- Data distribution profiling
+- Anomaly detection (duplicates, outliers)
+- Statistical summaries (min/max/avg/stddev)
+- Business metrics calculation
+
+**Catalog Entries**: `statistical_discovery`
+
+**Key Deliverables**:
+- Data quality score
+- Duplicate detection reports
+- Statistical distributions
+- True vs inflated metrics
+
+### 3. Semantic Agent
+**Mission**: Infer business domain and entity types
+
+**Responsibilities**:
+- Business domain identification
+- Entity type classification (master vs transactional)
+- Business rule discovery
+- Entity lifecycle analysis
+- State machine identification
+
+**Catalog Entries**: `semantic_discovery`
+
+**Key Deliverables**:
+- Complete domain model
+- Business rules documentation
+- Entity lifecycle definitions
+- Missing capabilities identification
+
+### 4. Query Agent
+**Mission**: Analyze access patterns and optimization opportunities
+
+**Responsibilities**:
+- Query pattern identification
+- Index usage analysis
+- Performance bottleneck detection
+- N+1 query risk assessment
+- Optimization recommendations
+
+**Catalog Entries**: `query_discovery`
+
+**Key Deliverables**:
+- Access pattern analysis
+- Index recommendations (prioritized)
+- Query optimization strategies
+- EXPLAIN analysis results
+
+## Discovery Process
+
+### Round Structure
+
+Each agent runs 4 rounds of analysis:
+
+#### Round 1: Blind Exploration
+- Initial schema/data analysis
+- First observations cataloged
+- Initial hypotheses formed
+
+#### Round 2: Pattern Recognition
+- Read other agents' findings from catalog
+- Identify patterns and anomalies
+- Refine the Round 1 hypotheses in light of those findings
+
+#### Round 3: Hypothesis Testing
+- Validate business rules against actual data
+- Cross-reference findings with other agents
+- Confirm or reject hypotheses
+
+#### Round 4: Final Synthesis
+- Compile comprehensive findings
+- Generate actionable recommendations
+- Create final mission summary
+
+### Catalog-Based Collaboration
+
+```python
+# Agent writes findings
+catalog_upsert(
+    kind="structural_discovery",
+    key="table_customers",
+    document="...",
+    tags="structural,table,schema"
+)
+
+# Agent reads other agents' findings
+findings = catalog_list(kind="statistical_discovery")
+```
+
+## Example Discovery Output
+
+### Database: testdb (E-commerce Order Management)
+
+#### True Statistics (After Deduplication)
+| Metric | Current (with duplicates) | Actual (deduplicated) |
+|--------|---------|--------|
+| Customers | 15 | 5 |
+| Products | 15 | 5 |
+| Orders | 15 | 5 |
+| Order Items | 27 | 9 |
+| Revenue | $10,886.67 | $3,628.85 |
+
+#### Critical Findings
+1. **Data Quality**: 5/100 (Catastrophic) - every row is stored three times, so about 67% of rows are redundant duplicates
+2. **Missing Index**: orders.order_date (P0 critical)
+3. **Missing Constraints**: No UNIQUE or FK constraints
+4. **Business Domain**: E-commerce order management system
+
+## Launching the Discovery System
+
+```python
+# In Claude Code, launch 4 agents in parallel:
+Task(
+    description="Structural Discovery",
+    prompt=STRUCTURAL_AGENT_PROMPT,
+    subagent_type="general-purpose"
+)
+
+Task(
+    description="Statistical Discovery",
+    prompt=STATISTICAL_AGENT_PROMPT,
+    subagent_type="general-purpose"
+)
+
+Task(
+    description="Semantic Discovery",
+    prompt=SEMANTIC_AGENT_PROMPT,
+    subagent_type="general-purpose"
+)
+
+Task(
+    description="Query Discovery",
+    prompt=QUERY_AGENT_PROMPT,
+    subagent_type="general-purpose"
+)
+```
+
+## MCP Tools Used
+
+The agents use these MCP tools for database analysis:
+
+- `list_schemas` - List all databases
+- `list_tables` - List tables in a schema
+- `describe_table` - Get table schema
+- `sample_rows` - Get sample data from table
+- `column_profile` - Get column statistics
+- `run_sql_readonly` - Execute read-only queries
+- `catalog_upsert` - Store findings in catalog
+- `catalog_list` / `catalog_get` - Retrieve findings from catalog
+
+## Benefits of Multi-Agent Approach
+
+1. **Parallel Execution**: All 4 agents run simultaneously
+2. **Specialized Expertise**: Each agent focuses on its domain
+3. **Cross-Validation**: Agents validate each other's findings
+4. **Comprehensive Coverage**: All aspects of database analyzed
+5. **Knowledge Synthesis**: Final report combines all perspectives
+
+## Output Format
+
+The system produces:
+
+1. **40+ Catalog Entries** - Detailed findings organized by agent
+2. **Comprehensive Report** - Executive summary with:
+   - Structure & Schema (ERD, table definitions)
+   - Business Domain (entity model, business rules)
+   - Key Insights (data quality, performance)
+   - Data Quality Assessment (score, recommendations)
+
+## Future Enhancements
+
+- [ ] Additional specialized agents (Security, Performance, Compliance)
+- [ ] Automated remediation scripts
+- [ ] Continuous monitoring mode
+- [ ] Integration with CI/CD pipelines
+- [ ] Web-based dashboard for findings
+
+## Related Files
+
+- `simple_discovery.py` - Simplified demo of multi-agent pattern
+- `mcp_catalog.db` - Catalog database for storing findings
+
+## References
+
+- Claude Code Task Tool Documentation
+- MCP (Model Context Protocol) Specification
+- ProxySQL MCP Server Implementation
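
The catalog-based collaboration described in `doc/multi_agent_database_discovery.md` above can be sketched in a few lines. This is a minimal illustration, not the ProxySQL implementation: an in-memory dictionary stands in for the MCP catalog, the local `catalog_upsert` and `catalog_list` functions only borrow the names of the MCP tools (their signatures and return shapes here are assumptions), and the column names are made-up sample values.

```python
import json

# Hypothetical in-memory stand-in for the MCP catalog (the real catalog is
# served by the ProxySQL MCP server). Entries are keyed by (kind, key).
_catalog = {}


def catalog_upsert(kind, key, document, tags=""):
    """Insert or overwrite one finding; the document is stored as JSON text."""
    _catalog[(kind, key)] = {
        "kind": kind,
        "key": key,
        "document": json.dumps(document),
        "tags": tags,
    }


def catalog_list(kind):
    """Return every entry of one kind, with the document decoded again."""
    return [
        {**entry, "document": json.loads(entry["document"])}
        for (entry_kind, _), entry in sorted(_catalog.items())
        if entry_kind == kind
    ]


# Round 1: the structural agent records what it sees in isolation.
# The column list is illustrative sample data, not taken from testdb.
catalog_upsert(
    kind="structural_discovery",
    key="table_customers",
    document={"columns": ["id", "name", "email"], "unique_constraints": []},
    tags="structural,table,schema",
)

# Round 2: the statistical agent reads that finding and builds on it.
structural = catalog_list(kind="structural_discovery")
unconstrained = [
    entry["key"]
    for entry in structural
    if not entry["document"]["unique_constraints"]
]
catalog_upsert(
    kind="statistical_discovery",
    key="duplicate_risk",
    document={"tables_without_unique": unconstrained},
    tags="statistical,duplicates",
)

print(catalog_list(kind="statistical_discovery")[0]["document"])
```

The point of the sketch is the data flow: one agent's Round 1 entry becomes the input to another agent's Round 2 analysis, with the catalog as the only shared state.
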
diff --git a/lib/MySQL_Catalog.cpp b/lib/MySQL_Catalog.cpp
@@ -3,6 +3,7 @@
 #include "proxysql.h"
 #include <sstream>
 #include <algorithm>
+#include "../deps/json/json.hpp"
 
 MySQL_Catalog::MySQL_Catalog(const std::string& path)
 	: db(NULL), db_path(path)
@@ -220,31 +221,40 @@ std::string MySQL_Catalog::search(
 		return "[]";
 	}
 
-	// Build JSON result
-	std::ostringstream json;
-	json << "[";
-	bool first = true;
+	// Build JSON result using nlohmann::json
+	nlohmann::json results = nlohmann::json::array();
 
 	if (resultset) {
 		for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin();
 		     it != resultset->rows.end(); ++it) {
 			SQLite3_row* row = *it;
-			if (!first) json << ",";
-			first = false;
-
-			json << "{"
-			     << "\"kind\":\"" << (row->fields[0] ? row->fields[0] : "") << "\","
-			     << "\"key\":\"" << (row->fields[1] ? row->fields[1] : "") << "\","
-			     << "\"document\":" << (row->fields[2] ? row->fields[2] : "null") << ","
-			     << "\"tags\":\"" << (row->fields[3] ? row->fields[3] : "") << "\","
-			     << "\"links\":\"" << (row->fields[4] ? row->fields[4] : "") << "\""
-			     << "}";
+
+			nlohmann::json entry;
+			entry["kind"] = std::string(row->fields[0] ? row->fields[0] : "");
+			entry["key"] = std::string(row->fields[1] ? row->fields[1] : "");
+
+			// Parse the stored JSON document - nlohmann::json handles escaping
+			const char* doc_str = row->fields[2];
+			if (doc_str) {
+				try {
+					entry["document"] = nlohmann::json::parse(doc_str);
+				} catch (const nlohmann::json::parse_error& e) {
+					// If document is not valid JSON, store as string
+					entry["document"] = std::string(doc_str);
+				}
+			} else {
+				entry["document"] = nullptr;
+			}
+
+			entry["tags"] = std::string(row->fields[3] ? row->fields[3] : "");
+			entry["links"] = std::string(row->fields[4] ? row->fields[4] : "");
+
+			results.push_back(entry);
 		}
 		delete resultset;
 	}
 
-	json << "]";
-	return json.str();
+	return results.dump();
 }
 
 std::string MySQL_Catalog::list(
@@ -282,31 +292,42 @@ std::string MySQL_Catalog::list(
 	resultset = NULL;
 	db->execute_statement(sql.str().c_str(), &error, &cols, &affected, &resultset);
 
-	// Build JSON result with total count
-	std::ostringstream json;
-	json << "{\"total\":" << total << ",\"results\":[";
+	// Build JSON result using nlohmann::json
+	nlohmann::json result;
+	result["total"] = total;
+	nlohmann::json results = nlohmann::json::array();
 
-	bool first = true;
 	if (resultset) {
 		for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin();
 		     it != resultset->rows.end(); ++it) {
 			SQLite3_row* row = *it;
-			if (!first) json << ",";
-			first = false;
-
-			json << "{"
-			     << "\"kind\":\"" << (row->fields[0] ? row->fields[0] : "") << "\","
-			     << "\"key\":\"" << (row->fields[1] ? row->fields[1] : "") << "\","
-			     << "\"document\":" << (row->fields[2] ? row->fields[2] : "null") << ","
-			     << "\"tags\":\"" << (row->fields[3] ? row->fields[3] : "") << "\","
-			     << "\"links\":\"" << (row->fields[4] ? row->fields[4] : "") << "\""
-			     << "}";
+
+			nlohmann::json entry;
+			entry["kind"] = std::string(row->fields[0] ? row->fields[0] : "");
+			entry["key"] = std::string(row->fields[1] ? row->fields[1] : "");
+
+			// Parse the stored JSON document
+			const char* doc_str = row->fields[2];
+			if (doc_str) {
+				try {
+					entry["document"] = nlohmann::json::parse(doc_str);
+				} catch (const nlohmann::json::parse_error& e) {
+					entry["document"] = std::string(doc_str);
+				}
+			} else {
+				entry["document"] = nullptr;
+			}
+
+			entry["tags"] = std::string(row->fields[3] ? row->fields[3] : "");
+			entry["links"] = std::string(row->fields[4] ? row->fields[4] : "");
+
+			results.push_back(entry);
 		}
 		delete resultset;
 	}
 
-	json << "]}";
-	return json.str();
+	result["results"] = results;
+	return result.dump();
 }
 
 int MySQL_Catalog::merge(
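
The motivation for replacing the `std::ostringstream` concatenation in `search()` and `list()` can be shown with a small Python analogue. This is a sketch under assumptions: the two helper functions are hypothetical stand-ins that mirror the removed and the added C++ logic, Python's `json` module plays the role of `nlohmann::json`, and the row values are illustrative.

```python
import json


def build_entry_by_concatenation(kind, key, document, tags, links):
    """Mimic the removed C++ code: paste raw field values between quotes."""
    doc = document if document is not None else "null"
    return ('{"kind":"' + kind + '","key":"' + key + '","document":' + doc
            + ',"tags":"' + tags + '","links":"' + links + '"}')


def build_entry_with_library(kind, key, document, tags, links):
    """Mimic the added C++ code: let a JSON library escape every value."""
    if document is None:
        parsed = None
    else:
        try:
            parsed = json.loads(document)
        except json.JSONDecodeError:
            parsed = document  # not valid JSON: keep it as a plain string
    return json.dumps({"kind": kind, "key": key, "document": parsed,
                       "tags": tags, "links": links})


# Illustrative row whose tags column contains a double quote.
row = ("structural_discovery", "table_customers", '{"rows": 15}',
       'note "urgent"', "")

old_output = build_entry_by_concatenation(*row)
new_output = build_entry_with_library(*row)

# Check whether a client could parse the hand-built string.
try:
    json.loads(old_output)
    old_is_valid = True
except json.JSONDecodeError:
    old_is_valid = False

print("concatenated output parses:", old_is_valid)
print("library output round-trips:", json.loads(new_output)["tags"])
```

With the hand-built string, any quote or backslash in `kind`, `key`, `tags`, or `links`, or a stored document that is not valid JSON, yields a response the client cannot parse. Building the value through a JSON library removes that failure mode.
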

diff --git a/lib/MySQL_Tool_Handler.cpp b/lib/MySQL_Tool_Handler.cpp
@@ -910,7 +910,13 @@ std::string MySQL_Tool_Handler::catalog_get(const std::string& kind, const std::
 	if (rc == 0) {
 		result["kind"] = kind;
 		result["key"] = key;
-		result["document"] = json::parse(document);
+		// Parse as raw JSON value to preserve nested structure
+		try {
+			result["document"] = json::parse(document);
+		} catch (const json::parse_error& e) {
+			// If not valid JSON, store as string
+			result["document"] = document;
+		}
 	} else {
 		result["error"] = "Entry not found";
 	}
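
The three outcomes of the patched `catalog_get` branch can be summarized with a short Python stand-in. It is a hypothetical sketch, not the handler itself: `catalog_get_response` and its `rc` argument are assumed names that follow the variables visible in the diff, and the sample documents are invented for illustration.

```python
import json


def catalog_get_response(rc, kind, key, document):
    """Build a response the way the patched branch does (rc == 0 means found)."""
    result = {}
    if rc == 0:
        result["kind"] = kind
        result["key"] = key
        try:
            # Valid JSON text is embedded as a nested value.
            result["document"] = json.loads(document)
        except json.JSONDecodeError:
            # Anything else is returned verbatim as a string.
            result["document"] = document
    else:
        result["error"] = "Entry not found"
    return result


nested = catalog_get_response(0, "semantic_discovery", "domain",
                              '{"domain": "e-commerce"}')
plain = catalog_get_response(0, "semantic_discovery", "notes",
                             "free-form text, not JSON")
missing = catalog_get_response(1, "semantic_discovery", "absent", "")

print(type(nested["document"]).__name__)
print(type(plain["document"]).__name__)
print(missing)
```

Before this change the parse call was unguarded, so a stored document that was not valid JSON raised `json::parse_error` instead of producing a response. Callers should now expect `document` to be either a structured value or a plain string.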