forked from sysown/proxysql
# Multi-Agent Database Discovery System using Claude Code (#12)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Changes from all commits (6 commits):

- `f852900` Fix: Correct MCP catalog JSON parsing to handle special characters (renecannao)
- `14de472` Add multi-agent database discovery system (renecannao)
- `d73ce0c` Add headless database discovery scripts (renecannao)
- `b627f83` Refactor: Reorganize headless discovery scripts to dedicated directory (renecannao)
- `fdee58a` Add comprehensive database discovery outputs and enhance headless dis… (renecannao)
- `6dd2613` Move discovery docs to examples directory (renecannao)
# Multi-Agent Database Discovery System

## Overview

This document describes a multi-agent database discovery system built on Claude Code's autonomous agent capabilities. The system uses four specialized subagents that collaborate through the MCP (Model Context Protocol) catalog to perform a comprehensive database analysis.
## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                Main Agent (Orchestrator)                │
│ - Launches 4 specialized subagents in parallel          │
│ - Coordinates via MCP catalog                           │
│ - Synthesizes final report                              │
└───────────────┬─────────────────────────────────────────┘
                │
    ┌───────────┼───────────┬───────────┬───────────┐
    │           │           │           │           │
    ▼           ▼           ▼           ▼           ▼
┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
│Struct. │  │Statist.│  │Semantic│  │ Query  │  │  MCP   │
│ Agent  │  │ Agent  │  │ Agent  │  │ Agent  │  │Catalog │
└────────┘  └────────┘  └────────┘  └────────┘  └────────┘
    │           │           │           │           │
    └───────────┴─────┬─────┴───────────┴─────┬─────┘
                      │                       │
                      ▼                       ▼
                 ┌──────────┐          ┌─────────────┐
                 │ Database │          │   Catalog   │
                 │ (testdb) │          │ (Shared Mem)│
                 └──────────┘          └─────────────┘
```
## The Four Discovery Agents

### 1. Structural Agent
**Mission**: Map tables, relationships, indexes, and constraints

**Responsibilities**:
- Complete ERD documentation
- Table schema analysis (columns, types, constraints)
- Foreign key relationship mapping
- Index inventory and assessment
- Architectural pattern identification

**Catalog Entries**: `structural_discovery`

**Key Deliverables**:
- Entity Relationship Diagram
- Complete table definitions
- Index inventory with recommendations
- Relationship cardinality mapping
### 2. Statistical Agent
**Mission**: Profile data distributions, patterns, and anomalies

**Responsibilities**:
- Table row counts and cardinality analysis
- Data distribution profiling
- Anomaly detection (duplicates, outliers)
- Statistical summaries (min/max/avg/stddev)
- Business metrics calculation

**Catalog Entries**: `statistical_discovery`

**Key Deliverables**:
- Data quality score
- Duplicate detection reports
- Statistical distributions
- True vs. inflated metrics
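The duplicate detection step can be pictured as a key-count pass over sampled rows. This is an illustrative sketch, not the agent's actual prompt logic; `find_duplicates` and the sample rows are assumptions standing in for data returned by the `sample_rows` tool:

```python
from collections import Counter

def find_duplicates(rows, key_fields):
    """Count rows per natural key and report any key seen more than once."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Rows standing in for sample_rows output on a duplicated table;
# a triplicated row shows a count of 3.
rows = [
    {"email": "alice@example.com"},
    {"email": "alice@example.com"},
    {"email": "alice@example.com"},
    {"email": "bob@example.com"},
]
dupes = find_duplicates(rows, ["email"])
```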
### 3. Semantic Agent
**Mission**: Infer business domain and entity types

**Responsibilities**:
- Business domain identification
- Entity type classification (master vs. transactional)
- Business rule discovery
- Entity lifecycle analysis
- State machine identification

**Catalog Entries**: `semantic_discovery`

**Key Deliverables**:
- Complete domain model
- Business rules documentation
- Entity lifecycle definitions
- Missing capabilities identification
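One way to picture the master-vs-transactional split is a simple heuristic over schema signals. The function and its thresholds are illustrative assumptions, not the agent's actual classification logic:

```python
def classify_entity(has_event_timestamp: bool, fk_out_degree: int) -> str:
    """Transactional tables usually record an event time and reference
    several master tables; master data mostly stands alone."""
    if has_event_timestamp and fk_out_degree >= 2:
        return "transactional"
    return "master"

# e.g. an order-items table (timestamps, references two master tables)
kind_a = classify_entity(True, 2)   # "transactional"
# e.g. a customers table (no event time, no outgoing FKs)
kind_b = classify_entity(False, 0)  # "master"
```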
### 4. Query Agent
**Mission**: Analyze access patterns and optimization opportunities

**Responsibilities**:
- Query pattern identification
- Index usage analysis
- Performance bottleneck detection
- N+1 query risk assessment
- Optimization recommendations

**Catalog Entries**: `query_discovery`

**Key Deliverables**:
- Access pattern analysis
- Index recommendations (prioritized)
- Query optimization strategies
- EXPLAIN analysis results
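The missing-index check can be sketched against a MySQL-style EXPLAIN row. The dict keys mirror EXPLAIN's output columns; the helper itself is an assumption, not part of the agent prompts:

```python
def flag_full_scan(plan_row: dict) -> bool:
    """A join type of ALL with no chosen key means a full table scan,
    the classic signature of a missing index."""
    return plan_row.get("type") == "ALL" and plan_row.get("key") is None

# e.g. plan rows for a query filtering on an unindexed date column
scan = {"table": "orders", "type": "ALL", "possible_keys": None, "key": None}
indexed = {"table": "orders", "type": "range", "key": "idx_order_date"}
```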
## Discovery Process

### Round Structure

Each agent runs 4 rounds of analysis:

#### Round 1: Blind Exploration
- Initial schema/data analysis
- First observations cataloged
- Initial hypotheses formed

#### Round 2: Pattern Recognition
- Read other agents' findings from the catalog
- Identify patterns and anomalies
- Form and test hypotheses

#### Round 3: Hypothesis Testing
- Validate business rules against actual data
- Cross-reference findings with other agents
- Confirm or reject hypotheses

#### Round 4: Final Synthesis
- Compile comprehensive findings
- Generate actionable recommendations
- Create final mission summary
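The four rounds can be sketched as a loop in which every round after the first begins by reading peer findings from the catalog. The function names here are illustrative stand-ins for the real MCP calls:

```python
ROUNDS = ["blind_exploration", "pattern_recognition",
          "hypothesis_testing", "final_synthesis"]

def run_agent(agent, explore, read_peers, write_finding):
    """Run one agent through all four rounds, cataloging a finding per round."""
    findings = []
    for round_no, phase in enumerate(ROUNDS, start=1):
        peers = read_peers() if round_no > 1 else []  # round 1 is blind
        finding = explore(phase, peers)
        write_finding(agent, phase, finding)
        findings.append(finding)
    return findings

# Demo with stubs in place of real agent calls and catalog I/O.
catalog = []
out = run_agent(
    "structural",
    explore=lambda phase, peers: {"phase": phase, "peers_seen": len(peers)},
    read_peers=lambda: ["statistical: heavy duplication suspected"],
    write_finding=lambda agent, phase, f: catalog.append((agent, phase)),
)
# out[0] sees no peers; every later round sees one peer finding
```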
### Catalog-Based Collaboration

```python
# Agent writes findings
catalog_upsert(
    kind="structural_discovery",
    key="table_customers",
    document="...",
    tags="structural,table,schema"
)

# Agent reads other agents' findings
findings = catalog_list(kind="statistical_discovery")
```
## Example Discovery Output

### Database: testdb (E-commerce Order Management)

#### True Statistics (After Deduplication)

| Metric      | As Stored  | Deduplicated |
|-------------|------------|--------------|
| Customers   | 15         | 5            |
| Products    | 15         | 5            |
| Orders      | 15         | 5            |
| Order Items | 27         | 9            |
| Revenue     | $10,886.67 | $3,628.85    |
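The counts in the table are consistent with uniform triplication. A quick arithmetic check (not part of the discovery output itself):

```python
# Stored vs. deduplicated counts from the table above.
current = {"customers": 15, "products": 15, "orders": 15, "order_items": 27}
actual = {"customers": 5, "products": 5, "orders": 5, "order_items": 9}

for table in current:
    # Every logical row was stored three times.
    assert current[table] == 3 * actual[table]
    # Duplicated fraction per table: 1 - 1/3 = 2/3, i.e. ~67%.
    assert round(1 - actual[table] / current[table], 2) == 0.67
```

Note that the revenue figure is not exactly one third of the inflated total ($10,886.67 / 3 is $3,628.89), so it was presumably re-summed from the deduplicated rows rather than divided.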
#### Critical Findings
1. **Data Quality**: 5/100 (catastrophic): 67% of rows are triplicated
2. **Missing Index**: `orders.order_date` (P0, critical)
3. **Missing Constraints**: no UNIQUE or foreign key constraints
4. **Business Domain**: e-commerce order management system
## Launching the Discovery System

```python
# In Claude Code, launch the 4 agents in parallel:
Task(
    description="Structural Discovery",
    prompt=STRUCTURAL_AGENT_PROMPT,
    subagent_type="general-purpose"
)

Task(
    description="Statistical Discovery",
    prompt=STATISTICAL_AGENT_PROMPT,
    subagent_type="general-purpose"
)

Task(
    description="Semantic Discovery",
    prompt=SEMANTIC_AGENT_PROMPT,
    subagent_type="general-purpose"
)

Task(
    description="Query Discovery",
    prompt=QUERY_AGENT_PROMPT,
    subagent_type="general-purpose"
)
```
## MCP Tools Used

The agents use these MCP tools for database analysis:

- `list_schemas` - List all databases
- `list_tables` - List tables in a schema
- `describe_table` - Get a table's schema
- `sample_rows` - Get sample data from a table
- `column_profile` - Get column statistics
- `run_sql_readonly` - Execute read-only queries
- `catalog_upsert` - Store findings in the catalog
- `catalog_list` / `catalog_get` - Retrieve findings from the catalog
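A minimal discovery pass chaining these tools might look like the sketch below. Only the tool names come from the list above; `call` is an assumed stand-in for an MCP client invocation and is stubbed with canned data here:

```python
def discover(call, schema):
    """Walk every table in a schema, profile it, and catalog the findings."""
    report = {}
    for table in call("list_tables", schema=schema):
        report[table] = {
            "columns": call("describe_table", schema=schema, table=table),
            "sample": call("sample_rows", schema=schema, table=table, limit=5),
        }
        call("catalog_upsert", kind="structural_discovery",
             key=f"table_{table}", document=str(report[table]),
             tags="structural,table,schema")
    return report

# Stub client returning canned data instead of making real MCP calls.
canned = {
    "list_tables": ["customers", "orders"],
    "describe_table": [("id", "INT"), ("name", "TEXT")],
    "sample_rows": [],
    "catalog_upsert": None,
}
report = discover(lambda tool, **kwargs: canned[tool], "testdb")
# report maps each table name to its columns and sample rows
```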
## Benefits of the Multi-Agent Approach

1. **Parallel Execution**: All four agents run simultaneously
2. **Specialized Expertise**: Each agent focuses on its own domain
3. **Cross-Validation**: Agents validate each other's findings
4. **Comprehensive Coverage**: Every aspect of the database is analyzed
5. **Knowledge Synthesis**: The final report combines all perspectives
## Output Format

The system produces:

1. **40+ Catalog Entries** - Detailed findings organized by agent
2. **Comprehensive Report** - Executive summary with:
   - Structure & Schema (ERD, table definitions)
   - Business Domain (entity model, business rules)
   - Key Insights (data quality, performance)
   - Data Quality Assessment (score, recommendations)

## Future Enhancements

- [ ] Additional specialized agents (Security, Performance, Compliance)
- [ ] Automated remediation scripts
- [ ] Continuous monitoring mode
- [ ] Integration with CI/CD pipelines
- [ ] Web-based dashboard for findings

## Related Files

- `simple_discovery.py` - Simplified demo of the multi-agent pattern
- `mcp_catalog.db` - Catalog database for storing findings

## References

- Claude Code Task Tool documentation
- MCP (Model Context Protocol) specification
- ProxySQL MCP server implementation
The PR also changes the catalog's JSON serialization in `MySQL_Catalog` to use nlohmann::json instead of manual string building, so special characters in stored documents are escaped correctly:

```diff
@@ -3,6 +3,7 @@
 #include "proxysql.h"
 #include <sstream>
 #include <algorithm>
+#include "../deps/json/json.hpp"

 MySQL_Catalog::MySQL_Catalog(const std::string& path)
 	: db(NULL), db_path(path)
```
In `MySQL_Catalog::search()`, the hand-built JSON string is replaced with an `nlohmann::json` array whose `dump()` handles escaping:

```diff
@@ -220,31 +221,40 @@ std::string MySQL_Catalog::search(
 		return "[]";
 	}

-	// Build JSON result
-	std::ostringstream json;
-	json << "[";
-	bool first = true;
+	// Build JSON result using nlohmann::json
+	nlohmann::json results = nlohmann::json::array();

 	if (resultset) {
 		for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin();
 		     it != resultset->rows.end(); ++it) {
 			SQLite3_row* row = *it;
-			if (!first) json << ",";
-			first = false;
-
-			json << "{"
-				<< "\"kind\":\"" << (row->fields[0] ? row->fields[0] : "") << "\","
-				<< "\"key\":\"" << (row->fields[1] ? row->fields[1] : "") << "\","
-				<< "\"document\":" << (row->fields[2] ? row->fields[2] : "null") << ","
-				<< "\"tags\":\"" << (row->fields[3] ? row->fields[3] : "") << "\","
-				<< "\"links\":\"" << (row->fields[4] ? row->fields[4] : "") << "\""
-				<< "}";
+			nlohmann::json entry;
+			entry["kind"] = std::string(row->fields[0] ? row->fields[0] : "");
+			entry["key"] = std::string(row->fields[1] ? row->fields[1] : "");
+
+			// Parse the stored JSON document - nlohmann::json handles escaping
+			const char* doc_str = row->fields[2];
+			if (doc_str) {
+				try {
+					entry["document"] = nlohmann::json::parse(doc_str);
+				} catch (const nlohmann::json::parse_error& e) {
+					// If document is not valid JSON, store as string
+					entry["document"] = std::string(doc_str);
+				}
+			} else {
+				entry["document"] = nullptr;
+			}
+
+			entry["tags"] = std::string(row->fields[3] ? row->fields[3] : "");
+			entry["links"] = std::string(row->fields[4] ? row->fields[4] : "");
+
+			results.push_back(entry);
 		}
 		delete resultset;
 	}

-	json << "]";
-	return json.str();
+	return results.dump();
 }

 std::string MySQL_Catalog::list(
```
`MySQL_Catalog::list()` gets the same treatment, with the total count carried in an object wrapper:

```diff
@@ -282,31 +292,42 @@ std::string MySQL_Catalog::list(
 	resultset = NULL;
 	db->execute_statement(sql.str().c_str(), &error, &cols, &affected, &resultset);

-	// Build JSON result with total count
-	std::ostringstream json;
-	json << "{\"total\":" << total << ",\"results\":[";
+	// Build JSON result using nlohmann::json
+	nlohmann::json result;
+	result["total"] = total;
+	nlohmann::json results = nlohmann::json::array();

-	bool first = true;
 	if (resultset) {
 		for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin();
 		     it != resultset->rows.end(); ++it) {
 			SQLite3_row* row = *it;
-			if (!first) json << ",";
-			first = false;
-
-			json << "{"
-				<< "\"kind\":\"" << (row->fields[0] ? row->fields[0] : "") << "\","
-				<< "\"key\":\"" << (row->fields[1] ? row->fields[1] : "") << "\","
-				<< "\"document\":" << (row->fields[2] ? row->fields[2] : "null") << ","
-				<< "\"tags\":\"" << (row->fields[3] ? row->fields[3] : "") << "\","
-				<< "\"links\":\"" << (row->fields[4] ? row->fields[4] : "") << "\""
-				<< "}";
+			nlohmann::json entry;
+			entry["kind"] = std::string(row->fields[0] ? row->fields[0] : "");
+			entry["key"] = std::string(row->fields[1] ? row->fields[1] : "");
+
+			// Parse the stored JSON document
+			const char* doc_str = row->fields[2];
+			if (doc_str) {
+				try {
+					entry["document"] = nlohmann::json::parse(doc_str);
+				} catch (const nlohmann::json::parse_error& e) {
+					entry["document"] = std::string(doc_str);
+				}
+			} else {
+				entry["document"] = nullptr;
+			}
+
+			entry["tags"] = std::string(row->fields[3] ? row->fields[3] : "");
+			entry["links"] = std::string(row->fields[4] ? row->fields[4] : "");
+
+			results.push_back(entry);
 		}
 		delete resultset;
 	}

-	json << "]}";
-	return json.str();
+	result["results"] = results;
+	return result.dump();
 }

 int MySQL_Catalog::merge(
```
**Review comment** (on the architecture diagram):

> The architecture diagram is a bit confusing regarding the catalog. It shows an 'MCP Catalog' box at the same level as the agents, and also a 'Catalog (Shared Mem)' box at the bottom that the agents interact with. This suggests there might be two different catalogs. Could you clarify if these are the same entity and perhaps simplify the diagram to show a single, shared catalog that all agents communicate with via MCP?