Distributed Query Benchmark Report

This report is auto-generated by scripts/benchmark_distributed.sh.

How to Run

# Start the 2-shard MySQL demo
./scripts/start_sharding_demo.sh

# Start the single-backend baseline
./scripts/setup_single_backend.sh

# Build and run the full benchmark suite
make bench-distributed
./scripts/benchmark_distributed.sh

Or run the benchmark tool directly:

# 2-shard distributed
./bench_distributed \
    --backend "mysql://root:test@127.0.0.1:13306/testdb?name=shard1" \
    --backend "mysql://root:test@127.0.0.1:13307/testdb?name=shard2" \
    --shard "users:id:shard1,shard2" \
    --shard "orders:id:shard1,shard2"

# Single-backend baseline
./bench_distributed \
    --backend "mysql://root:test@127.0.0.1:13308/testdb?name=single" \
    --shard "users:id:single" \
    --shard "orders:id:single"

Pipeline Stages

Each query goes through 5 stages, each independently timed:

Parse -- Tokenize and build AST
Plan -- Convert AST to logical plan tree
Optimize -- Apply rewrite rules (predicate pushdown, constant folding, etc.)
Distribute -- Rewrite plan for multi-shard execution (RemoteScan, MergeSort, etc.)
Execute -- Run operators, fetch data from backends, merge results

Queries Benchmarked

#	Name	SQL	Description
1	full_scan	`SELECT * FROM users`	Scan all rows from both shards
2	filter_pushdown	`SELECT name, age, salary FROM users WHERE dept = 'Engineering'`	Filter pushed to both shards
3	distributed_agg	`SELECT dept, COUNT(*) FROM users GROUP BY dept`	Count by department, merged
4	sort_limit	`SELECT name, salary FROM users ORDER BY salary DESC LIMIT 3`	Top-3 via merge-sort
5	cross_shard_join	`SELECT u.name, o.total, o.status FROM users u JOIN orders o ON u.id = o.user_id`	Cross-shard join
6	expression_only	`SELECT 1 + 2, UPPER('distributed'), ...`	Pure expression, no backend
7	subquery	`SELECT name, age FROM users WHERE age > (SELECT AVG(age) FROM users)`	Correlated subquery

Comparison with Vitess

Vitess is Google's database clustering system for horizontal scaling of MySQL.

Feature	Our Engine	Vitess
Proxy layer	Single binary	vtgate + vttablet per shard
Query parsing	Custom zero-alloc C++ parser	sqlparser (Go)
Planning	Single-pass plan builder	vtgate planner (Gen4)
Optimization	Rule-based (4 rules)	Cost-based (Gen4)
Shard routing	ShardMap + hash-based	Vindexes (pluggable)
Cross-shard joins	Hash join + merge sort	Scatter-gather
Aggregation	MergeAggregate	Ordered aggregate on vtgate

For a direct Vitess comparison, set up their local example and run equivalent queries through their MySQL protocol endpoint (port 15306).

Results (2026-04-05)

Total latency per query (p50)

Query	2-shard (p50)	1-shard (p50)	Overhead
full_scan	371 us	190 us	1.95x
filter_pushdown	402 us	204 us	1.97x
distributed_agg	410 us	213 us	1.92x
sort_limit	402 us	193 us	2.08x
cross_shard_join	728 us	374 us	1.95x
expression_only	1.4 us	1.2 us	1.17x
subquery	4.33 ms	2.13 ms	2.03x

Pipeline breakdown (2-shard, avg)

Query	Parse	Plan	Optimize	Distribute	Execute
full_scan	1.1 us	419 ns	175 ns	656 ns	368 us
filter_pushdown	1.5 us	617 ns	358 ns	1.3 us	403 us
distributed_agg	1.6 us	523 ns	469 ns	1.3 us	416 us
sort_limit	1.5 us	575 ns	320 ns	905 ns	407 us
cross_shard_join	1.9 us	662 ns	403 ns	1.2 us	747 us
expression_only	516 ns	92 ns	433 ns	28 ns	312 ns
subquery	2.6 us	590 ns	800 ns	922 ns	4.40 ms

Key observations

Parse + Plan + Optimize + Distribute combined account for less than 1% of total latency for any query that touches a backend. The entire planning pipeline completes in under 5 us even for the most complex query (subquery).
Execute dominates at 99%+ of wall time for all backend queries. This is expected because network I/O to MySQL backends is the bottleneck.
2-shard overhead is ~2x for most queries, which is the expected cost of making two network round-trips instead of one. The engine fetches from both shards in parallel where possible.
Cross-shard join is ~2x the single-shard join because both tables must be fetched from two backends (4 total round-trips for users + orders).
Expression-only queries (no backend) complete in ~1.4 us, showing the raw overhead of the parse-plan-optimize-execute pipeline with zero I/O.
Subquery is the most expensive at 4.3ms due to multiple backend round-trips for the inner AVG query plus the outer filtered scan.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Distributed Query Benchmark Report

How to Run

Pipeline Stages

Queries Benchmarked

Comparison with Vitess

Results (2026-04-05)

Total latency per query (p50)

Pipeline breakdown (2-shard, avg)

Key observations

FilesExpand file tree

distributed_comparison.md

Latest commit

History

distributed_comparison.md

File metadata and controls

Distributed Query Benchmark Report

How to Run

Pipeline Stages

Queries Benchmarked

Comparison with Vitess

Results (2026-04-05)

Total latency per query (p50)

Pipeline breakdown (2-shard, avg)

Key observations