Sort Merge Join: Reduce batch concatenation, use `BatchCoalescer`, new benchmarks (TPC-H Q21 SMJ up to ~4000x faster) by mbutrovich · Pull Request #18875 · apache/datafusion

mbutrovich · 2025-11-21T22:27:18Z

Which issue does this PR close?

Closes #18487 (Sort Merge Join is extremely slow on LeftAnti joins).
Will eventually close datafusion-comet#901 (Performance regression after adding support for SMJ with join filter).

Rationale for this change

DataFusion Comet often uses Sort Merge Joins because DataFusion does not have a larger-than-memory Hash Join operator. Performance on TPC-H Q21 is quite bad when run natively, so by default Comet falls back to Spark instead. The slowdown shows up when you force Comet to use DataFusion's SMJ operator.

Profiling showed that most of the time was spent in concat_batches calls on batches with single-digit row counts.
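To make that cost concrete, here is a toy model (not DataFusion code) of one way many tiny concatenations add up. It assumes a plain `Vec<i64>` stands in for an Arrow `RecordBatch`, and the helper names `concat`, `accumulate_eagerly`, and `accumulate_once` are hypothetical. Re-concatenating the accumulated output every time a small batch arrives copies a quadratic number of values, while concatenating once at the end copies each value once.

```rust
/// Toy stand-in for an Arrow RecordBatch: a single column of i64 values.
type Batch = Vec<i64>;

/// Concatenate batches into a new batch; also return how many values were copied.
fn concat(batches: &[&Batch]) -> (Batch, usize) {
    let total: usize = batches.iter().map(|b| b.len()).sum();
    let mut out = Vec::with_capacity(total);
    for b in batches {
        out.extend_from_slice(b);
    }
    (out, total)
}

/// Hypothetical eager pattern: re-concatenate the accumulated output with every
/// new small batch. Total copies grow quadratically with the number of batches.
fn accumulate_eagerly(inputs: &[Batch]) -> (Batch, usize) {
    let mut acc: Batch = Vec::new();
    let mut copied = 0;
    for b in inputs {
        let (next, n) = concat(&[&acc, b]);
        acc = next;
        copied += n;
    }
    (acc, copied)
}

/// Deferred pattern: keep the small batches and concatenate them once at the end.
fn accumulate_once(inputs: &[Batch]) -> (Batch, usize) {
    let refs: Vec<&Batch> = inputs.iter().collect();
    concat(&refs)
}
```

With 1,000 three-row inputs, the eager pattern copies 1,501,500 values and the one-shot version copies 3,000. Real Arrow concatenation also pays per-call and per-column overhead, which this sketch does not model.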

What changes are included in this PR?

Use a BatchCoalescer both internally and to buffer the final output. There was also some redundant concatenation of batches for filtered joins. Switching one of these places made the biggest difference, but I switched both for consistency. The Comet results with these changes were measured on top of DataFusion 50.3, which is the version Comet currently uses.
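A minimal sketch of the coalescing idea, assuming a toy `Vec<i64>` batch type. `ToyCoalescer` and its methods are hypothetical and only loosely modeled on the idea behind arrow-rs's `BatchCoalescer`; they are not that API. Small inputs are copied into one buffer, an output batch of `target` rows is cut as soon as the buffer fills, and `finish` flushes the short remainder.

```rust
use std::collections::VecDeque;

/// Hypothetical toy coalescer: buffers rows from small input batches and
/// emits output batches of exactly `target` rows (plus one short final batch).
struct ToyCoalescer {
    target: usize,
    buffer: Vec<i64>,
    completed: VecDeque<Vec<i64>>,
}

impl ToyCoalescer {
    fn new(target: usize) -> Self {
        assert!(target > 0, "target batch size must be positive");
        Self {
            target,
            buffer: Vec::with_capacity(target),
            completed: VecDeque::new(),
        }
    }

    /// Append rows from a (possibly tiny) input batch, cutting full batches as needed.
    fn push_batch(&mut self, batch: &[i64]) {
        let mut rest = batch;
        while !rest.is_empty() {
            let space = self.target - self.buffer.len();
            let take = space.min(rest.len());
            self.buffer.extend_from_slice(&rest[..take]);
            rest = &rest[take..];
            if self.buffer.len() == self.target {
                // Swap in a fresh buffer and queue the full one for output.
                let full = std::mem::replace(&mut self.buffer, Vec::with_capacity(self.target));
                self.completed.push_back(full);
            }
        }
    }

    /// Pop the oldest completed output batch, if any.
    fn next_completed_batch(&mut self) -> Option<Vec<i64>> {
        self.completed.pop_front()
    }

    /// Flush whatever is still buffered as a final, possibly short, batch.
    fn finish(&mut self) {
        if !self.buffer.is_empty() {
            let last = std::mem::take(&mut self.buffer);
            self.completed.push_back(last);
        }
    }
}
```

Pushing ten 3-row batches with `target = 8` yields three 8-row batches, plus a final 6-row batch after `finish`, so a downstream consumer sees a few well-sized batches instead of ten tiny ones.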

TPC-H SF1 benchmark results are below (PREFER_HASH_JOIN=false ./bench.sh run tpch). I also tried TPC-H SF10, but on main it looked like it would take hours on my machine; with this PR it ran successfully.

./bench.sh compare_detail main smj
Comparing main and smj
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ Query        ┃                                           main ┃                               smj ┃           Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ QQuery 1     │                 44.37 / 48.67 ±4.54 / 55.68 ms │   41.63 / 55.88 ±19.83 / 95.24 ms │     1.15x slower │
│ QQuery 2     │                 45.18 / 47.44 ±2.39 / 51.74 ms │    45.26 / 47.29 ±3.56 / 54.39 ms │        no change │
│ QQuery 3     │                 52.59 / 56.15 ±2.65 / 59.79 ms │    50.93 / 52.39 ±1.35 / 54.46 ms │    +1.07x faster │
│ QQuery 4     │                 33.06 / 34.46 ±0.97 / 35.88 ms │    30.06 / 31.04 ±0.74 / 32.14 ms │    +1.11x faster │
│ QQuery 5     │                 84.50 / 87.63 ±2.06 / 90.58 ms │    78.33 / 80.62 ±2.96 / 86.32 ms │    +1.09x faster │
│ QQuery 6     │                 17.87 / 18.64 ±0.48 / 19.22 ms │    16.14 / 17.54 ±1.12 / 19.55 ms │    +1.06x faster │
│ QQuery 7     │              111.11 / 113.59 ±1.79 / 116.70 ms │ 112.43 / 115.85 ±2.55 / 118.96 ms │        no change │
│ QQuery 8     │                89.84 / 94.59 ±3.34 / 100.15 ms │    92.26 / 94.64 ±2.28 / 97.50 ms │        no change │
│ QQuery 9     │              128.36 / 133.12 ±3.46 / 138.00 ms │ 124.58 / 130.47 ±6.30 / 138.85 ms │        no change │
│ QQuery 10    │                 49.89 / 51.91 ±1.41 / 54.19 ms │    48.55 / 50.43 ±1.82 / 52.92 ms │        no change │
│ QQuery 11    │                 34.19 / 35.30 ±0.59 / 35.84 ms │    32.42 / 34.59 ±1.52 / 36.47 ms │        no change │
│ QQuery 12    │                 36.26 / 38.67 ±2.44 / 42.77 ms │    32.92 / 34.28 ±1.18 / 36.38 ms │    +1.13x faster │
│ QQuery 13    │                 31.32 / 34.13 ±2.29 / 38.22 ms │    28.66 / 29.84 ±1.11 / 31.94 ms │    +1.14x faster │
│ QQuery 14    │                 23.54 / 24.79 ±0.92 / 26.00 ms │    22.48 / 23.45 ±1.03 / 25.44 ms │    +1.06x faster │
│ QQuery 15    │                 26.66 / 27.47 ±0.86 / 29.05 ms │    26.23 / 28.64 ±1.72 / 31.48 ms │        no change │
│ QQuery 16    │                 17.63 / 18.94 ±0.97 / 20.20 ms │    16.82 / 18.11 ±1.33 / 20.60 ms │        no change │
│ QQuery 17    │                 94.36 / 96.41 ±1.62 / 98.44 ms │    91.47 / 93.47 ±1.70 / 96.54 ms │        no change │
│ QQuery 18    │               99.91 / 108.58 ±5.85 / 117.27 ms │ 104.25 / 106.40 ±2.42 / 110.47 ms │        no change │
│ QQuery 19    │                 35.23 / 36.68 ±1.46 / 39.23 ms │    32.98 / 36.03 ±1.88 / 38.57 ms │        no change │
│ QQuery 20    │                 40.66 / 41.84 ±1.20 / 44.05 ms │    38.12 / 39.20 ±0.92 / 40.45 ms │    +1.07x faster │
│ QQuery 21    │ 151142.04 / 246274.24 ±89682.07 / 358766.84 ms │ 216.09 / 218.73 ±2.03 / 221.31 ms │ +1125.94x faster │
│ QQuery 22    │                16.69 / 28.53 ±22.72 / 73.97 ms │    16.72 / 17.39 ±0.78 / 18.86 ms │    +1.64x faster │
└──────────────┴────────────────────────────────────────────────┴───────────────────────────────────┴──────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary      ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (main)      │ 247451.79ms │
│ Total Time (smj)       │   1356.29ms │
│ Average Time (main)    │  11247.81ms │
│ Average Time (smj)     │     61.65ms │
│ Queries Faster         │          10 │
│ Queries Slower         │           1 │
│ Queries with No Change │          11 │
│ Queries with Failure   │           0 │
└────────────────────────┴─────────────┘

Are these changes tested?

Yes: the existing Sort Merge Join unit tests cover these changes, and I added a new benchmark.

Are there any user-facing changes?

There should not be.


comphead · 2025-11-21T22:31:17Z

+1168.11x faster

mbutrovich · 2025-11-21T22:45:00Z

The extended tests expose a bug somewhere. I'll try to track it down next week.


mbutrovich · 2025-12-02T15:53:09Z

I think I sorted out the corner-case failures by refactoring a bit. I removed direct member access to the JoinedRecordBatches fields and encapsulated that logic in functions sprinkled with debug_assert, which made the control flow easier to follow. The existing logic also had some redundant concat_batches calls; removing them already improved performance, and the BatchCoalescer makes it even better.
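The refactor described above (private fields, accessed only through methods that `debug_assert` their invariants) can be sketched as follows. `JoinedIndices`, its fields, and its methods are hypothetical and do not mirror DataFusion's actual `JoinedRecordBatches`; the invariant checked here, that the left and right index buffers always have the same length, is an assumed example.

```rust
/// Hypothetical illustration: parallel index buffers kept private so every
/// mutation goes through methods that re-check the length invariant.
#[derive(Default)]
pub struct JoinedIndices {
    left: Vec<u64>,
    // None marks a left row that has no matching right row.
    right: Vec<Option<u64>>,
}

impl JoinedIndices {
    /// Record a matched (left, right) row pair.
    pub fn push_match(&mut self, l: u64, r: u64) {
        self.left.push(l);
        self.right.push(Some(r));
        self.check();
    }

    /// Record a left row with no match on the right side.
    pub fn push_unmatched(&mut self, l: u64) {
        self.left.push(l);
        self.right.push(None);
        self.check();
    }

    pub fn len(&self) -> usize {
        self.check();
        self.left.len()
    }

    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }

    /// Drain all buffered pairs, leaving the struct empty.
    pub fn take(&mut self) -> (Vec<u64>, Vec<Option<u64>>) {
        self.check();
        (std::mem::take(&mut self.left), std::mem::take(&mut self.right))
    }

    /// Invariant check; compiled out when debug assertions are disabled.
    fn check(&self) {
        debug_assert_eq!(
            self.left.len(),
            self.right.len(),
            "left/right index buffers out of sync"
        );
    }
}
```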

alamb · 2025-12-03T16:52:28Z

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing smj (36a73e5) to 9af6858 diff using: tpch
Results will be posted here when complete

alamb · 2025-12-03T16:52:49Z

I started the following on this branch

PREFER_HASH_JOIN=false BENCHMARKS="tpch" ./gh_compare_branch.sh https://github.com/apache/datafusion/pull/18875

I think that will effectively compare the merge join performance of main against this branch.

Omega359 · 2025-12-03T17:55:57Z

This is what I get on my AMD Ryzen 9 machine:

$ ./bench.sh compare_detail upstream_main smj
Comparing upstream_main and smj
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Query        ┃                                 upstream_main ┃                               smj ┃          Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ QQuery 1     │              91.48 / 109.29 ±9.59 / 119.07 ms │   85.45 / 99.39 ±7.45 / 105.42 ms │   +1.10x faster │
│ QQuery 2     │                78.59 / 80.25 ±2.76 / 85.74 ms │    75.24 / 80.58 ±3.81 / 85.16 ms │       no change │
│ QQuery 3     │               89.52 / 93.74 ±3.71 / 100.51 ms │   91.79 / 94.31 ±3.17 / 100.53 ms │       no change │
│ QQuery 4     │                51.04 / 52.51 ±1.40 / 54.56 ms │    49.81 / 50.59 ±0.67 / 51.38 ms │       no change │
│ QQuery 5     │             151.19 / 154.89 ±4.39 / 163.06 ms │ 151.96 / 159.35 ±4.93 / 165.01 ms │       no change │
│ QQuery 6     │                23.67 / 29.73 ±3.32 / 32.81 ms │    25.87 / 30.65 ±2.59 / 32.65 ms │       no change │
│ QQuery 7     │             209.97 / 214.53 ±3.55 / 220.02 ms │ 213.94 / 219.05 ±7.28 / 233.48 ms │       no change │
│ QQuery 8     │             191.34 / 198.65 ±5.17 / 203.92 ms │ 189.82 / 197.02 ±4.05 / 201.31 ms │       no change │
│ QQuery 9     │             270.53 / 275.98 ±6.12 / 283.91 ms │ 272.41 / 280.72 ±4.92 / 286.95 ms │       no change │
│ QQuery 10    │               92.68 / 96.99 ±4.49 / 103.76 ms │   96.97 / 99.41 ±1.74 / 101.62 ms │       no change │
│ QQuery 11    │                60.48 / 63.12 ±2.32 / 66.97 ms │    60.53 / 63.75 ±1.85 / 65.76 ms │       no change │
│ QQuery 12    │                54.61 / 56.53 ±1.91 / 59.79 ms │    56.49 / 58.06 ±1.46 / 60.02 ms │       no change │
│ QQuery 13    │                48.51 / 51.06 ±1.72 / 53.27 ms │    48.90 / 51.44 ±1.63 / 53.71 ms │       no change │
│ QQuery 14    │                38.22 / 43.60 ±3.26 / 47.31 ms │    42.90 / 44.19 ±0.81 / 45.45 ms │       no change │
│ QQuery 15    │                47.58 / 53.90 ±4.17 / 59.49 ms │    53.77 / 55.22 ±1.27 / 57.09 ms │       no change │
│ QQuery 16    │                31.89 / 33.13 ±0.68 / 33.73 ms │    32.06 / 34.95 ±2.17 / 38.40 ms │    1.05x slower │
│ QQuery 17    │             213.95 / 215.98 ±2.13 / 219.65 ms │ 216.23 / 218.27 ±2.02 / 221.31 ms │       no change │
│ QQuery 18    │             203.46 / 208.98 ±3.20 / 212.19 ms │ 226.90 / 236.01 ±6.11 / 243.46 ms │    1.13x slower │
│ QQuery 19    │                67.12 / 69.08 ±1.12 / 70.18 ms │    68.34 / 71.18 ±1.80 / 73.83 ms │       no change │
│ QQuery 20    │                74.62 / 77.51 ±1.66 / 79.76 ms │    71.51 / 81.10 ±5.82 / 88.99 ms │       no change │
│ QQuery 21    │ 194460.40 / 199334.85 ±6297.51 / 211607.77 ms │ 345.81 / 351.86 ±4.31 / 359.12 ms │ +566.52x faster │
│ QQuery 22    │                28.62 / 33.55 ±6.70 / 46.71 ms │    27.89 / 29.55 ±1.20 / 31.03 ms │   +1.14x faster │
└──────────────┴───────────────────────────────────────────────┴───────────────────────────────────┴─────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary            ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (upstream_main)   │ 201547.85ms │
│ Total Time (smj)             │   2606.64ms │
│ Average Time (upstream_main) │   9161.27ms │
│ Average Time (smj)           │    118.48ms │
│ Queries Faster               │           3 │
│ Queries Slower               │           2 │
│ Queries with No Change       │          17 │
│ Queries with Failure         │           0 │
└──────────────────────────────┴─────────────┘

alamb · 2025-12-03T19:49:34Z

FWIW the benchmarks are still running because Q21 took over an hour to run 🤯

Query 21 iteration 0 took 4665980.8 ms and returned 100 rows

rluvaton · 2025-12-03T20:09:46Z

Some of the debug_asserts are so cheap that I think we should use regular asserts instead.
For example:

debug_assert_eq!(
    indices.len(),
    indices_len,
    "indices.len() should match indices_len parameter"
);

mbutrovich · 2025-12-03T21:46:56Z


I might remove some. They were mostly to help me understand control flow as I was learning the SMJ state machine: I'd try to codify my understanding with debug_asserts as I went, and if I broke something or otherwise changed behavior that I was convinced was an invariant, I'd have good safeguards.
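One way to apply the review suggestion, sketched with a hypothetical `gather` helper: keep constant-time checks such as the length comparison as `assert_eq!` so they also run in release builds, and reserve `debug_assert!` for checks that cost O(n), like verifying sortedness. The sortedness requirement is an assumption made for illustration.

```rust
/// Hypothetical helper: gather `values` at `indices`.
fn gather(values: &[i64], indices: &[usize], indices_len: usize) -> Vec<i64> {
    // O(1) check: cheap enough to keep in release builds.
    assert_eq!(
        indices.len(),
        indices_len,
        "indices.len() should match indices_len parameter"
    );
    // O(n) check: only evaluated when debug assertions are enabled.
    debug_assert!(
        indices.windows(2).all(|w| w[0] <= w[1]),
        "indices are expected to be sorted"
    );
    indices.iter().map(|&i| values[i]).collect()
}
```

With mismatched lengths, the `assert_eq!` panics in both debug and release builds; the sortedness check is compiled out when debug assertions are disabled.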

alamb · 2025-12-03T23:49:01Z

🤖: Benchmark completed

Details

Comparing HEAD and smj
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ Query        ┃          HEAD ┃        smj ┃           Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ QQuery 1     │     230.23 ms │  227.23 ms │        no change │
│ QQuery 2     │     188.11 ms │  188.44 ms │        no change │
│ QQuery 3     │     244.92 ms │  249.80 ms │        no change │
│ QQuery 4     │     174.31 ms │  175.78 ms │        no change │
│ QQuery 5     │     408.15 ms │  402.66 ms │        no change │
│ QQuery 6     │      68.06 ms │   67.13 ms │        no change │
│ QQuery 7     │     488.35 ms │  502.39 ms │        no change │
│ QQuery 8     │     470.92 ms │  477.66 ms │        no change │
│ QQuery 9     │     682.45 ms │  684.11 ms │        no change │
│ QQuery 10    │     241.30 ms │  238.98 ms │        no change │
│ QQuery 11    │     171.28 ms │  168.36 ms │        no change │
│ QQuery 12    │     159.38 ms │  160.41 ms │        no change │
│ QQuery 13    │     264.82 ms │  265.14 ms │        no change │
│ QQuery 14    │      95.66 ms │   91.17 ms │        no change │
│ QQuery 15    │      99.71 ms │   98.49 ms │        no change │
│ QQuery 16    │      70.28 ms │   73.21 ms │        no change │
│ QQuery 17    │     504.18 ms │  501.81 ms │        no change │
│ QQuery 18    │     586.59 ms │  745.85 ms │     1.27x slower │
│ QQuery 19    │     138.05 ms │  152.12 ms │     1.10x slower │
│ QQuery 20    │     180.26 ms │  187.47 ms │        no change │
│ QQuery 21    │ 4422642.01 ms │ 1063.69 ms │ +4157.85x faster │
│ QQuery 22    │     104.78 ms │   99.21 ms │    +1.06x faster │
└──────────────┴───────────────┴────────────┴──────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Benchmark Summary      ┃              ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Total Time (HEAD)      │ 4428213.77ms │
│ Total Time (smj)       │    6821.10ms │
│ Average Time (HEAD)    │  201282.44ms │
│ Average Time (smj)     │     310.05ms │
│ Queries Faster         │            2 │
│ Queries Slower         │            2 │
│ Queries with No Change │           18 │
│ Queries with Failure   │            0 │
└────────────────────────┴──────────────┘

mbutrovich · 2025-12-04T00:13:02Z

│ QQuery 21 │ 4422642.01 ms │ 1063.69 ms │ +4157.85x faster │

My goodness.

comphead · 2025-12-04T02:48:47Z

Small batches are evil. Sorry for the delay: I wanted to check the PR with TPC-DS, but because of the recent regression #19075 I cannot merge it right now.

rluvaton · 2025-12-04T06:58:46Z

Could you please merge main into this branch? I just merged a PR that fixed a bug in SMJ and updated the fuzz tests:

Fix: Align sort_merge_join filter output with join schema to fix right-anti panic #18800

datafusion/physical-plan/src/joins/sort_merge_join/stream.rs

Dandandan

Impressive 🚀



alamb · 2025-12-09T13:32:08Z

🚀

comphead · 2025-12-09T16:38:58Z

TPC-DS benchmarks (sorry for being late):

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       with ┃   without  ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │   35.39 ms │   34.56 ms │     no change │
│ QQuery 2     │  108.17 ms │  107.45 ms │     no change │
│ QQuery 3     │   95.07 ms │   92.13 ms │     no change │
│ QQuery 4     │  918.80 ms │  953.76 ms │     no change │
│ QQuery 5     │  136.87 ms │  135.88 ms │     no change │
│ QQuery 6     │  519.09 ms │  699.13 ms │  1.35x slower │
│ QQuery 7     │  270.91 ms │  261.45 ms │     no change │
│ QQuery 8     │   91.91 ms │   90.78 ms │     no change │
│ QQuery 9     │   87.32 ms │   90.48 ms │     no change │
│ QQuery 10    │   83.09 ms │   82.85 ms │     no change │
│ QQuery 11    │  587.29 ms │  613.38 ms │     no change │
│ QQuery 12    │   36.57 ms │   36.34 ms │     no change │
│ QQuery 13    │  303.08 ms │  303.84 ms │     no change │
│ QQuery 14    │  749.29 ms │  730.68 ms │     no change │
│ QQuery 15    │   20.97 ms │   11.14 ms │ +1.88x faster │
│ QQuery 16    │   30.28 ms │   29.88 ms │     no change │
│ QQuery 17    │  195.40 ms │  172.99 ms │ +1.13x faster │
│ QQuery 18    │  104.02 ms │   92.51 ms │ +1.12x faster │
│ QQuery 19    │  122.21 ms │  123.04 ms │     no change │
│ QQuery 20    │   10.36 ms │    9.81 ms │ +1.06x faster │
│ QQuery 21    │   14.47 ms │   13.45 ms │ +1.08x faster │
│ QQuery 22    │  407.98 ms │  402.95 ms │     no change │
│ QQuery 23    │  788.17 ms │  769.05 ms │     no change │
│ QQuery 24    │  331.00 ms │  307.41 ms │ +1.08x faster │
│ QQuery 25    │  288.60 ms │  260.48 ms │ +1.11x faster │
│ QQuery 26    │   74.37 ms │   62.38 ms │ +1.19x faster │
│ QQuery 27    │  266.95 ms │  257.78 ms │     no change │
│ QQuery 28    │  124.19 ms │  124.60 ms │     no change │
│ QQuery 29    │  236.50 ms │  216.35 ms │ +1.09x faster │
│ QQuery 30    │   32.71 ms │   32.04 ms │     no change │
│ QQuery 31    │  118.62 ms │  116.68 ms │     no change │
│ QQuery 32    │   46.17 ms │   44.91 ms │     no change │
│ QQuery 33    │  109.73 ms │  108.69 ms │     no change │
│ QQuery 34    │   85.35 ms │   82.13 ms │     no change │
│ QQuery 35    │   81.90 ms │   80.08 ms │     no change │
│ QQuery 36    │  164.28 ms │  164.69 ms │     no change │
│ QQuery 37    │  150.62 ms │  143.94 ms │     no change │
│ QQuery 38    │   60.51 ms │   60.23 ms │     no change │
│ QQuery 39    │   78.16 ms │   77.55 ms │     no change │
│ QQuery 40    │   85.87 ms │   87.97 ms │     no change │
│ QQuery 41    │    8.90 ms │    8.47 ms │     no change │
│ QQuery 42    │   87.33 ms │   87.46 ms │     no change │
│ QQuery 43    │   68.13 ms │   68.30 ms │     no change │
│ QQuery 44    │    7.87 ms │    7.67 ms │     no change │
│ QQuery 45    │   47.14 ms │   37.34 ms │ +1.26x faster │
│ QQuery 46    │  178.11 ms │  173.12 ms │     no change │
│ QQuery 47    │  525.12 ms │  525.53 ms │     no change │
│ QQuery 48    │  213.68 ms │  211.63 ms │     no change │
│ QQuery 49    │  194.12 ms │  194.98 ms │     no change │
│ QQuery 50    │  162.76 ms │  154.74 ms │     no change │
│ QQuery 51    │  130.71 ms │  128.29 ms │     no change │
│ QQuery 52    │   88.02 ms │   89.30 ms │     no change │
│ QQuery 53    │   84.55 ms │   83.86 ms │     no change │
│ QQuery 54    │  113.43 ms │  113.00 ms │     no change │
│ QQuery 55    │   87.71 ms │   86.97 ms │     no change │
│ QQuery 56    │  110.38 ms │  110.40 ms │     no change │
│ QQuery 57    │  140.65 ms │  139.42 ms │     no change │
│ QQuery 58    │  222.73 ms │  214.26 ms │     no change │
│ QQuery 59    │  132.77 ms │  131.62 ms │     no change │
│ QQuery 60    │  110.21 ms │  112.34 ms │     no change │
│ QQuery 61    │  142.82 ms │  136.83 ms │     no change │
│ QQuery 62    │  406.87 ms │  491.54 ms │  1.21x slower │
│ QQuery 63    │   86.25 ms │   83.74 ms │     no change │
│ QQuery 64    │  535.71 ms │  517.70 ms │     no change │
│ QQuery 65    │  187.66 ms │  189.66 ms │     no change │
│ QQuery 66    │  170.35 ms │  176.14 ms │     no change │
│ QQuery 67    │  260.98 ms │  261.36 ms │     no change │
│ QQuery 68    │  213.97 ms │  213.81 ms │     no change │
│ QQuery 69    │   83.22 ms │   82.90 ms │     no change │
│ QQuery 70    │  266.18 ms │  267.79 ms │     no change │
│ QQuery 71    │  108.12 ms │  105.49 ms │     no change │
│ QQuery 72    │  493.60 ms │  467.64 ms │ +1.06x faster │
│ QQuery 73    │   82.54 ms │   80.81 ms │     no change │
│ QQuery 74    │  367.18 ms │  363.52 ms │     no change │
│ QQuery 75    │  213.68 ms │  216.20 ms │     no change │
│ QQuery 76    │  110.79 ms │  105.19 ms │ +1.05x faster │
│ QQuery 77    │  149.31 ms │  149.22 ms │     no change │
│ QQuery 78    │  386.11 ms │  386.48 ms │     no change │
│ QQuery 79    │  173.55 ms │  174.38 ms │     no change │
│ QQuery 80    │  251.29 ms │  250.99 ms │     no change │
│ QQuery 81    │   19.91 ms │   20.35 ms │     no change │
│ QQuery 82    │  166.62 ms │  168.14 ms │     no change │
│ QQuery 83    │   37.67 ms │   29.41 ms │ +1.28x faster │
│ QQuery 84    │   37.38 ms │   35.53 ms │     no change │
│ QQuery 85    │  120.55 ms │  109.51 ms │ +1.10x faster │
│ QQuery 86    │   31.01 ms │   30.24 ms │     no change │
│ QQuery 87    │   60.35 ms │   60.85 ms │     no change │
│ QQuery 88    │   77.07 ms │   78.30 ms │     no change │
│ QQuery 89    │   96.91 ms │   98.38 ms │     no change │
│ QQuery 90    │   16.44 ms │   17.19 ms │     no change │
│ QQuery 91    │   46.71 ms │   45.57 ms │     no change │
│ QQuery 92    │   45.43 ms │   45.54 ms │     no change │
│ QQuery 93    │  136.89 ms │  137.16 ms │     no change │
│ QQuery 94    │   46.28 ms │   46.87 ms │     no change │
│ QQuery 95    │  112.53 ms │   95.73 ms │ +1.18x faster │
│ QQuery 96    │   56.71 ms │   57.81 ms │     no change │
│ QQuery 97    │   88.91 ms │   89.15 ms │     no change │
│ QQuery 98    │  127.11 ms │  127.86 ms │     no change │
│ QQuery 99    │ 4425.99 ms │ 5735.09 ms │  1.30x slower │
└──────────────┴────────────┴────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)      │ 21207.18ms │
│ Total Time (dev3)      │ 22514.21ms │
│ Average Time (main)    │   214.21ms │
│ Average Time (dev3)    │   227.42ms │
│ Queries Faster         │         15 │
│ Queries Slower         │          3 │
│ Queries with No Change │         81 │
│ Queries with Failure   │          0 │
└────────────────────────┴────────────┘

Use BatchCoaleser in sort merge join instead of calling coalesce_batches on vector of RecordBatches. Add benchmarks, update tests.

cd6433c

github-actions bot added the physical-plan Changes to the physical-plan crate label Nov 21, 2025

Merge branch 'main' into smj

d29fd29

mbutrovich and others added 21 commits November 24, 2025 19:36

Merge branch 'main' into smj

c1b58b9

stash

a655212

Stash with assertions.

4ed5cd4

Stash with assertions.

4364656

encapsulate

7a41fe6

encapsulate

b986fd7

encapsulate

387c882

pre-refactor

efa2996

get rid of confusing output_size

a5c926f

refactor

f725308

refactor

4cc21e8

fix double concat for filtered joins

f6430db

more elided concats

32021cb

remove dead code

2e0f211

passes

37bb875

Merge branch 'main' into smj5

2ac80f6

# Conflicts: # Cargo.lock

comments

8c69056

clippy, comments

67877e6

Remove unused import

e7b94e5

optimize concat_batches call

7c55ad9

Merge branch 'main' into smj

ad583d2

mbutrovich marked this pull request as ready for review December 2, 2025 15:46

mbutrovich requested a review from comphead December 2, 2025 16:38

mbutrovich changed the title ~~Use BatchCoaleser in Sort Merge Join, new benchmarks~~ Use BatchCoaleser in Sort Merge Join, new benchmarks (TPC-H Q21 SMJ 1000x faster) Dec 2, 2025

Merge branch 'main' into smj

36a73e5

mbutrovich changed the title ~~Sort Merge Join: Reduce batch concatenation, use BatchCoalescer, new benchmarks (TPC-H Q21 SMJ ~1000x faster)~~ Sort Merge Join: Reduce batch concatenation, use BatchCoalescer, new benchmarks (TPC-H Q21 SMJ up to ~1000x faster) Dec 3, 2025

Merge branch 'main' into smj

1000afa

Dandandan reviewed Dec 4, 2025


datafusion/physical-plan/src/joins/sort_merge_join/stream.rs (outdated, resolved)

mbutrovich and others added 3 commits December 4, 2025 10:20

Address PR feedback.

66ea027

Merge branch 'main' into smj

eb5637e

Remove stray import.

86cbc5c

mbutrovich changed the title ~~Sort Merge Join: Reduce batch concatenation, use BatchCoalescer, new benchmarks (TPC-H Q21 SMJ up to ~1000x faster)~~ Sort Merge Join: Reduce batch concatenation, use BatchCoalescer, new benchmarks (TPC-H Q21 SMJ up to ~4000x faster) Dec 4, 2025

Dandandan approved these changes Dec 8, 2025


mbutrovich and others added 3 commits December 8, 2025 11:48

Merge branch 'main' into smj

2ce09f1

# Conflicts: # benchmarks/bench.sh # benchmarks/src/bin/dfbench.rs

We're spending a ton of time resizing in append_output_pair. Try to hint the size.

8ec9e92
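The idea behind the commit above, sketched with a hypothetical `append_pairs` function rather than the real `append_output_pair`: pushing into a `Vec` that starts empty reallocates repeatedly as it grows, while `Vec::with_capacity` with a good size hint reserves space once. The function counts capacity changes to make the difference visible.

```rust
/// Hypothetical sketch: append `n` index pairs, optionally pre-sizing the Vec,
/// and count how many times its capacity changed (i.e. reallocations).
fn append_pairs(n: usize, size_hint: Option<usize>) -> (Vec<(u32, u32)>, usize) {
    let mut out: Vec<(u32, u32)> = match size_hint {
        Some(h) => Vec::with_capacity(h),
        None => Vec::new(),
    };
    let mut reallocations = 0;
    let mut cap = out.capacity();
    for i in 0..n as u32 {
        out.push((i, i));
        if out.capacity() != cap {
            reallocations += 1;
            cap = out.capacity();
        }
    }
    (out, reallocations)
}
```

For 10,000 pairs, the hinted version never reallocates inside the loop, since `Vec::with_capacity` guarantees at least the requested capacity.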

Merge branch 'main' into smj

2389fd6

comphead mentioned this pull request Dec 8, 2025

Push down InList or hash table references from HashJoinExec depending on the size of the build side #18393

Merged

mbutrovich added this pull request to the merge queue Dec 9, 2025

Merged via the queue into apache:main with commit 7ea5066 Dec 9, 2025
31 checks passed

mbutrovich deleted the smj branch December 9, 2025 16:48

mbutrovich mentioned this pull request Jan 12, 2026

DataFusion 52 release post apache/datafusion-site#135

Merged


6 participants
