[SPARK-37019][SQL] Add codegen support to array higher-order functions by Kimahriman · Pull Request #34558 · apache/spark

Kimahriman · 2021-11-11T13:55:00Z

What changes were proposed in this pull request?

This PR adds codegen support to array-based higher-order functions, except ArraySort. This is my first time working with codegen, so I'm definitely looking for any feedback.

A few notes:

Disabled subexpression elimination for lambda functions (this was already the case, because they were CodegenFallback). I plan to explore supporting subexpression elimination inside lambda functions later, as it will require special handling.
I also set the AtomicReference for all lambda values, in case a child expression reverts to interpreted evaluation for any reason (CodegenFallback or otherwise).

Why are the changes needed?

To improve the performance of array higher-order function operations by letting their children be codegen'd and participate in WholeStageCodegen.

Does this PR introduce any user-facing change?

No, only performance improvements.

How was this patch tested?

Existing unit tests. Let me know if there are other codegen-specific unit tests I should add.

AmplabJenkins · 2021-11-11T14:08:36Z

Can one of the admins verify this patch?

Kimahriman · 2022-01-02T14:50:13Z

@viirya @HyukjinKwon @cloud-fan any thoughts or know who might have thoughts?

Tagar · 2022-03-15T05:31:46Z

@Kimahriman just out of curiosity, how much did the performance improve?

Kimahriman · 2022-03-15T11:54:15Z

It's hard to say, because when I tested this on my production jobs (I'm actually still actively using it), I had several other changes too. I'm not sure whether there are any existing benchmarks involving HOFs. The gain is also highly dependent on what the lambda function is, and honestly that's one of the main benefits: the lambda functions themselves can be codegen'd instead of eval'd.

I also have a larger goal of supporting subexpression elimination inside lambda functions, because that's where I've found our biggest problem to be. #34727 is also part of that goal.

jaceklaskowski

There seems to be a lot of repetition. Wish it could be avoided somehow but can't help though (besides nit-picking).

Kimahriman · 2023-04-04T20:44:54Z

There seems to be a lot of repetition. Wish it could be avoided somehow but can't help though (besides nit-picking).

Thanks for the review! I tried to get as much common code into the parent classes as I could; I can take another pass to see if anything jumps out for deduping.

chris-twiner · 2025-03-13T18:02:23Z

@Kimahriman just out of curiosity, how much did the performance improve?

I just wanted to add to the above response that I've implemented a compilation scheme as part of Quality, and we saw performance boosts of up to 40%; beyond that, adding further lambdas made the cost of code generation higher than the saving. It's definitely usage dependent, though: the more work done in the function, the higher the cost (and therefore the potential saving from compilation). Under similar ideal circumstances, a small boost is also noticeable from removing the atomic.

Not sure how I missed this comment. We haven't done extensive performance comparisons with and without this; we've just been using it for a few years now. It's hard to quantify the impact, since it depends entirely on the expressions run inside the functions. But that's also the whole point: enabling codegen for HOFs enables codegen for the expressions inside the lambda functions, which are assumed to be more performant, since that's the purpose of codegen.

Additionally, this enables a follow-on I'm currently working on: subexpression elimination inside lambda functions. We've recently identified its absence as a major performance killer for us, since it's very easy to end up with a lot of duplicate expression evaluations in certain cases.

The subexpression elimination option is huge! Very exciting

Kimahriman · 2025-03-19T11:17:53Z

I added a simple benchmark, local results:

[info] Running benchmark: transform
[info]   Running case: codegen
[info]   Stopped after 10 iterations, 4605 ms
[info]   Running case: interpreted
[info]   Stopped after 10 iterations, 16947 ms
[info] OpenJDK 64-Bit Server VM 17.0.12+0 on Mac OS X 14.7.4
[info] Apple M1 Max
[info] transform:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen                                             371            461          54         26.9          37.1       1.0X
[info] interpreted                                        1615           1695          86          6.2         161.5       0.2X
[info] Running benchmark: filter
[info]   Running case: codegen
[info]   Stopped after 10 iterations, 4042 ms
[info]   Running case: interpreted
[info]   Stopped after 10 iterations, 16382 ms
[info] OpenJDK 64-Bit Server VM 17.0.12+0 on Mac OS X 14.7.4
[info] Apple M1 Max
[info] filter:                                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen                                             360            404          45         27.8          36.0       1.0X
[info] interpreted                                        1589           1638          53          6.3         158.9       0.2X
[info] Running benchmark: forall - fast
[info]   Running case: codegen
[info]   Stopped after 10 iterations, 2129 ms
[info]   Running case: interpreted
[info]   Stopped after 10 iterations, 7804 ms
[info] OpenJDK 64-Bit Server VM 17.0.12+0 on Mac OS X 14.7.4
[info] Apple M1 Max
[info] forall - fast:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen                                             199            213           9         50.4          19.9       1.0X
[info] interpreted                                         749            780          24         13.4          74.9       0.3X
[info] Running benchmark: forall - slow
[info]   Running case: codegen
[info]   Stopped after 10 iterations, 2570 ms
[info]   Running case: interpreted
[info]   Stopped after 10 iterations, 8253 ms
[info] OpenJDK 64-Bit Server VM 17.0.12+0 on Mac OS X 14.7.4
[info] Apple M1 Max
[info] forall - slow:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen                                             240            257          17         41.7          24.0       1.0X
[info] interpreted                                         801            825          23         12.5          80.1       0.3X
[info] Running benchmark: exists - fast
[info]   Running case: codegen
[info]   Stopped after 10 iterations, 2061 ms
[info]   Running case: interpreted
[info]   Stopped after 10 iterations, 8190 ms
[info] OpenJDK 64-Bit Server VM 17.0.12+0 on Mac OS X 14.7.4
[info] Apple M1 Max
[info] exists - fast:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen                                             188            206          21         53.2          18.8       1.0X
[info] interpreted                                         794            819          25         12.6          79.4       0.2X
[info] Running benchmark: exists - slow
[info]   Running case: codegen
[info]   Stopped after 10 iterations, 2441 ms
[info]   Running case: interpreted
[info]   Stopped after 10 iterations, 8876 ms
[info] OpenJDK 64-Bit Server VM 17.0.12+0 on Mac OS X 14.7.4
[info] Apple M1 Max
[info] exists - slow:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen                                             232            244          14         43.1          23.2       1.0X
[info] interpreted                                         857            888          32         11.7          85.7       0.3X
[info] Running benchmark: aggregate
[info]   Running case: codegen
[info]   Stopped after 10 iterations, 4029 ms
[info]   Running case: interpreted
[info]   Stopped after 10 iterations, 12054 ms
[info] OpenJDK 64-Bit Server VM 17.0.12+0 on Mac OS X 14.7.4
[info] Apple M1 Max
[info] aggregate:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] codegen                                             381            403          18         26.3          38.1       1.0X
[info] interpreted                                        1157           1205          34          8.6         115.7       0.3X

Kimahriman · 2025-06-24T16:07:41Z

The subexpression elimination option is huge! Very exciting

#51272

LuciferYang · 2026-03-18T02:56:17Z

@dongjoon-hyun This was submitted earlier than the one at #54864. We can switch to discussing it here instead.

sunchao · 2026-05-22T03:31:18Z

+          mergeCode.isNull, merge.nullable)
+
+        val finishAssignment = assignVar(accForFinishCode, finishAtomic, accForMergeCode.value,
+          accForMergeCode.isNull, merge.nullable)


finishAssignment is using merge.nullable to decide whether to propagate the accumulator null bit into the finish lambda. That loses a nullable zero on empty arrays when merge itself is non-nullable. For example, with aggregate(array(), CAST(NULL AS INT), (acc, x) -> coalesce(acc, 0) + x, acc -> acc IS NULL), interpreted eval passes NULL into finish and returns true, but the generated path leaves accForFinishCode.isNull at its default false and can return false. This should follow the accumulator nullability here, not merge.nullable.

[ 🤖 posted by Codex on behalf of sunchao using the code-review-for-me skill 🤖 ]

Thanks, good catch. Fixed in 9f330488952 by making finishAssignment follow the accumulator lambda variable nullability instead of merge.nullable, so a nullable zero is still propagated into finish when the array is empty.

I also added the regression case from the comment and verified it with:

build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.HigherOrderFunctionsSuite -- -z ArrayAggregate'

Result: passed (2 tests, 0 failures).
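To make the failure mode concrete, here is a minimal Java model of the reviewer's example. It is an illustrative sketch, not Spark's generated code, and it assumes the generated code keeps a separate null bit and primitive value per lambda variable, defaulting to false and 0. `IntSlot`, `finishIsNull`, and the `copyNullBit` switch are hypothetical names. With `copyNullBit = false` (the old rule, because this merge is non-nullable), the finish lambda `acc -> acc IS NULL` sees false for an empty array with a NULL zero; with the fixed rule it sees true.

```java
final class AggregateFinishSketch {
    // Hypothetical stand-in for one generated lambda variable: a null bit plus a primitive value.
    static final class IntSlot {
        boolean isNull; // generated code starts the null bit at false
        int value;      // and the primitive value at its default, 0
    }

    /**
     * Models aggregate(array, zero, (acc, x) -> coalesce(acc, 0) + x, acc -> acc IS NULL).
     * copyNullBit = false mimics copying the accumulator null bit into finish only when
     * merge is nullable (it is not, here); true mimics following the accumulator's nullability.
     */
    static boolean finishIsNull(int[] array, Integer zero, boolean copyNullBit) {
        IntSlot accForMerge = new IntSlot();
        accForMerge.isNull = (zero == null);
        accForMerge.value = (zero == null) ? 0 : zero;
        for (int x : array) {
            // merge: coalesce(acc, 0) + x, which is never null
            int merged = (accForMerge.isNull ? 0 : accForMerge.value) + x;
            accForMerge.isNull = false;
            accForMerge.value = merged;
        }
        IntSlot accForFinish = new IntSlot();
        accForFinish.value = accForMerge.value;
        if (copyNullBit) {
            accForFinish.isNull = accForMerge.isNull;
        }
        // finish: acc IS NULL
        return accForFinish.isNull;
    }
}
```

For an empty array the loop never runs, so the only way the NULL zero can reach finish is through the copied null bit.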

sunchao · 2026-05-23T15:13:33Z

+            |  $varAssignments
+            |  ${functionCode.code}
+            |  $resultAssignment
+            |  if ((boolean)${functionCode.value}) {


[P1] Preserve null predicate semantics in generated ArrayFilter

filter treats a null predicate result as false, but this generated loop reads ${functionCode.value} without checking ${functionCode.isNull}. I reproduced this on the current head with a vectorized Parquet array column containing [true] and [null]: with CODEGEN_ONLY, filter(a, x -> x) returns [true] and [null] instead of [true] and []; with NO_CODEGEN, it returns the expected results. Please use a predicate equivalent to !functionCode.isNull && functionCode.value for both the tracker and count, and add a generated-code regression test.

[ 🤖 posted by Codex on behalf of sunchao using the code-review-for-me skill 🤖 ]

Fixed in eecc672cb44.

Generated ArrayFilter now computes a local keep predicate as !functionCode.isNull && functionCode.value and uses that value for both the tracker array and the retained element count. I added a Catalyst regression with custom GenericArrayData that keeps isNullAt=true while exposing a stale typed boolean value, which fails against the old generated path.

Verified with:

build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.HigherOrderFunctionsSuite -- -z "generated code"'

Result: passed (2 tests, 0 failures).
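The keep rule can be sketched in Java as follows. This assumes the predicate result is exposed as a null bit plus a possibly stale boolean payload, as in the regression described above. `FilterKeepSketch` and `countKept` are hypothetical names, and the commented-out line shows the old, null-unsafe read.

```java
final class FilterKeepSketch {
    /**
     * First pass of a generated-style filter for one row. The predicate result
     * arrives as two parallel arrays because generated code keeps the null bit and
     * the boolean payload in separate locals; the payload at a NULL position may be
     * stale, so it is never read on its own.
     */
    static int countKept(boolean[] resultIsNull, boolean[] resultValue, boolean[] tracker) {
        int count = 0;
        for (int i = 0; i < resultIsNull.length; i++) {
            // Old form: boolean keep = resultValue[i];
            boolean keep = !resultIsNull[i] && resultValue[i];
            tracker[i] = keep;   // the same decision feeds both the tracker...
            if (keep) {
                count++;         // ...and the retained-element count
            }
        }
        return count;
    }
}
```

Using one local for both the tracker and the count guarantees the output size and the copied elements cannot disagree.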

sunchao · 2026-05-23T15:13:37Z

  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
    case _: CodegenFallback => Nil
    case c: ConditionalExpression => c.alwaysEvaluatedInputs.map(skipForShortcut)
+    case h: HigherOrderFunction => h.arguments


[P1] Do not eagerly evaluate ArrayAggregate.zero for null input arrays

For ArrayAggregate, zero is not evaluated when the array argument is null, but recursing into all h.arguments allows subexpression elimination to generate expressions from zero outside that null guard. On the current head, with ANSI mode, CSE, and generated code enabled, aggregate(CAST(NULL AS ARRAY<INT>), (cast(id as int) / 0) + (cast(id as int) / 0), (acc, x) -> acc + x) throws DIVIDE_BY_ZERO; with CSE disabled, the same generated query returns null. Please recurse only into the always-evaluated input array for ArrayAggregate and add this regression. This is separate from the empty-array nullable-accumulator fix in 9f330488952.

[ 🤖 posted by Codex on behalf of sunchao using the code-review-for-me skill 🤖 ]

Fixed in eecc672cb44.

I changed HOF CSE recursion to use an alwaysEvaluatedArguments hook. By default it returns arguments, but ArrayAggregate overrides it to return only the input array argument, so zero is no longer made CSE-visible before the null-array guard.

I added the ANSI/CSE regression from this comment and verified it fails with the old h.arguments recursion, then passes with the fix.

Verified with:

build/sbt 'sql/testOnly org.apache.spark.sql.DataFrameFunctionsSuite -- -z "aggregate function - null array does not evaluate zero expression through CSE"'

Result: passed (1 test, 0 failures).
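The hook can be sketched with a toy Java expression model. `Expr`, `Leaf`, `AggregateFn`, and `CseRecursion` are hypothetical stand-ins for the Scala classes in Catalyst; only the shape of the `alwaysEvaluatedArguments` override is taken from the comment above.

```java
import java.util.List;

// Hypothetical toy model of Catalyst expressions (the real code is Scala).
interface Expr {
    List<Expr> children();
}

interface HigherOrderFn extends Expr {
    List<Expr> arguments();

    // Arguments evaluated for every input row; only these are safe for CSE to hoist.
    default List<Expr> alwaysEvaluatedArguments() {
        return arguments();
    }
}

record Leaf(String name) implements Expr {
    public List<Expr> children() {
        return List.of();
    }
}

// aggregate(array, zero, ...): zero is evaluated only after the array is known to be non-null.
record AggregateFn(Expr array, Expr zero) implements HigherOrderFn {
    public List<Expr> children() {
        return List.of(array, zero);
    }

    public List<Expr> arguments() {
        return List.of(array, zero);
    }

    @Override
    public List<Expr> alwaysEvaluatedArguments() {
        return List.of(array);
    }
}

final class CseRecursion {
    // Mirrors the shape of childrenToRecurse: HOFs expose only always-evaluated arguments.
    static List<Expr> childrenToRecurse(Expr e) {
        if (e instanceof HigherOrderFn h) {
            return h.alwaysEvaluatedArguments();
        }
        return e.children();
    }
}
```

Keeping the default equal to `arguments()` means other HOFs keep their existing CSE behavior; only the operator with a conditional argument opts out.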

sunchao · 2026-05-23T15:13:42Z

+          s"InternalRow.copyValue(${functionCode.value})"
+        }
+        val resultNull = if (function.nullable) Some(functionCode.isNull.toString) else None
+        val resultAssignment = CodeGenerator.setArrayElement(arrayData, dataType.elementType,


[P1] Preserve null complex elements in generated array HOF output

This new output-writing path passes a null bit for nullable lambda results into CodeGenerator.setArrayElement, but that helper applies setNullAt only for primitive element types. I reproduced this on the current head with native vectorized ORC input containing array(CAST(NULL AS ARRAY<INT>), array(1)): the input reads back with a[0] IS NULL = true, and interpreted transform/filter return [true, true], but generated transform(a, x -> x)[0] IS NULL and filter(a, x -> x IS NULL)[0] IS NULL return [false, false]. Please make nullable non-primitive array writes honor isNull and add a generated nested-null regression test.

[ 🤖 posted by Codex on behalf of sunchao using the code-review-for-me skill 🤖 ]

Fixed in eecc672cb44.

CodeGenerator.setArrayElement now honors the provided isNull bit for non-primitive element types as well, so nullable complex array writes call setNullAt instead of writing a stale complex value.

I moved the regression to Catalyst with custom GenericArrayData that simulates the vectorized-reader shape: isNullAt(0) reports null while the typed complex getter can still expose a stale nested value. The test covers both generated transform output and filter output.

Verified with:

build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.HigherOrderFunctionsSuite -- -z "generated code"'

Result: passed (2 tests, 0 failures).
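The corrected write path can be modeled in Java as follows, assuming an array buffer with a separate null mask. `NullableArrayWriter` and its methods are hypothetical. The only point is that the null bit wins for every element type, so a stale nested payload at a NULL position is never stored.

```java
// Hypothetical array buffer with a separate null mask.
final class NullableArrayWriter {
    private final Object[] values;
    private final boolean[] nulls;

    NullableArrayWriter(int numElements) {
        values = new Object[numElements];
        nulls = new boolean[numElements];
    }

    void setNullAt(int i) {
        nulls[i] = true;
        values[i] = null;
    }

    void update(int i, Object value) {
        nulls[i] = false;
        values[i] = value;
    }

    boolean isNullAt(int i) {
        return nulls[i];
    }

    Object get(int i) {
        return values[i];
    }

    // Writes one lambda result; the null bit is honored for primitive and complex types alike.
    void setArrayElement(int i, boolean isNull, Object value) {
        if (isNull) {
            setNullAt(i);
        } else {
            update(i, value);
        }
    }
}
```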

sunchao · 2026-05-24T01:12:05Z

+          zeroCode.isNull, zero.nullable)
+
+        val mergeAssignment = assignVar(accForMergeCode, mergeAtomic, mergeCopy,
+          mergeCode.isNull, merge.nullable)


[P1] Clear ArrayAggregate accumulator null state on every generated assignment

accForMergeVar is always bound nullable in bindInternal, and withLambdaVars stores its isNull flag as mutable generated state. These assignments only update that flag when the source expression is nullable, which produces wrong results in both cases below:

Within one row, aggregate(array(CAST(id AS INT) + 1, CAST(id AS INT) + 2), CAST(NULL AS INT), (acc, x) -> coalesce(acc, 0) + x, acc -> coalesce(acc, -1)) returns -1 with CODEGEN_ONLY instead of 3; the same query returns 3 with NO_CODEGEN.

Across rows in one generated partition, after a row whose merge becomes null, an empty-array row with zero 0 returns -1 instead of 0 with CODEGEN_ONLY; the interpreted path returns 0.

Please pass accForMergeVar.nullable to both initialAssignment and mergeAssignment and add generated-code regressions for these paths.

[ 🤖 posted by Codex on behalf of sunchao using the code-review-for-me skill 🤖 ]

Fixed in 261147f456d.

initialAssignment and mergeAssignment now both use accForMergeVar.nullable, so the generated accumulator lambda variable null flag is updated on every assignment. That clears the stale-null state both after assigning a non-null zero and after a non-null merge result.

I added generated-code regressions for both paths from the comment:

stale null state within a single aggregate loop

stale null state carried across rows in the same generated partition

I verified both tests fail before the fix and pass after it.

Verified with:

build/sbt 'sql/testOnly org.apache.spark.sql.DataFrameFunctionsSuite -- -z "aggregate function - generated code clears accumulator null state"'

Result: passed (2 tests, 0 failures).
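The stale-state problem can be reproduced with a small Java model of a reused projection whose accumulator keeps its value and null bit in mutable fields. All names are hypothetical. To keep the model small, the cross-row trigger is a NULL zero in the previous row rather than a NULL merge result as in the comment. Passing `fixed = false` mimics updating the null bit only for nullable sources; `fixed = true` updates it on every assignment.

```java
final class AccumulatorStateSketch {
    // Mutable fields that outlive one row, like state in a reused generated projection.
    private boolean accIsNull;
    private int accValue;
    private final boolean fixed;

    AccumulatorStateSketch(boolean fixed) {
        this.fixed = fixed;
    }

    // Old rule: touch the null bit only when the source expression is nullable.
    private void assign(int value, boolean isNull, boolean sourceNullable) {
        accValue = value;
        if (fixed || sourceNullable) {
            accIsNull = isNull;
        }
    }

    /**
     * One row of aggregate(array, zero, (acc, x) -> coalesce(acc, 0) + x,
     * acc -> coalesce(acc, -1)). A null zero models the nullable
     * CAST(NULL AS INT); a non-null zero models a non-nullable literal.
     */
    int evalRow(int[] array, Integer zero) {
        boolean zeroIsNull = (zero == null);
        assign(zeroIsNull ? 0 : zero, zeroIsNull, zeroIsNull);
        for (int x : array) {
            int merged = (accIsNull ? 0 : accValue) + x; // coalesce(acc, 0) + x
            assign(merged, false, false);                 // merge is non-nullable
        }
        return accIsNull ? -1 : accValue;                 // coalesce(acc, -1)
    }
}
```

With `fixed = false`, `evalRow(new int[]{1, 2}, null)` returns -1, matching the reported CODEGEN_ONLY result, while `fixed = true` returns 3. Likewise, after a row that leaves the null bit set, an empty-array row with zero 0 returns -1 under the old rule and 0 under the fix.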

sunchao · 2026-05-25T19:20:03Z

+        val initialization = CodeGenerator.createArrayData(
+          arrayData, dataType.elementType, numElements, s" $prettyName failed.")
+
+        val functionCode = function.genCode(ctx)


[P2] Rebind fallback lambda variables before code generation

The generated path uses the original lambda tree here, while interpreted evaluation deliberately uses functionsForEval to replace separately instantiated NamedLambdaVariables with the bound argument instance by exprId. As a result, a valid resolved lambda containing a CodegenFallback expression can succeed interpreted but fail under codegen: I reproduced ArrayTransform(array(1, 2, 3), LambdaFunction(CodegenFallbackExpr(detachedArg + 1), Seq(arg))), where detachedArg has the same exprId as arg but a different AtomicReference; checkEvaluation passes interpreted evaluation and then throws NullPointerException in GeneratedClass$SpecificMutableProjection.apply. Please generate the rebound function tree (and keep fallback atomic state synchronized) so codegen preserves the existing lambda-binding semantics.

[ 🤖 posted by Codex on behalf of sunchao using the code-review-for-me skill 🤖 ]

Fixed in fe004a5435e.

The generated HOF paths now mirror interpreted evaluation by generating from the rebound lambda body (functionForEval / functionsForEval) instead of the original lambda tree. This preserves the existing exprId-based lambda variable rebinding semantics while keeping the change scoped to HOF codegen.

I added a Catalyst regression that constructs a resolved ArrayTransform lambda where a CodegenFallbackExpr references a detached NamedLambdaVariable with the same exprId as the bound argument. The test failed before the fix with the generated-path NPE described in this comment, and passes after the fix.

Verified with:

build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.HigherOrderFunctionsSuite -- -z "ArrayTransform codegen rebinds fallback lambda variables by expression ID"'
build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.HigherOrderFunctionsSuite -- -z "ArrayTransform"'

Results: focused regression passed (1 test), ArrayTransform slice passed (3 tests).
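The rebinding idea can be modeled in a few lines of Java. The model assumes a lambda variable's identity is its exprId while its storage is an AtomicReference. `LambdaVar` and `PlusOneFallback` are hypothetical names. In this model a detached reference reads null where the real generated path threw the NullPointerException.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lambda variable: exprId is its logical identity, ref is its storage.
record LambdaVar(long exprId, AtomicReference<Integer> ref) {}

// Hypothetical interpreted-only expression computing child + 1 through the AtomicReference.
record PlusOneFallback(LambdaVar child) {
    Integer eval() {
        Integer v = child.ref().get();
        return v == null ? null : v + 1;
    }

    // Swap the child for the bound variable that shares its exprId, if there is one.
    PlusOneFallback rebind(Map<Long, LambdaVar> boundById) {
        return new PlusOneFallback(boundById.getOrDefault(child.exprId(), child));
    }
}
```

The loop only writes to the bound variable's storage, so the body must be rebound once, before evaluation, for every reader to observe the current element.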

sunchao

Summary

Adds code generation support for the array higher-order functions transform, filter, exists, forall, and aggregate.

Prior state and problem.

These functions currently create an interpreted island inside otherwise generated SQL execution. Even when the surrounding projection or predicate is part of whole-stage code generation, Spark still evaluates the lambda body through Expression.eval for every array element.

That cost compounds quickly: the query pays it once per row and again once per element, and complex lambda bodies amplify it further. In CPU profiles, the overhead appears under ArrayExists.eval / ArrayExists.nullSafeEval and the lambda's child expressions, such as CaseWhen, string manipulation, and nested boolean predicates.

Generating these functions is therefore a worthwhile general optimization, not a workload-specific rewrite. The challenge is correctness: the interpreted implementation already defines subtle behavior for NULL elements, nullable predicate results, short-circuiting, fallback expressions, and mutable aggregate state. The generated path must preserve all of it.

Design approach.

The PR moves both the array traversal and ordinary lambda evaluation into the generated Java code produced for the surrounding expression. Instead of entering Expression.eval once per array element, generated execution now loads an element, binds it as the lambda argument, evaluates the lambda body, and consumes the result in one generated loop.

Conceptually, this turns:

exists(array, x -> predicate(x)) into a loop that evaluates predicate(x) and exits on the first true.
transform(array, x -> f(x)) into a loop that produces one output value per input element.
aggregate(array, zero, (acc, x) -> merge(acc, x), finish) into generated mutable accumulator state followed by one finish evaluation.

The important part is that this is not restricted to lambdas whose entire expression tree supports code generation. If a lambda contains a CodegenFallback descendant, the generated loop still updates the existing lambda backing reference before that descendant executes. Normal expressions read generated locals directly; fallback expressions see the same current element or accumulator value through the interpreted interface.

This preserves the existing execution model while removing interpreted evaluation from the common path.
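The binding step can be sketched in Java as follows. This is a hypothetical model, not Spark's generated code. The `int x` local stands in for the generated lambda-variable local, and `fallbackRef` stands in for the AtomicReference that a CodegenFallback descendant reads through the interpreted interface. The body evaluated per element is `x * 10 + fallbackChild()`.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class DualBindingSketch {
    /**
     * Evaluates the lambda body x * 10 + fallbackChild() for each element.
     * Generated children read the local x directly; the fallback child can only
     * read fallbackRef, so the loop writes both before evaluating the body.
     */
    static int[] transform(int[] input, AtomicReference<Integer> fallbackRef,
                           Supplier<Integer> fallbackChild) {
        int[] out = new int[input.length];
        for (int i = 0; i < input.length; i++) {
            int x = input[i];      // generated lambda-variable local
            fallbackRef.set(x);    // keep the interpreted view in sync
            out[i] = x * 10 + fallbackChild.get();
        }
        return out;
    }
}
```

For example, with `fallbackChild` reading `fallbackRef.get() + 1`, the input `{1, 2, 3}` produces `{12, 23, 34}`, because both halves of the body observe the same current element.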

Key design decisions made by this PR.

The first key decision is to generate the rebound lambda tree rather than the original lambda expression. Higher-order functions already support logically identical NamedLambdaVariable instances that share an exprId without sharing an object reference. Interpreted execution handles that through functionForEval / functionsForEval. Generated execution must do the same; otherwise a fallback expression can read a detached variable and produce a generated-only failure. The current revision correctly uses the rebound expression tree.

The second decision is to treat NULL behavior as part of the generated contract, rather than assuming the generated value is meaningful whenever a loop iteration runs. For filter, a NULL predicate means that the input element is not retained. For complex array elements, a NULL position must remain NULL even if the physical array container can expose an underlying payload at that ordinal. For exists and forall, generated execution preserves both short-circuit behavior and three-valued boolean semantics.

The third decision concerns aggregate, whose generated implementation carries mutable accumulator state across iterations and potentially across reused projection invocations. The zero value is evaluated only after confirming that the input array is non-NULL. Every accumulator assignment updates both its value and null state. The finish lambda receives that final pair, including for an empty array where the accumulator remains the initial zero value. This is necessary to prevent stale generated state from changing results.

Finally, filter intentionally uses two passes. It cannot allocate a compact output array until it knows how many elements pass the predicate, but evaluating an arbitrary predicate twice would be incorrect and potentially expensive. Recording keep decisions during the first pass, then copying selected elements during the second pass, preserves single evaluation while producing the correctly sized output.
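The two-pass strategy can be sketched generically in Java. The names are hypothetical, and a pre-sized `List` stands in for the compact output array. The predicate is called exactly once per element; the second pass only reads the recorded decisions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class TwoPassFilterSketch {
    /**
     * Pass 1 evaluates the nullable predicate once per element and records the
     * decision; pass 2 copies retained elements into a result sized from the count.
     */
    static <T> List<T> filter(List<T> input, Function<T, Boolean> predicate) {
        boolean[] keep = new boolean[input.size()];
        int count = 0;
        for (int i = 0; i < input.size(); i++) {
            Boolean r = predicate.apply(input.get(i));
            keep[i] = r != null && r;   // a NULL predicate result means "drop"
            if (keep[i]) {
                count++;
            }
        }
        List<T> out = new ArrayList<>(count); // sized from pass 1
        for (int i = 0; i < input.size(); i++) {
            if (keep[i]) {
                out.add(input.get(i));        // no predicate call here
            }
        }
        return out;
    }
}
```

The extra `boolean[]` costs one flag per element. Re-evaluating an arbitrary predicate in the second pass would be more expensive, and it would be wrong for non-deterministic predicates.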

Implementation sketch.

Each generated higher-order function follows the same broad flow.

First, it evaluates the input array once and retains the existing NULL short-circuit behavior. A NULL input array returns NULL immediately. In the aggregate case, this also ensures that neither the zero expression nor the finish expression is evaluated for that row.

Second, for a non-NULL array, it enters a generated element loop. Each iteration loads the current element and updates the generated lambda-variable state. When fallback evaluation remains anywhere inside the lambda body, the same binding step updates the fallback-visible backing reference, so generated and interpreted descendants observe one consistent current value.

Third, each function applies its own result rule:

transform writes one nullable result value into a new array for each input element.
filter records predicate decisions once and then copies only retained input elements into a compact result array.
exists stops when it finds true, while remembering NULL results where three-valued logic requires a NULL final answer.
forall stops when it finds false, with equivalent NULL tracking.
aggregate initializes accumulator state from zero, replaces it with each merge result, and evaluates finish against the final state.
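The short-circuit and NULL-tracking rules for exists and forall can be sketched as follows, assuming the default three-valued behavior described above. `ExistsSketch` is a hypothetical name, and a Java `null` models a SQL NULL predicate result.

```java
import java.util.function.Function;

final class ExistsSketch {
    /** TRUE on the first true result; otherwise NULL if any result was NULL; otherwise FALSE. */
    static <T> Boolean exists(T[] array, Function<T, Boolean> predicate) {
        boolean sawNull = false;
        for (T x : array) {
            Boolean r = predicate.apply(x);
            if (r == null) {
                sawNull = true;           // remember, but keep looking for a true
            } else if (r) {
                return Boolean.TRUE;      // short-circuit on the first true
            }
        }
        return sawNull ? null : Boolean.FALSE;
    }

    /** Dual of exists: FALSE on the first false; otherwise NULL if any result was NULL; otherwise TRUE. */
    static <T> Boolean forall(T[] array, Function<T, Boolean> predicate) {
        boolean sawNull = false;
        for (T x : array) {
            Boolean r = predicate.apply(x);
            if (r == null) {
                sawNull = true;
            } else if (!r) {
                return Boolean.FALSE;     // short-circuit on the first false
            }
        }
        return sawNull ? null : Boolean.TRUE;
    }
}
```

A NULL result cannot end the loop early, because a later true (for exists) or false (for forall) still decides the answer.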

The implementation keeps shared mechanics, such as lambda binding and fallback-visible state synchronization, in common helpers while leaving operation-specific control flow in each HOF implementation.

Suggested improvements.

The correctness gaps found in earlier revisions are addressed in the current patch:

generated filter now treats NULL predicate results as false;
NULL complex array elements are preserved correctly;
aggregate no longer evaluates zero for a NULL input array;
nullable aggregate accumulator state is propagated correctly through merge and finish evaluation, including empty arrays and reused generated projections;
generated evaluation now uses the rebound lambda expression tree, preserving fallback lambda-variable semantics.

The added tests exercise the corresponding semantic boundaries, including the generated-only failure case where a CodegenFallback expression refers to a separately instantiated lambda variable with the same exprId.

I re-checked the generated paths for transform, filter, exists, forall, and aggregate after these fixes and did not find another actionable correctness issue.

Approving.

sunchao · 2026-05-27T16:22:14Z

@Kimahriman can you also fix the CI failures? cc @cloud-fan @peter-toth @viirya also for reviews

Kimahriman · 2026-05-27T17:03:47Z

Rebased to master to re-trigger CI

github-actions Bot added the SQL label Nov 11, 2021

Kimahriman mentioned this pull request Nov 11, 2021

[SPARK-37019][SQL] Add codegen support to array transform #34294

Closed

Kimahriman force-pushed the array-hof-codegen branch from f46fb71 to 217960e on December 3, 2021 23:17

Kimahriman force-pushed the array-hof-codegen branch from 217960e to ce082f3 on December 23, 2021 14:35

Kimahriman force-pushed the array-hof-codegen branch from ce082f3 to d4a2f63 on January 2, 2022 14:45

Kimahriman force-pushed the array-hof-codegen branch from d4a2f63 to aaa4be4 on January 28, 2022 20:56

Kimahriman force-pushed the array-hof-codegen branch from aaa4be4 to 3da6342 on April 25, 2022 00:08

Kimahriman force-pushed the array-hof-codegen branch from 3da6342 to c3236c0 on May 7, 2022 14:42

Kimahriman force-pushed the array-hof-codegen branch from c3236c0 to 2e9f4d3 on June 5, 2022 17:48

Kimahriman force-pushed the array-hof-codegen branch 2 times, most recently from 9cec788 to 8b898b0 on July 12, 2022 11:44

Kimahriman force-pushed the array-hof-codegen branch from 8b898b0 to 194e457 on August 28, 2022 22:17

Kimahriman force-pushed the array-hof-codegen branch from 194e457 to 1a52017 on September 23, 2022 23:56

Kimahriman force-pushed the array-hof-codegen branch from 1a52017 to b71b633 on October 18, 2022 12:01

Kimahriman force-pushed the array-hof-codegen branch from b71b633 to a565a82 on November 6, 2022 13:32

Kimahriman force-pushed the array-hof-codegen branch from a565a82 to 92d9a9f on November 22, 2022 00:13

Kimahriman force-pushed the array-hof-codegen branch 2 times, most recently from a565a82 to 92d9a9f on November 30, 2022 23:36

Kimahriman force-pushed the array-hof-codegen branch from 92d9a9f to 03c2dc6 on January 1, 2023 14:36

Kimahriman force-pushed the array-hof-codegen branch from 03c2dc6 to 572b666 on February 4, 2023 13:36

Kimahriman force-pushed the array-hof-codegen branch from 572b666 to 4a7dba9 on March 3, 2023 12:54

Kimahriman mentioned this pull request Mar 18, 2023

[SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression() #40473

Closed

Kimahriman force-pushed the array-hof-codegen branch from 4a7dba9 to 5b13cd2 on March 30, 2023 11:46

jaceklaskowski reviewed Apr 3, 2023

Kimahriman force-pushed the array-hof-codegen branch from dcfb17c to a565a82 on April 24, 2023 17:04

Kimahriman force-pushed the array-hof-codegen branch from 705d950 to f2e6135 on May 5, 2025 14:31

Kimahriman force-pushed the array-hof-codegen branch from f2e6135 to 2a651b2 on June 24, 2025 14:19

Kimahriman mentioned this pull request Jun 24, 2025

[SPARK-37466][SQL] Support subexpression elimination in higher order functions #51272

Closed

Kimahriman force-pushed the array-hof-codegen branch from 2a651b2 to a1e2c24 on August 13, 2025 21:33

Kimahriman force-pushed the array-hof-codegen branch from a1e2c24 to 35cd592 on November 4, 2025 21:43

Kimahriman force-pushed the array-hof-codegen branch 2 times, most recently from 68a0a29 to 8cfefef on November 28, 2025 18:56

Kimahriman force-pushed the array-hof-codegen branch from 8cfefef to de114f6 on February 3, 2026 15:55

Kimahriman force-pushed the array-hof-codegen branch from de114f6 to 2b13d01 on March 11, 2026 15:15

Kimahriman mentioned this pull request Mar 17, 2026

[SPARK-56033][SQL] Support whole-stage codegen for ArrayTransform #54864

Draft

Kimahriman force-pushed the array-hof-codegen branch from 2b13d01 to 2cab349 on May 11, 2026 21:05

sunchao reviewed May 22, 2026

sunchao reviewed May 23, 2026

sunchao reviewed May 24, 2026

sunchao reviewed May 25, 2026

sunchao approved these changes May 27, 2026

Kimahriman added 5 commits May 27, 2026 17:03

Add codegen support to array-based higher order functions

a9e3383

Fix ArrayAggregate finish accumulator nullability

042a585

[SPARK-34558][SQL] Address higher order function codegen review comments

ffaa876

[SPARK-34558][SQL] Clear aggregate accumulator null state

2578cf7

[SPARK-34558][SQL] Rebind lambda variables for HOF codegen

c47fa7f

Kimahriman force-pushed the array-hof-codegen branch from fe004a5 to c47fa7f on May 27, 2026 17:03


8 participants
