Skip to content

[pull] main from apache:main#123

Merged
pull[bot] merged 5 commits intojingyi-zhao-01:mainfrom
apache:main
Sep 15, 2025
Merged

[pull] main from apache:main#123
pull[bot] merged 5 commits intojingyi-zhao-01:mainfrom
apache:main

Conversation

@pull
Copy link
Copy Markdown

@pull pull Bot commented Sep 15, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

adriangb and others added 5 commits September 15, 2025 14:57
* fix: prevent UnionExec panic with empty inputs

This commit fixes a panic in UnionExec when constructed with empty inputs.
Previously, UnionExec::new(vec![]) would cause an index out of bounds panic
at union.rs:542 when trying to access inputs[0].

Changes:
- Made UnionExec::new() return Result<Self> with proper validation
- Made union_schema() return Result<SchemaRef> with empty input checks
- Added descriptive error messages for empty input cases
- Updated all call sites to handle the new Result return type
- Added comprehensive tests for edge cases

Error messages:
- "UnionExec requires at least one input"
- "Cannot create union schema from empty inputs"

The fix maintains backward compatibility for valid inputs while preventing
crashes and providing clear error messages for invalid usage.

Fixes #17052

* refactor: address PR review comments for UnionExec empty inputs fix

- Add new try_new method that returns Result<Arc<dyn ExecutionPlan>>
- Deprecate existing new method in favor of try_new
- Optimize single-input case: try_new returns the input directly
- Remove redundant assert!(result.is_err()) from tests
- Rename test_union_multiple_inputs_still_works to test_union_schema_multiple_inputs
- Update all call sites to use appropriate API (try_new for new code, deprecated new for tests)

This maintains backward compatibility while providing better error handling
and optimization for single-input cases.

* Fix cargo fmt and clippy warnings

- Add proper feature gates for parquet_encryption in datasource-parquet
- Format code to pass cargo fmt checks
- All tests passing

* Fix clippy

---------

Co-authored-by: Eeshan <eeshan@Eeshans-MacBook-Pro.local>
Co-authored-by: ebembi-crdb <ebembi@cockroachlabs.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Use taiki-e/install-action and binstall in CI

* Use action hash
* Trying cargo machete to prune unused deps.

* Using encode_arrow_schema from arrow-rs. (#17543)
@pull pull Bot locked and limited conversation to collaborators Sep 15, 2025
@pull pull Bot added the ⤵️ pull label Sep 15, 2025
@pull pull Bot merged commit 685f5d6 into jingyi-zhao-01:main Sep 15, 2025
pull Bot pushed a commit that referenced this pull request Oct 16, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.
Related to apache#18084

## Rationale for this change

Run extended suite on PRs for critical areas, to avoid post merge
bugfixing

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
pull Bot pushed a commit that referenced this pull request Oct 16, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

Followup on
apache#18063 (review)

## Rationale for this change

Use cheaper `NullBuffer::union` to apply null mask instead of iterator
approach

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 17, 2025
…unctions in proto (apache#18024)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#17417.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

- Support `null_treatment`, `distinct`, and `filter` for window function
in proto.
- Support `null_treatment` for aggregate udf in proto.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- [x] Add `null_treatment`, `distinct`, `filter` fields to
`WindowExprNode` message and handle them in `to/from_proto.rs`.
- [x] Add `null_treatment` field to `AggregateUDFExprNode` message and
handle them in `to/from_proto.rs`.
- [ ] Docs update: I'm not sure where to add docs as declared in the
issue description.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

- Add tests to `roundtrip_window` for respectnulls, ignorenulls,
distinct, filter.
- Add tests to `roundtrip_aggregate_udf` for respectnulls, ignorenulls.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
N/A

---------

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
pull Bot pushed a commit that referenced this pull request Oct 17, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Doesn't close an issue.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Hi we are hiop, a Serverless Data Logistic Platform.
We use DataFusion as a core part of our backend engine, and it plays a
crucial role in our data infrastructure. Our team members are passionate
about the project and actively try contribute to its development
(@dariocurr).

We’d love to have Hiop listed among the Known Users to show our support
and help the DataFusion community continue to grow.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Just adding hiop as known user

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 17, 2025
…8117)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#3695
- Closes apache#3797

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Was looking at above issues and I don't believe we skip the failed rules
for any tests anymore (default for the config is also `false`), apart
from this cleanup, so filing this PR so we can close the issues. Seems
we only do in this `window.slt` test after this fix:


https://github.com/apache/datafusion/blob/621a24978a7a9c6d2b27973d1853dbc8776a56b5/datafusion/sqllogictest/test_files/window.slt#L2587-L2611

Which seems intentional.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Remove unnecessary `skip_failed_rules` config.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Existing tests.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 17, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
`EXPLAIN ANALYZE` can be used for profiling and displays the results
alongside the EXPLAIN plan. The issue is that it currently shows too
many low-level details. It would provide a better user experience if
only the most commonly used metrics were shown by default, with more
detailed metrics available through specific configuration options.

### Example
In `datafusion-cli`:
```
> CREATE EXTERNAL TABLE IF NOT EXISTS lineitem
STORED AS parquet
LOCATION '/Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem';
0 row(s) fetched.
Elapsed 0.000 seconds.

explain analyze select *
from lineitem
where l_orderkey = 3000000;
```
The parquet reader includes a large number of low-level details:
```
metrics=[output_rows=19813, elapsed_compute=14ns, batches_split=0, bytes_scanned=2147308, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=18, num_predicate_creation_errors=0, page_index_rows_matched=19813, page_index_rows_pruned=729088, predicate_cache_inner_records=0, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0, bloom_filter_eval_time=21.997µs, metadata_load_time=273.83µs, page_index_eval_time=29.915µs, row_pushdown_eval_time=42ns, statistics_eval_time=76.248µs, time_elapsed_opening=4.02146ms, time_elapsed_processing=24.787461ms, time_elapsed_scanning_total=24.17671ms, time_elapsed_scanning_until_data=23.103665ms]
```

I believe only a subset of it is commonly used, for example
`output_rows`, `metadata_load_time`, and how many file/row-group/pages
are pruned, and it would better to only display the most common ones by
default.

### Existing `VERBOSE` keyword
There is a existing verbose keyword in `EXPLAIN ANALYZE VERBOSE`,
however it's turning on per-partition metrics instead of controlling
detail level. I think it would be hard to mix this partition control and
the detail level introduced in this PR, so they're separated: the
following config will be used for detail level and the semantics of
`EXPLAIN ANALYZE VERBOSE` keep unchanged.

### This PR: configurable explain analyze level
1. Introduced a new config option `datafusion.explain.analyze_level`.
When set to `dev` (default value), all existing metrics will be shown.
If set to `summary`, only `BaselineMetrics` will be displayed (i.e.
`output_rows` and `elapsed_compute`).
Note now we only include `BaselineMetrics` for simplicity, in the
follow-up PRs we can figure out what's the commonly used metrics for
each operator, and add them to `summary` analyze level, finally set the
`summary` analyze level to default.
2. Add a `MetricType` field associated with `Metric` for detail level or
potentially category in the future. For different configurations, a
certain `MetricType` set will be shown accordingly.

#### Demo
```
-- continuing the above example
> set datafusion.explain.analyze_level = summary;
0 row(s) fetched.
Elapsed 0.000 seconds.

> explain analyze select *
from lineitem
where l_orderkey = 3000000;
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=5, elapsed_compute=25.339µs]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|                   |   FilterExec: l_orderkey@0 = 3000000, metrics=[output_rows=5, elapsed_compute=81.221µs]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|                   |     DataSourceExec: file_groups={14 groups: [[Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:0..11525426], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:11525426..20311205, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:0..2739647], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:2739647..14265073], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:14265073..20193593, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:0..5596906], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:5596906..17122332], ...]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], file_type=parquet, predicate=l_orderkey@0 = 3000000, pruning_predicate=l_orderkey_null_count@2 != row_count@3 AND l_orderkey_min@0 <= 3000000 AND 3000000 <= l_orderkey_max@1, required_guarantees=[l_orderkey in (3000000)], metrics=[output_rows=19813, elapsed_compute=14ns] |
|                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.025 seconds.
```
Only `BaselineMetrics` are shown.


## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
4. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
UT

## Are there any user-facing changes?

No
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
pull Bot pushed a commit that referenced this pull request Oct 17, 2025
…e#18091)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

N/A

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

There's a few functions in
`datafusion/expr-common/src/type_coercion/aggregates.rs` that are unused
elsewhere in the codebase, likely a remnant before the refactor to UDF,
so removing them. Some are still used (`coerce_avg_type()` and
`avg_return_type()`) so these are inlined into the Avg aggregate
function (similar to Sum). Also refactor some window functions to use
already available macros.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- Remove some unused functions
- Inline avg coerce & return type logic
- Refactor Spark Avg a bit to remove unnecessary code
- Refactor ntile & nth window functions to use available macros

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Existing tests.

## Are there any user-facing changes?

Yes as these functions were publicly exported; however I'm not sure they
were meant to be used by users anyway, given what they do.

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 17, 2025
apache#18099)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

Fixes comparison errors when using dictionary-encoded types with
comparison functions like NULLIF.

## Rationale for this change

When using dictionary-encoded columns (e.g., Dictionary(Int32, Utf8)) in
comparison operations with literals or other types, DataFusion would
throw an error stating the types are not comparable. This was
particularly problematic for functions like NULLIF which rely on
comparison coercion.

The issue was that comparison_coercion_numeric didn't handle dictionary
types, even though the general comparison_coercion function did have
dictionary support.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

1. Refactored dictionary comparison logic: Extracted common dictionary
coercion logic into dictionary_comparison_coercion_generic to avoid code
duplication.
2. Added numeric-specific dictionary coercion: Introduced
dictionary_comparison_coercion_numeric that uses numeric-preferring
comparison rules when dealing with dictionary value types.
3. Updated comparison_coercion_numeric: Added a call to
dictionary_comparison_coercion_numeric in the coercion chain to properly
handle dictionary types.
4. Added sqllogictest cases demonstrating the fix works for various
dictionary comparison scenarios.

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

Yes, added tests in datafusion/sqllogictest/test_files/nullif.slt
covering:
  - Dictionary type compared with string literal
  - String compared with dictionary type
  - Dictionary compared with dictionary

All tests pass with the fix and would fail without it.

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

This is a bug fix that enables previously failing queries to work
correctly. No breaking changes or API modifications.

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 18, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#16678.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

The issue has been fixed in apache#16639, this PR just adds a testcase for it.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Add a test case for `to_timestamp(double)` with vectorized input.
Similar to the one presented in the issue.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Yes

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

No
pull Bot pushed a commit that referenced this pull request Oct 20, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#18070

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
See the above issue and its comment
apache#18070 (comment)

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
In nested loop join, when the join column includes `List(Utf8View)`, use
`take()` instead of `to_array_of_size()` to avoid deep copying the utf8
buffers inside `Utf8View` array.

This is the quick fix, avoiding deep copy inside `to_array_of_size()` is
a bit tricky.
Here is `ListArray`'s physical layout:
https://arrow.apache.org/rust/arrow/array/struct.GenericListArray.html
If multiple elements is pointing to the same list range, the underlying
payload can't be reused.So the potential fix in `to_array_of_size` can
only avoids copying the inner-inner utf8view array buffers, but can't
avoid copying the inner array (i.e. views are still copied), and deep
copying for other primitive types also can't be avoided. Seems this can
be better solved when `ListView` type is ready 🤔

### Benchmark
I tried query 1 in apache#18070,
but only used 3 randomly sampled `places` parquet file.

49.0.0: 4s
50.0.0: stuck > 1 minute
PR: 4s

Now the performance are similar, I suspect the most time is spend
evaluating the expensive `array_has` so the optimization in
apache#16996 can't help much.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
Existing tests
## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->
No
<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
pull Bot pushed a commit that referenced this pull request Oct 20, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#17913 .

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
- Improve SQL code block rendering by upgrading `pydata-sphinx-theme`
- fix sidebar layout 

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
4. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
yes

## Are there any user-facing changes?

documentation ui
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 21, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- part of apache#17427 

## Rationale for this change
Adds regular joins (left, right, full, inner) for PWMJ as they behave
differently in the code path.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?
Adds classic join + physical planner

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?
Yes SLT tests  + unit tests

## Follow up work to this pull request
- Handling partitioned queries and multiple record batches (fuzz testing
will be handled with this)
- Simplify physical planning
- Add more unit tests for different types (another pr as the LOC in this
pr is getting a little daunting)

next would be to implement the existence joins

---------

Co-authored-by: Yongting You <2010youy01@gmail.com>
pull Bot pushed a commit that referenced this pull request Oct 21, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#17854 .

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 22, 2025
…pache#18017)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#17993

## Rationale for this change

```
DataFusion CLI v50.1.0
> SET TIME ZONE = '+08:00';
0 row(s) fetched. 
Elapsed 0.011 seconds.

> SELECT arrow_typeof(now());
+---------------------------------------+
| arrow_typeof(now())                   |
+---------------------------------------+
| Timestamp(Nanosecond, Some("+08:00")) |
+---------------------------------------+
1 row(s) fetched. 
Elapsed 0.015 seconds.

> SELECT count(1) result FROM (SELECT now() as n) a WHERE n > '2000-01-01'::date;
+--------+
| result |
+--------+
| 1      |
+--------+
1 row(s) fetched. 
Elapsed 0.029 seconds.
```

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?
When the timezone changes, re-register `now()` function

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
pull Bot pushed a commit that referenced this pull request Oct 22, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

According to three-valued logic we should return `null` and that's also
what happens when the argument is not a constant as seen in the test.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

Updated `ArrayHas::simplify` to explicitly handle `null`

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

Updated the `array_has` SQL test and added unit tests

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

Yes, a minor change in behaviour wrt `null`

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 23, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#16820 .

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 23, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Part of apache#16602

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Now we have to search in the code comment (or even implementation) to
find the documentation of certain metrics, it would be better to open a
page in the `user-guide` for metrics.

The doc has to be manually updated, the metrics construction is
scattered in the codebase, so it's hard to make it auto-generated.

This PR only includes 2 common metrics, I plan to add more
operator-specific metrics while working on
apache#18116

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 23, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Part of apache#15791 .

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
Related apache#16324 apache#16617 

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
No

---------

Co-authored-by: Ian Lai <Ian.Lai@senao.com>
Co-authored-by: Cheng-Yuan-Lai <a186235@g,ail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: blaginin <dmitrii@blaginin.me>
pull Bot pushed a commit that referenced this pull request Oct 24, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#12220

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Updating documentation on adding new functions; aggregate instructions
were old, and adding in other types too (window, table)

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Updated instructions for adding new functions to DataFusion. Also did
some other touchups on the docs.

## Are these changes tested?

Doc changes only.

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

Doc changes only.

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
pull Bot pushed a commit that referenced this pull request Oct 24, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

Issue was raised as comment:
apache#17913 (comment)

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

On current website can see:

<img width="752" height="458" alt="image"
src="https://github.com/user-attachments/assets/7df06070-528f-432d-8a11-cf92f86f5f34"
/>

-
https://datafusion.apache.org/library-user-guide/upgrading.html#projectionexpr-changed-from-type-alias-to-struct

Source document:


https://github.com/apache/datafusion/blob/408e1e4e2c46d673a067e05f3e363a6f51e641c4/docs/source/library-user-guide/upgrading.md?plain=1#L204-L218

<img width="771" height="711" alt="image"
src="https://github.com/user-attachments/assets/9472b6b6-1a7e-42f3-a9ae-17d55adfb491"
/>


-https://datafusion.apache.org/library-user-guide/extending-operators.html#optimizing-logical-plan

Source document:


https://github.com/apache/datafusion/blob/408e1e4e2c46d673a067e05f3e363a6f51e641c4/docs/source/library-user-guide/extending-operators.md?plain=1#L32-L57

`,ignore` should also be trimmed out.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Fix regex used in trimming rust blocks to account for optional `ignore`.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Tested locally:

<img width="791" height="441" alt="image"
src="https://github.com/user-attachments/assets/ff679228-812a-47d5-876e-6fc8fb552bed"
/>

<img width="762" height="681" alt="image"
src="https://github.com/user-attachments/assets/7c551441-3579-4f65-bf41-325ac81fcc76"
/>

Existing blocks (with only `rust` and not `rust,ignore`) unaffected,
e.g. in catalogs page:

<img width="764" height="543" alt="image"
src="https://github.com/user-attachments/assets/c18376d5-974c-4510-89e4-4c777c860ca0"
/>


## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No, doc changes.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 25, 2025
…rquet source (apache#18196)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

Part of apache#18116

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
The below configuration can be used to let `EXPLAIN ANALYZE` only show
important high-level insights.
```
set datafusion.explain.analyze_level = summary;
```

This PR sets `summary` level metrics for the parquet data source:
### `summary` level metrics for `DataSourceExec` with `Parquet` source
- File level pruning metrics
- Row-group level pruning metrics
- Bytes scanned
- metadata load time
In
https://github.com/apache/datafusion/blob/155b56e521d75186776a65f1634ee03058899a79/datafusion/datasource-parquet/src/metrics.rs#L29

The remaining metrics are kept in the `dev` level. I'm not sure if the
page level pruning metrics should also be included to the `summary`
level, I'm open to suggestions for this, or any other metrics that
should also be included.

While implementing this, I came up with a few ideas to further improve
metrics tracking in the Parquet scanner. I’ve documented them in
apache#18195

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
Set the above metrics to `summary` analyze level

## Are these changes tested?
UTs
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?
No
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 26, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

Part of apache#18095

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

In the contributor guide documentation, better explain why large
AI-generated PRs without understanding should be rejected. See more in
the original issue discussion.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
pull Bot pushed a commit that referenced this pull request Oct 30, 2025
…ields (apache#18100)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#16997
- Part of apache#11725
- Supersedes apache#17085

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

When reviewing apache#17085 I was very confused by the fix suggested, and
tried to understand why `AccumulatorArgs` didn't have easy access to
`Field`s of its input expressions, as compared to scalar/window
functions which do. Introducing this new field should make it easier for
users to grab datatype, metadata, nullability of their input expressions
for aggregate functions.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Add a slice of `FieldRef` to `AccumulatorArgs` so users don't need to
compute the input expression fields themselves via using schema. This
addresses apache#16997 as it was confusing to have only the schema available
as there are valid (?) cases where the schema is empty (such as literal
only input).

This fix differs from apache#17085 in that it doesn't special case for when
there is literal only input; it leaves the physical `schema` provided to
`AccumulatorArgs` untouched but provides a more ergonomic (and less
confusing) API for users to retrieve `Field`s of their input arguments.

- I'm still not sure if the schema being empty for literal only inputs
is correct or not, so this might be considered a side step. If we could
remove `schema` entirely from `AccumulatorArgs` maybe we wouldn't need
to worry about this, but see my comment for why that wasn't done in this
PR

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Existing unit tests.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

Yes, new field to `AccumulatorArgs` which is publicly exposed (with all
it's fields).

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 30, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#18334.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

The array-based implementation of date_trunc can produce incorrect
results for negative timestamps (i.e. dates before 1970-01-01). Check
for any such incorrect values and compensate accordingly.

Running the date_trunc benchmark suggests this fix introduces an ~9%
performance cost.

```
date_trunc_minute_1000  time:   [1.7424 µs 1.7495 µs 1.7583 µs]
                        change: [+7.9289% +8.5950% +9.1955%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe
```

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Yes, an SLT is added based on the issue.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 30, 2025
…e#18317)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#15881
  - See my notes below

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Trying to move away from user defined signatures where possible; mainly
to ensure consistency of error checking/messages. The original issue is
because the function has to do this checking itself leading to
inconsistency of error used (ideally shouldn't be internal). By
uplifting away from a user defined signature we can make use of existing
code meant to handle this checking and error messages for us.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Defined range/generate_series signature via coercible API instead of
being user defined. Some accompanying changes are needed in the
signature code to make this possible.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Added SLT tests and fixed any existing ones.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No (error messages do change though)

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 31, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
It's better to let pruning metrics in parquet displayed in an order that
is the same as the actual pruning order:
```
metrics=...files_ranges_pruned_statistics=21 total → 3 matched, row_groups_pruned_statistics=1 total → 1 matched, row_groups_pruned_bloom_filter=1 total → 1 matched, page_index_rows_pruned=748901 total → 19813 matched...
```
Now it's ordered alphabetically.
See apache#18321 (comment)
for reproducing.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
Update the sort key API in `MetricValue`, to let the parquet pruning
metrics display in the expected order.

## Are these changes tested?
UT
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?
No
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Oct 31, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change
Found when was testing apache#18356

```
> select date_trunc('YY', now());
Execution error: Unsupported date_trunc granularity: yy

```

Which is confusing, I would like to get a list of supported values
<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Nov 1, 2025
…or UDAFs (apache#18397)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#18280

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

`AggregateUDFImpl::is_ordered_set_aggregate` is confusingly named as all
it does currently is permit usage of `WITHIN GROUP` SQL syntax. I don't
think it would have any functionality in the future beyond this. Also
makes it easier if in future we decide to implement [hypothetical-set
aggregate
functions](https://www.postgresql.org/docs/9.4/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE)
too, since we wouldn't need a `is_hypothetical_set_aggregate` variation
either.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Rename `AggregateUDFImpl::is_ordered_set_aggregate` to
`AggregateUDFImpl::supports_within_group_clause`.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Existing tests.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

Yes. Added section to upgrade guide.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Nov 2, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

Part of apache#18217

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
In `FilterExec`, selectivity is calculated as `output_rows/input_rows`.
This PR supports such metric. I think this metrics provides important
application-level insights, and would be commonly used, so it is
displayed in the `summary` verbose level.

### Demo in `datafusion-cli`
```
> set datafusion.explain.analyze_level = summary;
0 row(s) fetched.
Elapsed 0.000 seconds.

> explain analyze select * from generate_series(100) as t1(v1) where v1 <10;
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                   |
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=10, elapsed_compute=1.763µs, output_bytes=64.0 KB]                                                                          |
|                   |   CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=10, elapsed_compute=25.833µs, output_bytes=64.0 KB]                                                                |
|                   |     FilterExec: value@0 < 10, metrics=[output_rows=10, elapsed_compute=34.888µs, output_bytes=128.0 B, selectivity=9.9% (10/101)]                                                      |
|                   |       RepartitionExec: partitioning=RoundRobinBatch(14), input_partitions=1, metrics=[]                                                                                                |
|                   |         LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=100, batch_size=8192], metrics=[output_rows=101, elapsed_compute=33.167µs, output_bytes=64.0 KB] |
|                   |                                                                                                                                                                                        |
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.
```

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
1. Add a new `MetricValue` for ratio.
2. Tracking selectivity in `FilterExec` with `MetricValue::Ratio`

## Are these changes tested?
UT
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
3. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?
No
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Nov 2, 2025
…apache#18430)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

Remove a unnecessary vec allocation in SortMergeJoinStream
initialization

## Rationale for this change

Remove a unnecessary vec allocation in SortMergeJoinStream
initialization

## What changes are included in this PR?

Remove a unnecessary vec allocation in SortMergeJoinStream
initialization

## Are these changes tested?

Covered by existing

## Are there any user-facing changes?

No
pull Bot pushed a commit that referenced this pull request Nov 3, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

N/A

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

When reviewing apache#18424 I noticed some refactoring that could be applied
to existing array reverse implementation.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

See my comments for the refactors & justifications.

Existing tests.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Nov 3, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#16272.

## Rationale for this change

- Null with null type were throwing invalid casts in substrait round
trips

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

- Added a null variation const that allows NULL with NULL types to be
casted properly.
- Made this a UserDefined type from the substrait side of things

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?
- Yes added unit test in producer/types.rs
- Previously failing tests pass

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Nov 3, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes 
- apache#18204
- apache#18081
- fixes apache#18219 as a side effect

## Rationale for this change

Default timezone was previously zulu however with the recent change to
support default tz in now(), current_date(), etc which used to have no
default tz the choice was made to unset the system wide timezone.

## What changes are included in this PR?

Code, tests, upgrading doc.

## Are these changes tested?

Yes, with existing tests.

## Are there any user-facing changes?

Yes. Any query that used to use the default timezone would return a
timestamp with a timezone of 'Z' will now return a timestamp without a
timezone. This can be changed back to the previous behaviour with the
sql

```sql
SET TIMEZONE = '+00:00';
```

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
pull Bot pushed a commit that referenced this pull request Nov 3, 2025
…8363)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#17670 
- Closes apache#18419

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

Added a `flatten()` `List(LargeList)` test to the `sqllogictest`

Added support for array `flatten()` on `List(LargeList(_))` types
<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

`sqllogictest` passes, but I still need to implement a test where
offsets could not be downcasted from i64 to i32
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

Users will be able to use `flatten` on `List(LargeList)` types
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Nov 3, 2025
…ALYZE (apache#18455)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#18410

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

This PR adds the `reduction_factor` metric to the `AggregateExec`
mode=Partial case.

e.g from the issue

```
create table t1(a int, b int);
insert into t1 values (1,10), (1, 20), (2,10), (2,30);
explain analyze select a, sum(b) from t1 group by a;

+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                         |
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a], aggr=[sum(t1.b)], metrics=[output_rows=2, elapsed_compute=7.856539ms, output_bytes=544.0 B]                            |
|                   |   CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=2, elapsed_compute=192.334µs, output_bytes=96.0 KB]                                                      |
|                   |     RepartitionExec: partitioning=Hash([a@0], 10), input_partitions=10, metrics=[]                                                                                           |
|                   |       RepartitionExec: partitioning=RoundRobinBatch(10), input_partitions=1, metrics=[]                                                                                      |
|                   |         AggregateExec: mode=Partial, gby=[a@0 as a], aggr=[sum(t1.b)], metrics=[output_rows=2, elapsed_compute=2.581625ms, output_bytes=544.0 B, reduction_factor=50% (2/4)] |
|                   |           DataSourceExec: partitions=1, partition_sizes=[1], metrics=[]                                                                                                      |
|                   |                                                                                                                                                                              |
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

Note: For AggregateExec cases where this doesn't apply, the
reduction_factor metric won't be shown. Here's an example of the explain
analyze from the modified test in `explain_analyze.rs`.

```
running query: EXPLAIN ANALYZE SELECT count(*) as cnt FROM (SELECT count(*), c1 FROM aggregate_test_100 WHERE c13 != 'C2GT5KVyOPZpgKVl110TyZO0NcJ434' GROUP BY c1 ORDER BY c1 ) AS a UNION ALL SELECT 1 as cnt UNION ALL SELECT lead(c1, 1) OVER () as cnt FROM (select 1 as c1) AS b LIMIT 3
Query Output:

+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalescePartitionsExec: fetch=3, metrics=[output_rows=3, elapsed_compute=6.084µs, output_bytes=25.0 B]                                                                                                                                                                                                                                                                                                                                                                                                    |
|                   |   UnionExec, metrics=[output_rows=3, elapsed_compute=117.208µs, output_bytes=25.0 B]                                                                                                                                                                                                                                                                                                                                                                                                                      |
|                   |     ProjectionExec: expr=[count(Int64(1))@0 as cnt], metrics=[output_rows=1, elapsed_compute=1.333µs, output_bytes=8.0 B]                                                                                                                                                                                                                                                                                                                                                                                 |
|                   |       AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))], metrics=[output_rows=1, elapsed_compute=70.542µs, output_bytes=8.0 B]                                                                                                                                                                                                                                                                                                                                                                    |
|                   |         CoalescePartitionsExec, metrics=[output_rows=3, elapsed_compute=4.958µs, output_bytes=24.0 B]                                                                                                                                                                                                                                                                                                                                                                                                     |
|                   |           AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))], metrics=[output_rows=3, elapsed_compute=51.835µs, output_bytes=24.0 B]                                                                                                                                                                                                                                                                                                                                                             |
|                   |             ProjectionExec: expr=[], metrics=[output_rows=5, elapsed_compute=2.251µs, output_bytes=0.0 B]                                                                                                                                                                                                                                                                                                                                                                                                 |
|                   |               AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[], metrics=[output_rows=5, elapsed_compute=76.666µs, output_bytes=48.0 KB, spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, peak_mem_used=50544, aggregate_arguments_time=3ns, aggregation_time=3ns, emitting_time=5.875µs, time_calculating_group_ids=9.459µs]                                                                                                                                                            |
|                   |                 CoalesceBatchesExec: target_batch_size=4096, metrics=[output_rows=5, elapsed_compute=11.249µs, output_bytes=192.0 KB]                                                                                                                                                                                                                                                                                                                                                                     |
|                   |                   RepartitionExec: partitioning=Hash([c1@0], 3), input_partitions=3, metrics=[spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, fetch_time=15.064041ms, repartition_time=149.418µs, send_time=8.672µs]                                                                                                                                                                                                                                                                                  |
|                   |                     AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[], metrics=[output_rows=5, elapsed_compute=248.667µs, output_bytes=16.0 KB, spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, skipped_aggregation_rows=0, peak_mem_used=52168, aggregate_arguments_time=3ns, aggregation_time=3ns, emitting_time=7.377µs, time_calculating_group_ids=128.46µs, reduction_factor=5.1% (5/99)]                                                                                                   |
|                   |                       CoalesceBatchesExec: target_batch_size=4096, metrics=[output_rows=99, elapsed_compute=81.459µs, output_bytes=64.0 KB]                                                                                                                                                                                                                                                                                                                                                               |
|                   |                         FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434, projection=[c1@0], metrics=[output_rows=99, elapsed_compute=503.793µs, output_bytes=1584.0 B, selectivity=99% (99/100)]                                                                                                                                                                                                                                                                                                      |
|                   |                           RepartitionExec: partitioning=RoundRobinBatch(3), input_partitions=1, metrics=[spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, fetch_time=4.160958ms, repartition_time=1ns, send_time=16.085µs]                                                                                                                                                                                                                                                                             |
|                   |                             DataSourceExec: file_groups={1 group: [[Users/peter/Documents/open-source/datafusion/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1, c13], file_type=csv, has_header=true, metrics=[output_rows=100, elapsed_compute=1ns, output_bytes=19.1 KB, batches_split=0, file_open_errors=0, file_scan_errors=0, time_elapsed_opening=313.458µs, time_elapsed_processing=3.974624ms, time_elapsed_scanning_total=3.771208ms, time_elapsed_scanning_until_data=3.714625ms] |
|                   |     ProjectionExec: expr=[1 as cnt], metrics=[output_rows=1, elapsed_compute=20.792µs, output_bytes=8.0 B]                                                                                                                                                                                                                                                                                                                                                                                                |
|                   |       PlaceholderRowExec, metrics=[]                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|                   |     ProjectionExec: expr=[lead(b.c1,Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING@1 as cnt], metrics=[output_rows=1, elapsed_compute=1.333µs, output_bytes=9.0 B]                                                                                                                                                                                                                                                                                                                    |
|                   |       BoundedWindowAggExec: wdw=[lead(b.c1,Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "lead(b.c1,Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": nullable Int64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted], metrics=[output_rows=1, elapsed_compute=560µs, output_bytes=17.0 B]                                                                                                                             |
|                   |         ProjectionExec: expr=[1 as c1], metrics=[output_rows=1, elapsed_compute=2.459µs, output_bytes=8.0 B]                                                                                                                                                                                                                                                                                                                                                                                              |
|                   |           PlaceholderRowExec, metrics=[]                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

The following cases don't include `reduction_factor` metric
- `AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]`
- `AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]`
- `AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[]`

While this case does:
- `AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[]` ->
`reduction_factor=5.1% (5/99)`

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
Yes

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->
Yes, a new metric will be visible when running `EXPLAIN ANALYZE`

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Yongting You <2010youy01@gmail.com>
pull Bot pushed a commit that referenced this pull request Nov 4, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#18418

## Rationale for this change

Keep dependencies up to date.

## What changes are included in this PR?

Benchmark updates.

## Are these changes tested?

Yes, every single benchmark was run.

## Are there any user-facing changes?

No.
pull Bot pushed a commit that referenced this pull request Nov 4, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

There's no issue for this, just some simple text fixes.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

While "the the" *can* be grammatically correct, in these instances it
was not.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

"the the" -> "the"

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Not at as such, no.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

Yes, in both code docs and in the website.
pull Bot pushed a commit that referenced this pull request Nov 4, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#18479.

You can see an example run with the warning
[here](https://github.com/apache/datafusion/actions/runs/19056329292/job/54427495514),
just expand the `Check datafusion (no-default-features)` section.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Fixes a warning when building without the sql feature.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Makes a `use` declaration conditional based on the sql feature.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

I ran the same command and had no issues.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

Nope!
pull Bot pushed a commit that referenced this pull request Nov 4, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- part of #apache#18142.

## Rationale for this change
As discussed in apache#18289 this PR
is for consolidating all the `flight` examples into a single example
binary. Then we can make sure we are agreed on the pattern and then we
can apply it to the remaining examples

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Sergey Zhukov <szhukov@aligntech.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
pull Bot pushed a commit that referenced this pull request Nov 5, 2025
…st (apache#18485)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- This PR came up as part of apache#17964.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

This PR is intended to fix return type mismatch of spark `array` when
inner data type is `LargeList`, e.g.
```
query error
SELECT array(arrow_cast(array(1), 'LargeList(Int64)'))
----
DataFusion error: Internal error: Function 'array' returned value of type 'LargeList(Field { name: "element", data_type: LargeList(Field { data_type: Int64, nullable: true }), nullable: true })' while the following type was promised at planning time and expected: 'List(Field { name: "element", data_type: LargeList(Field { data_type: Int64, nullable: true }), nullable: true })'.
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues
```

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- Return `List` regardless of whether inner data type is `LargeList` or
not. This aligns with the behavior of datafusion `make_array` function.
- Remove `return_field_from_args` as `return_type` is already defined
and is invoked internally.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Yes

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

No.

---------

Co-authored-by: Michael Kleen <mkleen@gmail.com>
Co-authored-by: Sergey Zhukov <62326549+cj-zhukov@users.noreply.github.com>
Co-authored-by: Sergey Zhukov <szhukov@aligntech.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: bubulalabu <bubulalabububu@gmail.com>
Co-authored-by: Vegard Stikbakke <vegard.stikbakke@gmail.com>
pull Bot pushed a commit that referenced this pull request Nov 5, 2025
apache#18481)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#18407

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
This new metric will give the user better visibility to see what portion
of the possibilities is actually being matched.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
Add selectivity metric to NestedLoopJoinExec for EXPLAIN ANALYZE

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
Added tests

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->
Yes, new metric in explain analyze
<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Yongting You <2010youy01@gmail.com>
pull Bot pushed a commit that referenced this pull request Nov 6, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- part of #apache#18142.

## Rationale for this change
This PR is for consolidating all the `udf` examples into a single
example binary. We are agreed on the pattern and we can apply it to the
remaining examples
<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Sergey Zhukov <szhukov@aligntech.com>
pull Bot pushed a commit that referenced this pull request Nov 7, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

An initial attempt towards
apache#18467

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
### Rationale for the additional lint rule
`clippy::needless_pass_by_value`
There is a clippy lint rule that is not turned on by the current
strictness level in CI:
https://rust-lang.github.io/rust-clippy/master/index.html#needless_pass_by_value
Note it has the `Clippy` category `pedantic`, and its description is
`lints which are rather strict or have occasional false positives` from
https://doc.rust-lang.org/nightly/clippy

It seems we have been suffering from the excessive copying issue for
quite some time, and @alamb is on the front line now
apache#18413. I think this extra
lint rule is able to help.

### Implementation plan
This PR only enables this rule in `datafusion-common` package, and apply
`#[allow(clippy::needless_pass_by_value)]` for all violations.
If this PR makes sense, we can open a tracking issue and roll out this
check to the remaining workspace packages. At least this can help
prevent new inefficient patterns and identify existing issues that we
can fix gradually.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Nov 8, 2025
)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#18505.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
Enforce clippy `needless_pass_by_value`.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
Yes

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

No
pull Bot pushed a commit that referenced this pull request Nov 8, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Follow on to apache#18468

## Rationale for this change

We missed the fact that you couldn't yet add new linter rules to
subcrates via Cargo.toml overrides. Thankfully @Jefffrey sorted is out.
Let's try and avoid that again by leaving a comment

## What changes are included in this PR?

Add comments to help our future selves remember to add new lints to
lib.rs rather than Cargo.toml for subcrates

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Nov 8, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
In apache#18468, there is a
inconsistent comment I forget to remove.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Nov 8, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

Part of apache#14763 and apache#14760

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Current `log()` signature has some drawbacks:


https://github.com/apache/datafusion/blob/a5eb9121ccf802dda547897155403b08a4fbf774/datafusion/functions/src/math/log.rs#L78-L105

- A bit nasty to look at: mixes numeric with exact float/int with exact
decimal (of exact precision and scale)
- Can't accommodate arbitrary decimals of any precision/scale (this is
true for other functions too)

Aim of this PR is to refactor it to use the coercion API, uplifting the
API where necessary to make this possible. This simplifies the signature
in code, whilst not losing flexibility.

Also other minor refactors are included to log.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

New `TypeSignatureClass` variants: Float, Decimal & Numeric

Refactor `log()` signature to be more in line with it's supported
implementations.

Fix issue in `log()` where `ColumnarValue::Scalar`s were being lost as
`ColumnarValue::Array`s for the base.

Support null propagation in `simplify()` for `log()`.

~~Fix issue with `calculate_binary_math` where it wasn't casting
scalars.~~

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Added new tests.

- Tests for float16, decimal32, decimal64, decimals with different
scales/precisions
- Test for null propagation (ensure use array input to avoid function
inlining)

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
pull Bot pushed a commit that referenced this pull request Nov 8, 2025
…LargeListView` types (apache#18432)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#18351

## Rationale for this change

`array_slice` accepts `ListView` / `LargeListView` inputs.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?


- Extend array_slice_inner to handle `ListView`/`LargeListView` arrays
directly.
- Share the stride/bounds logic between list and list‑view
implementations via a new `SlicePlan`.



<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?
Yes

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
Yes. `array_slice` now accepts `ListView` and `LargeListView` arrays
without requiring an explicit cast.
pull Bot pushed a commit that referenced this pull request Nov 8, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- part of #apache#18142.

## Rationale for this change
This PR is for consolidating all the `builtin-functions` examples into a
single example binary. We are agreed on the pattern and we can apply it
to the remaining examples

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Sergey Zhukov <szhukov@aligntech.com>
pull Bot pushed a commit that referenced this pull request Nov 8, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Related apache#18210 

## Rationale for this change

To keep logic clear in binary operator and make it possible to use
binary operators for nested data structures in coming changes.

## What changes are included in this PR?

Another housekeeping refactor for binary operators.

- Keep the API from datum module consistent by using `Operator` instead
of kernel function
- Move nested data structure check into cmp operators. This allows us to
implement binary operators for `List`, `Struct` and etc.

## Are these changes tested?

Unit tests

## Are there any user-facing changes?

N/A
pull Bot pushed a commit that referenced this pull request Nov 9, 2025
…rnal invariant checks (apache#18511)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#15492

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
See issue for the rationale and example.

This PR introduces the following macros to make invariant checks and
throwing internal errors easier, and also let the error message include
more assertion details if it failed (what's the expected/actual value),
to make debugging easier.
- `assert_or_internal_err!()`
- `assert_eq_or_internal_err!()`
- `assert_ne_or_internal_err!()`

```rust
// before
if field.name() != expected.name() {
    return internal_err!(
        "Field name mismatch at index {}: expected '{}', found '{}'",
        idx,
        expected.name(),
        field.name()
    );
}

// after
assert_eq_or_internal_err!(
    field.name(),
    expected.name(),
    "Field name mismatch at index {}",
    idx
);
```
If the assertion fails, the error now reads:

```
Internal error: Assertion failed: field.name() == expected.name() (left: "foo", right: "bar"): Field name mismatch at index 3.
```


## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
1. Add macros and UTs to test
2. Updated a few internal error patterns that are applicable for this
macro

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
3. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
UTs

## Are there any user-facing changes?
No
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
pull Bot pushed a commit that referenced this pull request Nov 9, 2025
…ion-physical-expr` (apache#18557)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#18544.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

See apache#18503 for details.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

I enabled the clippy lint rule and then fixed nearly all instances.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

As part of the normal test suite, yes.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

The following `pub (crate)` APIs were changed:

- `regex_match_dyn` in
`datafusion/physical-expr/src/expressions/binary/kernels.rs`
- `regex_match_dyn_scalar` in
`datafusion/physical-expr/src/expressions/binary/kernels.rs`

But no fully `pub` functions were changed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants