Add support for Utf8View for date/temporal codepaths #11518
alamb merged 3 commits into apache:string-view2
// TODO(aduffy): just return a manipulated Utf8View array by modifying the view only.
pub fn substr_view(args: &[ArrayRef]) -> Result<ArrayRef> {
    match args.len() {
I'm going to revert this, since it's outside the scope of this PR.
Ideally, we'd be able to defer to the substring kernel in arrow_string, but that kernel first needs to be fixed to support Utf8View arrays.
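The TODO above suggests slicing a Utf8View value by rewriting only its 16-byte view instead of copying string bytes. The std-only sketch below illustrates that idea under stated assumptions: it models the Arrow string-view layout (a `u32` length; strings of up to 12 bytes stored inline; longer strings stored as a 4-byte prefix, a buffer index and an offset into a shared data buffer). `View`, `make_view`, `view_bytes` and `substr_view_only` are hypothetical names for illustration, not the arrow-rs or DataFusion API, and the sketch slices by bytes, whereas SQL `substr` counts characters.

```rust
/// Simplified model of an Arrow string view (hypothetical type; real
/// arrow-rs views are packed into a single u128).
#[derive(Debug, Clone, PartialEq)]
enum View {
    /// Strings of at most 12 bytes are stored entirely inside the view.
    Inline { len: u32, data: [u8; 12] },
    /// Longer strings keep a 4-byte prefix plus the location of their bytes.
    Ref { len: u32, prefix: [u8; 4], buffer_index: u32, offset: u32 },
}

const MAX_INLINE: usize = 12;

/// Build a view for `bytes`, which live in `buffers[buffer_index]` at `offset`.
fn make_view(bytes: &[u8], buffer_index: u32, offset: u32) -> View {
    let len = bytes.len() as u32;
    if bytes.len() <= MAX_INLINE {
        let mut data = [0u8; 12];
        data[..bytes.len()].copy_from_slice(bytes);
        View::Inline { len, data }
    } else {
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&bytes[..4]);
        View::Ref { len, prefix, buffer_index, offset }
    }
}

/// Resolve a view back to the bytes it describes.
fn view_bytes<'a>(view: &'a View, buffers: &'a [Vec<u8>]) -> &'a [u8] {
    match view {
        View::Inline { len, data } => &data[..*len as usize],
        View::Ref { len, buffer_index, offset, .. } => {
            let start = *offset as usize;
            &buffers[*buffer_index as usize][start..start + *len as usize]
        }
    }
}

/// Byte-based substring that only rewrites the view; data buffers are shared,
/// never copied. Assumes `start..start + len` is in bounds and falls on
/// UTF-8 character boundaries.
fn substr_view_only(view: &View, buffers: &[Vec<u8>], start: usize, len: usize) -> View {
    let sliced = &view_bytes(view, buffers)[start..start + len];
    match view {
        // Long source: point into the same buffer at a shifted offset.
        // make_view re-inlines the result if it is now 12 bytes or fewer.
        View::Ref { buffer_index, offset, .. } => {
            make_view(sliced, *buffer_index, *offset + start as u32)
        }
        // Inline source: the result is shorter still, so it stays inline.
        View::Inline { .. } => make_view(sliced, 0, 0),
    }
}

fn main() {
    let buffers = vec![b"hello, string view world".to_vec()];
    let long = make_view(&buffers[0], 0, 0);

    let tail = substr_view_only(&long, &buffers, 7, 17); // "string view world"
    let word = substr_view_only(&long, &buffers, 7, 6); // "string"

    println!("{}", std::str::from_utf8(view_bytes(&tail, &buffers)).unwrap());
    println!("{}", std::str::from_utf8(view_bytes(&word, &buffers)).unwrap());
}
```

One design point the sketch makes visible: a slice of a long string can stay zero-copy only while it is longer than 12 bytes; shorter results must be copied into the view itself, so a view-only substring still needs an inline fallback.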
I see, @XiangpengHao, that you've got some of this in #11517. I can rebase this after your PR merges.
Marking as draft, as I think this is waiting on some other PRs.
Thanks @a10y. Also, it might help to target this PR to the
Hey! Just changing the base branch in the PR produced a massive diff, so I did some quick git surgery to copy these changes on top of the existing string-view2 branch instead.
DataType::Null => hash_null(random_state, hashes_buffer, rehash),
DataType::Boolean => hash_array(as_boolean_array(array)?, random_state, hashes_buffer, rehash),
DataType::Utf8 => hash_array(as_string_array(array)?, random_state, hashes_buffer, rehash),
DataType::Utf8View => hash_array(as_string_view_array(array)?, random_state, hashes_buffer, rehash),
This might be the only thing that is already present.
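The `Utf8View` arm in the hash dispatch above goes through the same `hash_array` path as `Utf8`, which fits the idea that row hashes should depend on each row's string value rather than on its physical layout. The std-only sketch below illustrates that property; it is not DataFusion's hashing code. The `StringColumn` enum is a hypothetical stand-in for the two Arrow layouts, `std::collections::hash_map::DefaultHasher` stands in for the `random_state` in the snippet, and the `rehash` combine step is an arbitrary choice for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the two physical string layouts.
enum StringColumn {
    /// Utf8: one contiguous data buffer plus n + 1 offsets.
    Utf8 { offsets: Vec<usize>, data: String },
    /// Utf8View (simplified): each row is a (buffer, start, len) triple.
    Utf8View { views: Vec<(usize, usize, usize)>, buffers: Vec<String> },
}

impl StringColumn {
    fn len(&self) -> usize {
        match self {
            StringColumn::Utf8 { offsets, .. } => offsets.len() - 1,
            StringColumn::Utf8View { views, .. } => views.len(),
        }
    }

    /// Logical value of row `i`, independent of layout.
    fn value(&self, i: usize) -> &str {
        match self {
            StringColumn::Utf8 { offsets, data } => &data[offsets[i]..offsets[i + 1]],
            StringColumn::Utf8View { views, buffers } => {
                let (b, start, len) = views[i];
                &buffers[b][start..start + len]
            }
        }
    }
}

/// Hash every row into `hashes`. When `rehash` is true, combine with the hash
/// already in the slot (as for a multi-column key) instead of overwriting it.
fn hash_strings(col: &StringColumn, hashes: &mut [u64], rehash: bool) {
    for (i, slot) in hashes.iter_mut().enumerate().take(col.len()) {
        let mut hasher = DefaultHasher::new();
        col.value(i).hash(&mut hasher);
        let h = hasher.finish();
        *slot = if rehash { slot.wrapping_mul(31).wrapping_add(h) } else { h };
    }
}

fn main() {
    let utf8 = StringColumn::Utf8 {
        offsets: vec![0, 5, 11],
        data: "applebanana".to_string(),
    };
    // Same logical rows ["apple", "banana"], stored out of order in one buffer.
    let view = StringColumn::Utf8View {
        views: vec![(0, 8, 5), (0, 2, 6)],
        buffers: vec!["xxbananaapple".to_string()],
    };
    let (mut a, mut b) = (vec![0u64; 2], vec![0u64; 2]);
    hash_strings(&utf8, &mut a, false);
    hash_strings(&view, &mut b, false);
    println!("hashes equal across layouts: {}", a == b);
}
```

Because both layouts resolve to the same `&str` before hashing, a join or aggregation that sees `Utf8` on one side and `Utf8View` on the other would still bucket equal values together in this model.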
Hey @alamb, I added some sqllogictest cases. However, they currently fail: I'd forgotten that this change depends on the fixes in apache/arrow-rs#6077. I was testing over in Spiral with both of these patchsets applied and had not tested this one in isolation.
No worries. I'll try to get apache/arrow-rs#6077 reviewed and merged today.
OK, sorry for the delay. I just merged apache/arrow-rs#6077. I think if you update the arrow-rs pin in Cargo.toml (https://github.com/apache/datafusion/blob/string-view2/Cargo.toml) to the appropriate arrow-rs commit, we could get the PR to pass.
Just rebased to pick up the #11517 changes and to bump the arrow-rs patch version. The sqllogictests now pass for me locally!
Thank you very much, @a10y. I took the liberty of pushing some commits to this branch to get CI to pass. I also think that, by doing so, CI will run automatically for this PR from now on (and for any of your subsequent PRs).
🚀
… some ClickBench queries (not on by default) (#11667)

* Pin to pre-release version of arrow 52.2.0
* Update for deprecated method
* Add a config to force using string view in benchmark (#11514)
* add a knob to force string view in benchmark
* fix sql logic test
* update doc
* fix ci
* fix ci only test
* Update benchmarks/src/util/options.rs
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Update datafusion/common/src/config.rs
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* update tests
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Add String view helper functions (#11517)
* add functions
* add tests for hash util
* Add ArrowBytesViewMap and ArrowBytesViewSet (#11515)
* Update `string-view` branch to arrow-rs main (#10966)
* Pin to arrow main
* Fix clippy with latest arrow
* Uncomment test that needs new arrow-rs to work
* Update datafusion-cli Cargo.lock
* Update Cargo.lock
* tapelo
* merge
* update cast
* consistent dep
* fix ci
* add more tests
* make doc happy
* update new implementation
* fix bug
* avoid unused dep
* update dep
* update
* fix cargo check
* update doc
* pick up the comments change again
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Enable `GroupValueBytesView` for aggregation with StringView types (#11519)
* add functions
* Update `string-view` branch to arrow-rs main (#10966)
* Pin to arrow main
* Fix clippy with latest arrow
* Uncomment test that needs new arrow-rs to work
* Update datafusion-cli Cargo.lock
* Update Cargo.lock
* tapelo
* merge
* update cast
* consistent dep
* fix ci
* avoid unused dep
* update dep
* update
* fix cargo check
* better group value view aggregation
* update
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Initial support for regex_replace on `StringViewArray` (#11556)
* initial support for string view regex
* update tests
* Add support for Utf8View for date/temporal codepaths (#11518)
* Add StringView support for date_part and make_date funcs
* run cargo update in datafusion-cli
* cargo fmt
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* GC `StringViewArray` in `CoalesceBatchesStream` (#11587)
* gc string view when appropriate
* make clippy happy
* address comments
* make doc happy
* update style
* Add comments and tests for gc_string_view_batch
* better herustic
* update test
* Update datafusion/physical-plan/src/coalesce_batches.rs
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* [Bug] fix bug in return type inference of `utf8_to_int_type` (#11662)
* fix bug in return type inference
* update doc
* add tests
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Fix clippy
* Increase ByteViewMap block size to 2MB (#11674)
* better default block size
* fix related test
* Change `--string-view` to only apply to parquet formats (#11663)
* use inferenced schema, don't load schema again
* move config to parquet-only
* update
* update
* better format
* format
* update
* Implement native support StringView for character length (#11676)
* native support for character length
* Update datafusion/functions/src/unicode/character_length.rs
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Remove uneeded patches
* cargo fmt
---------
Co-authored-by: Xiangpeng Hao <haoxiangpeng123@gmail.com>
Co-authored-by: Xiangpeng Hao <me@haoxp.xyz>
Co-authored-by: Andrew Duffy <a10y@users.noreply.github.com>
(targets string-view2)
Which issue does this PR close?
More Utf8View support, per #10918
What changes are included in this PR?
Implements Utf8View support for the following:
* `date_part` function
* `make_date` function
Are these changes tested?
Currently this relies on the existing tests; I've also tested it on the TPC-H benchmarks in https://github.com/spiraldb/vortex/.
Are there any user-facing changes?
Several calls that used to return errors now succeed, namely `date_part` and `make_date` with Utf8View string arguments.
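One way to picture these changes is that a string argument is read as a `&str` whether it arrives as `Utf8` or `Utf8View`, after which the function body is unchanged. The std-only sketch below illustrates that under assumptions; it is not DataFusion's implementation or signatures. `StrArg` is a hypothetical stand-in for the two Arrow string types, `make_date` here parses year/month/day strings into a Date32-style day count since 1970-01-01 using the widely used days-from-civil algorithm, and `date_part` assumes the string argument is the part name (for example `'year'`) and supports only three parts.

```rust
/// Hypothetical stand-in for a string argument of either Arrow string type.
enum StrArg<'a> {
    Utf8(&'a str),
    Utf8View(&'a str),
}

impl<'a> StrArg<'a> {
    /// Both layouts expose the same logical value.
    fn as_str(&self) -> &'a str {
        match self {
            StrArg::Utf8(s) | StrArg::Utf8View(s) => *s,
        }
    }
}

/// Days since 1970-01-01 for a proleptic Gregorian date (days-from-civil).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // Jan/Feb count as months of the previous year
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, 0..=399
    let doy = (153 * ((m + 9) % 12) + 2) / 5 + d - 1; // day of year, counted from March 1
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`: (year, month, day) for a day count.
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097;
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    (yoe + era * 400 + i64::from(m <= 2), m, d)
}

fn days_in_month(y: i64, m: i64) -> i64 {
    let leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    match m {
        2 if leap => 29,
        2 => 28,
        4 | 6 | 9 | 11 => 30,
        _ => 31,
    }
}

fn parse_int(arg: &StrArg<'_>) -> Result<i64, String> {
    let s = arg.as_str();
    s.trim().parse::<i64>().map_err(|e| format!("invalid integer {s:?}: {e}"))
}

/// Hypothetical make_date: validates the parts and returns days since epoch.
fn make_date(year: StrArg<'_>, month: StrArg<'_>, day: StrArg<'_>) -> Result<i32, String> {
    let (y, m, d) = (parse_int(&year)?, parse_int(&month)?, parse_int(&day)?);
    if !(1..=12).contains(&m) || d < 1 || d > days_in_month(y, m) {
        return Err(format!("invalid date {y}-{m}-{d}"));
    }
    i32::try_from(days_from_civil(y, m, d)).map_err(|_| "date out of range".to_string())
}

/// Hypothetical date_part over a days-since-epoch value.
fn date_part(part: StrArg<'_>, date: i32) -> Result<i64, String> {
    let (y, m, d) = civil_from_days(i64::from(date));
    match part.as_str().to_ascii_lowercase().as_str() {
        "year" => Ok(y),
        "month" => Ok(m),
        "day" => Ok(d),
        other => Err(format!("unsupported date part {other:?}")),
    }
}

fn main() {
    // The same call works whichever string type the arguments arrive as.
    let a = make_date(StrArg::Utf8("2024"), StrArg::Utf8("7"), StrArg::Utf8("26")).unwrap();
    let b = make_date(StrArg::Utf8View("2024"), StrArg::Utf8View("7"), StrArg::Utf8View("26")).unwrap();
    println!("same result for both types: {}", a == b);
    println!("year: {}", date_part(StrArg::Utf8View("YEAR"), a).unwrap());
}
```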