ARROW-13882: [C++] Improve min_max/hash_min_max type support#11111
ARROW-13882: [C++] Improve min_max/hash_min_max type support#11111lidavidm wants to merge 5 commits intoapache:masterfrom
Conversation
|
In light of the proposal to clarify how DayTime intervals are handled some changes here should probably be reverted. |
234913b to
f49f7cf
Compare
|
(Removed support for DayTimeInterval since it's only partially orderable.) |
pitrou
left a comment
There was a problem hiding this comment.
Very nice, thank you. Just a couple questions and nits.
cpp/src/arrow/array/validate.cc
Outdated
There was a problem hiding this comment.
I'm a bit surprised. This should already be checked by the ValidateArray top-level function, no?
There was a problem hiding this comment.
Hmm. Odd, I do remember getting a crash for this reason here. But there is indeed a check:
arrow/cpp/src/arrow/array/validate.cc
Lines 367 to 374 in 87b2fcd
There was a problem hiding this comment.
It seems this will generate different template specializations for types with the same underlying c_type. Is it desirable?
(note that GetValues is called with the output type independently from the ArrowType parameter)
There was a problem hiding this comment.
I changed this and the one below to use PhysicalType. I also replaced GetValues with MakeScalar (since now the static type doesn't match the runtime type).
There was a problem hiding this comment.
Should this be else if? Presumably value cannot be bother smaller than the min and greater than the max, and it would spare a string comparison.
There was a problem hiding this comment.
Same remark as for the scalar aggregations: this will generate a separate kernel per logical type?
There was a problem hiding this comment.
Hmm, I don't see any interval in the type definition above. Is this comment outdated?
There was a problem hiding this comment.
I fixed the comments/docs.
docs/source/cpp/compute.rst
Outdated
docs/source/cpp/compute.rst
Outdated
e9087e0 to
c05cb96
Compare
This adds support for non-nested types to both, except for MonthDayNanoInterval (which is not really sortable). hash_min_max additionally lacks binary/string-like types as they require a different approach (I will file a followup). Closes apache#11111 from lidavidm/arrow-13882 Lead-authored-by: David Li <li.davidm96@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
This adds support for non-nested types to both, except for MonthDayNanoInterval (which is not really sortable). hash_min_max additionally lacks binary/string-like types as they require a different approach (I will file a followup).