GH-46063: [C++][Compute] Fix the issue that MinMax kernel emits -inf/inf for all-NaN input#48459
Conversation
|
|
5d9050b to
48b787c
Compare
|
Hi @pitrou , would you take a look? Thanks. |
pitrou
left a comment
There was a problem hiding this comment.
LGTM on the principle, but can we ensure that the hash-aggregate min-max has the same semantics?
That was close. The hash_min_max behaves differently. I'm fixing them now. Thanks for pointing this out! |
48b787c to
4894506
Compare
cpp/src/arrow/type_traits.h
Outdated
|
|
||
| /// @} | ||
|
|
||
| /// \addtogroup c-type-concepts |
There was a problem hiding this comment.
How about a nice cup of concepts?
There was a problem hiding this comment.
Can do that as a separate PR? #48590 is already open for it.
There was a problem hiding this comment.
Shall we revert all the C++20 stuff in this PR and redo it in a separate one? Or, we can define such concepts local in the cpp file, and only move them to public header in a coming PR for #48590 ?
There was a problem hiding this comment.
You can define them locally if that makes things easier for you?
There was a problem hiding this comment.
Thanks. Moved from public header to local cpp.
cpp/src/arrow/type_traits.h
Outdated
| // XXX: To be completed with more concepts as needed. | ||
|
|
||
| template <typename T> | ||
| concept CBooleanConcept = std::is_same_v<T, bool>; |
There was a problem hiding this comment.
Apparently std::same_as can be used, though I'm not sure whether it differs from std::is_same_v.
There was a problem hiding this comment.
Yes. C++20 std::same_as is preferred. Updated.
| template <CFloatingPointConcept CType> | ||
| struct MinMaxOp<CType> { | ||
| static constexpr CType min(CType a, CType b) { return std::fmin(a, b); } | ||
| static constexpr CType max(CType a, CType b) { return std::fmax(a, b); } |
There was a problem hiding this comment.
Do std::fmin/fmax actually accept util::Float16??
There was a problem hiding this comment.
No, but we don't either specialize for Float16 so it compiles. Please see my other comment.
| field("argument1", float32()), | ||
| field("argument2", float64()), | ||
| field("key", int64()), | ||
| }); |
There was a problem hiding this comment.
I think we also want a float16 column? :)
There was a problem hiding this comment.
The min/max kernel for half-float is not implemented so we are not able to test it.
The current half-float is not intact in terms of: 1) whether the type is in floating point category is inconsistent: e.g. type trait is_floating_type<HalfFloatType> is true but g_floating_types doesn't include it (that's why floating point kernels don't register for half-float). 2) Some std functions don't work with our half-float representation: e.g. std::is_nan/fmin/fmax, it won't compile if we try to add certain half-float kernels.
There was a problem hiding this comment.
Hmm, I see. Thanks for the clarification.
| this->AssertMinMaxIs("[5, -Inf, 2, 3, 4]", -INFINITY, 5, options); | ||
| this->AssertMinMaxIsNull("[5, null, 2, 3, 4]", options); | ||
| this->AssertMinMaxIsNull("[5, -Inf, null, 3, 4]", options); | ||
| this->AssertMinMaxIsNull("[NaN, null]", options); |
There was a problem hiding this comment.
Are these tests also executed with Float16 or is there a separate test for it?
There was a problem hiding this comment.
Same as my other comment.
pitrou
left a comment
There was a problem hiding this comment.
Just two small remarks, otherwise LGTM.
| // XXX: Consider making these concepts complete and moving to public header. | ||
|
|
||
| template <typename T> | ||
| concept CBooleanConcept = std::is_same_v<T, bool>; |
There was a problem hiding this comment.
Right, thanks! Updated.
There was a problem hiding this comment.
Right, thanks! Updated.
| std::floating_point<T> || std::is_same_v<T, util::Float16>; | ||
|
|
||
| template <typename T> | ||
| concept CDecimalConcept = std::is_same_v<T, Decimal32> || std::is_same_v<T, Decimal64> || |
There was a problem hiding this comment.
By the way, it seems CTypeTraits<Decimal32> et al. aren't defined, perhaps open a separate issue for that.
There was a problem hiding this comment.
Hmm, all decimal types are missing for CTypeTraits. #48740 filed.
|
@github-actions crossbow submit -g cpp -g python |
|
Revision: 461bd5c Submitted crossbow builds: ursacomputing/crossbow @ actions-e84e8aea1a |
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit abbcd53. There weren't enough matching historic benchmark results to make a call on whether there were regressions. The full Conbench report has more details. |
Rationale for this change
Our MinMax kernels emit -inf/inf for all-NaN input array, which doesn't make sense.
What changes are included in this PR?
Initialize the running min/max value from -inf/inf to NaN, so we can leverage the nice property that:
std::fmin/fmax(all-NaN) = NaNstd::fmin/fmax(NaN, non-NaN) = non-NaNAre these changes tested?
Test included.
Are there any user-facing changes?
None.