@theirix
Contributor

@theirix theirix commented Jan 18, 2026

Which issue does this PR close?

Rationale for this change

The helper calculate_binary_math, and the UDFs that rely on it, could behave strangely when the input and output scales differ. The original logic didn't fully handle this case.

So let's introduce calculate_binary_math_decimal and calculate_binary_math_numeric functions with proper handling of arguments with different scales and of type casting for input and output arguments.

They supersede calculate_binary_math and calculate_binary_math_decimal because they have a slightly different functor signature that automatically passes the effective precision and scale (even if rescaled). The rest is compatible.
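To illustrate the idea (not the code in this PR), here is a minimal sketch on plain `i128` values instead of Arrow arrays. The names `Dec`, `upscale` and `binary_decimal` are hypothetical, and the rule used for the effective precision is an assumption for illustration only. Both operands are brought to the larger scale, and the kernel receives the rescaled values together with the effective precision and scale:

```rust
/// One decimal value: unscaled integer plus its (precision, scale).
/// Hypothetical stand-in for an element of a Decimal128 array.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Dec {
    unscaled: i128,
    precision: u8,
    scale: i8,
}

/// Multiply `value` by 10^(to - from). Returns None on overflow
/// or if `to < from` (this sketch only scales up).
fn upscale(value: i128, from: i8, to: i8) -> Option<i128> {
    let diff = u32::try_from(i32::from(to) - i32::from(from)).ok()?;
    let factor = 10_i128.checked_pow(diff)?;
    value.checked_mul(factor)
}

/// Rescale both operands to the larger scale, then call `kernel` with the
/// rescaled values and the effective (precision, scale).
fn binary_decimal<F>(left: Dec, right: Dec, kernel: F) -> Option<Dec>
where
    F: Fn(i128, i128, u8, i8) -> Option<i128>,
{
    let scale = left.scale.max(right.scale);
    // Assumed rule: keep the larger integer-digit count of the two inputs
    // and add the fractional digits of the common scale, capped at 38.
    let int_digits = (i16::from(left.precision) - i16::from(left.scale))
        .max(i16::from(right.precision) - i16::from(right.scale));
    let precision = (int_digits + i16::from(scale)).clamp(1, 38) as u8;

    let l = upscale(left.unscaled, left.scale, scale)?;
    let r = upscale(right.unscaled, right.scale, scale)?;
    let unscaled = kernel(l, r, precision, scale)?;
    Some(Dec { unscaled, precision, scale })
}
```

With this shape, a remainder kernel is just `|l, r, _precision, _scale| l.checked_rem(r)`: for 12.50 (unscaled 1250, scale 2) and 0.3 (unscaled 3, scale 1), the right side is rescaled to 30 and the result is 20 at scale 2, i.e. 0.20.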

What changes are included in this PR?

  • New functions
  • Port existing UDFs to new functions

Are these changes tested?

  • Existing unit tests
  • SLTs

Are there any user-facing changes?

The older functions could be deprecated. Since they are part of the public interface of datafusion-functions, I just added a comment rather than a full-fledged deprecation macro. Whether the macro should be used is up for discussion.

Introduce calculate_binary_math_decimal and
calculate_binary_math_numeric functions
with proper handling of arguments of different scales
and type casting.

They supersede calculate_binary_math and calculate_binary_math_decimal
due to having a slightly different functor signature with automatic
passing of effective precision and scale (even if rescaled).
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Jan 18, 2026
@theirix theirix marked this pull request as ready for review January 18, 2026 16:40
@theirix
Contributor Author

theirix commented Jan 18, 2026

Recent related changes: #18525 and #19384 . Epic #18889

@Jefffrey Jefffrey self-requested a review January 23, 2026 03:40
@Jefffrey
Contributor

I'm having trouble understanding the rationale here; log, power and round at most have one decimal input, and only round preserves the decimal type whereas the others will return floats anyway. So all this handling for getting precision/scale of left/right inputs, adjusting the scale of the output decimal, seems unused?

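Also for round we have a PR relating to altering precision/scale:

  • fix: increase ROUND decimal precision to prevent overflow truncation #19926 (https://github.com/apache/datafusion/pull/19926)
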
@theirix
Contributor Author

theirix commented Jan 25, 2026

> I'm having trouble understanding the rationale here; log, power and round at most have one decimal input, and only round preserves the decimal type whereas the others will return floats anyway. So all this handling for getting precision/scale of left/right inputs, adjusting the scale of the output decimal, seems unused?

Agreed: for most of these functions only one argument is decimal, so we take the decimal/non-decimal path, and the code that adjusts both scales is never exercised.

Overall, I can see the following benefits:

  • simplifying caller code by moving non-obvious logic into this helper
  • for the decimal/non-decimal case, it casts the input and output types, so the parameter's scale is not lost when operating on a native type
  • for decimal outputs, it removes the burden of setting the precision and scale on the output
  • for array cases, scaling to the output type by hand is hard (it is not always the default Decimal(38,10)), so cast_array_to does it for the caller
  • providing the effective scale and precision of a type to the user-provided kernel, without the kernel having to capture them from the arguments

So, symmetric functions like gcd/lcm (WIP), mod, div, etc benefit most from this PR.
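As a rough sketch of why a symmetric function needs both sides at the same scale, here is gcd on decimals represented as plain `(unscaled, scale)` pairs. The names `rescale_up`, `gcd_i128` and `decimal_gcd` are hypothetical and this is not the WIP implementation; it only shows that running the kernel on unscaled values without rescaling gives a wrong answer:

```rust
/// Multiply `value` by 10^(to - from). Returns None on overflow
/// or if `to < from` (this sketch only scales up).
fn rescale_up(value: i128, from: i8, to: i8) -> Option<i128> {
    let diff = u32::try_from(i32::from(to) - i32::from(from)).ok()?;
    value.checked_mul(10_i128.checked_pow(diff)?)
}

/// Euclid's algorithm on unscaled integers.
/// Returns None if the absolute value would overflow (i128::MIN).
fn gcd_i128(mut a: i128, mut b: i128) -> Option<i128> {
    while b != 0 {
        // checked_rem guards the i128::MIN % -1 overflow case.
        (a, b) = (b, a.checked_rem(b)?);
    }
    a.checked_abs()
}

/// gcd of two decimals given as (unscaled, scale) pairs.
/// The result is returned at the common (larger) scale.
fn decimal_gcd(left: (i128, i8), right: (i128, i8)) -> Option<(i128, i8)> {
    let scale = left.1.max(right.1);
    let l = rescale_up(left.0, left.1, scale)?;
    let r = rescale_up(right.0, right.1, scale)?;
    Some((gcd_i128(l, r)?, scale))
}
```

For 1.5 (unscaled 15, scale 1) and 0.25 (unscaled 25, scale 2), `decimal_gcd` rescales the left side to 150 and yields 25 at scale 2, i.e. 0.25. Calling `gcd_i128(15, 25)` directly on the mismatched unscaled values yields 5, which corresponds to neither 0.5 nor 0.05 being a correct answer.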

At first glance, the log function can be greatly simplified by dropping the unscale_to_* calls in its kernels and the extra casting.

For pow, it was originally Dec x Float -> Dec, but since #19369 introduced a fallback to the float version (I'm still thinking about when it is necessary), this is less relevant.

> Also for round we have a PR relating to altering precision/scale:
>
> * [fix: increase ROUND decimal precision to prevent overflow truncation #19926](https://github.com/apache/datafusion/pull/19926)

Yes, I discovered it recently. I am wondering whether it could also be simplified using this PR, since it handles different input and output precisions and scales.

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Improve scale support for binary decimal operations

2 participants