Add NAN handling to convert() needed for some prefix routines with integer outputs. #502
…Ns when converting from complex/float to ints. Eager currently incomplete. On C++ side need to add OP handling to allow templatization on NAN identity values (for performance reasons).
Remove old comments.
…o identity based on operation. Not tested and likely buggy!
Bugfixes. Changed the prefix routines to use the convert with NAN handling. Removed unnecessary code from convert. Added NAN handling to the eager variant of convert. Initial testing seems to work correctly.
Modified the test code for scan routines, enabling int type outputs on float/complex inputs with NAN handling. Modified the scan test code's parametrization. Removed commented out unnecessary code from convert. pre-commit fixes.
Change input ranges to avoid overflows resulting in NANs (NANs are still tested based on n0)
…nge to avoid overflows.
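The commits above describe replacing NaNs with an operation-dependent identity when converting to an integer type. A minimal sketch of that idea, assuming 0 as the identity for sums and 1 for products; the names `NanOp`, `nan_identity`, `convert_value`, and `convert_all` are hypothetical and not taken from the PR, and only real floating-point sources are handled here (complex inputs are left out for brevity):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical tag for the prefix operation the converted data feeds into.
enum class NanOp { None, Sum, Prod };

// Identity element substituted for NaN: 1 for products, 0 otherwise.
template <NanOp OP, typename DST>
constexpr DST nan_identity()
{
  return OP == NanOp::Prod ? DST{1} : DST{0};
}

// Convert one value; the NaN check is compiled in only when OP requests it.
template <NanOp OP, typename DST, typename SRC>
DST convert_value(SRC v)
{
  if constexpr (OP != NanOp::None) {
    if (std::isnan(v)) return nan_identity<OP, DST>();
  }
  return static_cast<DST>(v);
}

// Convert a whole buffer element by element.
template <NanOp OP, typename DST, typename SRC>
std::vector<DST> convert_all(const std::vector<SRC>& in)
{
  std::vector<DST> out;
  out.reserve(in.size());
  for (SRC v : in) out.push_back(convert_value<OP, DST>(v));
  return out;
}
```

Because the operation is a template parameter in this sketch, the plain conversion path (`NanOp::None`) carries no NaN check at all, which is presumably the performance motivation mentioned in the first commit message.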
type_dispatch(args.in.code(), SourceTypeDispatch<KIND>{}, args);
ConvertArgs args{
  context.outputs()[0], context.inputs()[0], context.scalars()[0].value<ConvertCode>()};
op_dispatch(args.nan_op, ConvertDispatch<KIND>{}, args);
Doing this dispatch upfront triples the number of template instantiations, even though the dispatch on nan_op is unnecessary when the source has an integer type (and please remember there are more integer types than floating point types and complex types combined). A more desirable implementation would be to instantiate a special conversion logic only for pairs of types that need it. You can express those pairs using a template like this:
template <LegateTypeCode SRC_TYPE, LegateTypeCode DST_TYPE>
constexpr bool needs_dispatch_on_nan_op =
  (legate::is_floating_type<SRC_TYPE>::value || legate::is_complex_type<SRC_TYPE>::value)
  && legate::is_integer_type<DST_TYPE>::value;
Then you move the dispatch on nan_op to the innermost template and do it only when needs_dispatch_on_nan_op is true.
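The suggestion above can be sketched in a self-contained form. This version assumes plain C++ types and standard type traits in place of `LegateTypeCode` and the legate traits, and the names `NanOp`, `convert_with_nan`, and `convert` are hypothetical; it is an illustration of the gating idea, not the code that was merged:

```cpp
#include <cmath>
#include <complex>
#include <cstdint>
#include <type_traits>

template <typename T> struct is_complex : std::false_type {};
template <typename T> struct is_complex<std::complex<T>> : std::true_type {};

// True only for float/complex sources converted to integer destinations.
template <typename SRC, typename DST>
constexpr bool needs_dispatch_on_nan_op =
  (std::is_floating_point_v<SRC> || is_complex<SRC>::value) && std::is_integral_v<DST>;

enum class NanOp { None, Sum, Prod };  // hypothetical operation tag

template <typename T> bool has_nan(T v) { return std::isnan(v); }
template <typename T> bool has_nan(std::complex<T> v)
{
  return std::isnan(v.real()) || std::isnan(v.imag());
}

template <typename T> T real_part(T v) { return v; }
template <typename T> T real_part(std::complex<T> v) { return v.real(); }

// NaN-aware conversion, instantiated once per (OP, DST, SRC) that needs it.
template <NanOp OP, typename DST, typename SRC>
DST convert_with_nan(SRC v)
{
  if (has_nan(v)) return OP == NanOp::Prod ? DST{1} : DST{0};
  return static_cast<DST>(real_part(v));
}

// Innermost conversion: the nan_op dispatch exists only where it is needed.
template <typename DST, typename SRC>
DST convert(SRC v, NanOp op)
{
  if constexpr (needs_dispatch_on_nan_op<SRC, DST>) {
    switch (op) {
      case NanOp::Sum: return convert_with_nan<NanOp::Sum, DST>(v);
      case NanOp::Prod: return convert_with_nan<NanOp::Prod, DST>(v);
      case NanOp::None: break;
    }
  }
  return static_cast<DST>(real_part(v));
}
```

With this layout, integer-to-integer and float-to-float pairs instantiate only the plain conversion, so the extra instantiations are limited to the float/complex-to-integer pairs rather than multiplying every type pair by the number of NaN operations.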
Resolved in multiple commits, completed fix in d20450a
…sabling unnecessary templates when input is not float/complex (to be disabled in a future commit)
Adjusted nancumsum/nancumprod implementation to switch to the faster cumsum/cumprod if NAN conversion is already handled by convert.
With the change to convert's templatization, nancumsum/nancumprod must be rerouted to cumsum/cumprod at the Python level before convert is called for non-float/complex types, which is also faster. Modified the test to cover nancumsum/nancumprod for non-float/complex input types to catch any potential bugs (needed because of how convert's templatization is now done).
LGTM. Feel free to merge it once you fix the merge conflict.
* Initial pass * todos.rst * Address comments * Fix warnings * Update product positioning * Add supported platform info * Move all Jupyter instructions to Legate * more warnings * Remove todos --------- Co-authored-by: Manolis Papadakis <mpapadakis@nvidia.com>