Description
Version
1.2.0
Version
13.2
Which installation method(s) does this occur on?
Pip, Source
Describe the bug.
The reduction code path dispatches on is_float(), which returns False for restricted floats because they are NumericDType, not ArithmeticDType. A couple of places in _ir/ops.py each have an unhandled RestrictedFloat branch that either asserts on an integral check or selects integer comparison ops instead of float ones.
Even if you bypass the assert, ct.min/ct.max on tf32 tiles containing negative values would use signed integer comparison instead of float comparison and return wrong values. Positive-only input would be correct only by accident.
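To illustrate why signed integer comparison goes wrong only for negative inputs, here is a minimal standalone sketch in plain Python. It does not touch cuTile: it uses float32 bit patterns as a stand-in for tf32 (an assumption for illustration; both use a sign bit followed by an 8-bit exponent), and the helper name float_bits_as_signed_int is made up for this example.

```python
import struct


def float_bits_as_signed_int(value: float) -> int:
    """Reinterpret the IEEE 754 binary32 encoding of `value` as a signed 32-bit int.

    Hypothetical helper for illustration only; not part of cuTile.
    """
    return struct.unpack("<i", struct.pack("<f", value))[0]


negatives = [-1.0, -2.0]
positives = [1.0, 2.0, 3.0]

# Correct float semantics.
float_min = min(negatives)
float_max = max(negatives)

# What a signed-integer comparison on the raw bits would pick instead.
int_min = min(negatives, key=float_bits_as_signed_int)
int_max = max(negatives, key=float_bits_as_signed_int)

# For positive values, the bit patterns order the same way as the floats.
positive_min_matches = min(positives) == min(positives, key=float_bits_as_signed_int)

print(float_min, int_min)  # the two disagree for negative inputs
print(float_max, int_max)
print(positive_min_matches)
```

Because the float encoding is sign-magnitude, a larger magnitude among negative values means a larger signed integer, so the ordering is reversed for negatives while positives still order correctly.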
The pattern is identical to #23 (restricted float hits unhandled codepath). Adding is_restricted_float branches wherever the dispatch currently handles only is_float would probably fix this.
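The shape of the proposed fix can be sketched with a toy model. Everything below is hypothetical: ToyDType, the kind strings, and the select_min_op_* functions are stand-ins invented for this example and are not the actual code in _ir/ops.py; only the predicate names is_float and is_restricted_float mirror the ones mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDType:
    """Hypothetical stand-in for a dtype; not cuTile's real class."""
    name: str
    kind: str  # "int", "float", or "restricted_float"


def is_float(dt: ToyDType) -> bool:
    return dt.kind == "float"


def is_restricted_float(dt: ToyDType) -> bool:
    return dt.kind == "restricted_float"


def select_min_op_buggy(dt: ToyDType) -> str:
    # Only is_float is checked, so restricted floats fall through
    # to the integer branch.
    return "float_min" if is_float(dt) else "signed_int_min"


def select_min_op_fixed(dt: ToyDType) -> str:
    # Restricted floats are routed to the float op as well.
    if is_float(dt) or is_restricted_float(dt):
        return "float_min"
    return "signed_int_min"


tf32 = ToyDType("tfloat32", "restricted_float")
f32 = ToyDType("float32", "float")
i32 = ToyDType("int32", "int")

for dt in (tf32, f32, i32):
    print(dt.name, select_min_op_buggy(dt), select_min_op_fixed(dt))
```

In this toy model only the tfloat32 row changes between the buggy and fixed selectors, which matches the expectation that existing float and integer behavior would be unaffected.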
Minimum reproducible example
import torch, cuda.tile as ct
@ct.kernel(occupancy=ct.ByTarget(sm_120=1), opt_level=3)
def k(X, Y, TILE_N: ct.Constant[int]):
    x_tf32 = ct.astype(ct.load(X, index=(0,), shape=(TILE_N,), latency=1), ct.tfloat32)
    result = ct.astype(ct.sum(x_tf32, axis=0, keepdims=True), ct.float32)
    ct.store(Y, index=(0,), tile=result, latency=1)
x = torch.arange(1, 17, dtype=torch.float32, device='cuda')
y = torch.zeros(1, dtype=torch.float32, device='cuda')
ct.launch(torch.cuda.current_stream(), (1,), k, (x, y, 16))
Relevant log output
TileInternalError: Internal error
line 4, col 28 in k: result = ct.astype(ct.sum(x_tf32, ...), ct.float32)
Other/Misc.
test/test_reduction.py fails at import on a standard install with ModuleNotFoundError: No module named 'cuda.tile_internal' (#67). As a result, the test that would have caught this can't be run publicly.
cc @haijieg: if you can reproduce this, I'm happy to open a PR. The fix is probably contained to _ir/ops.py.
Contributing Guidelines
- I agree to follow cuTile Python's contributing guidelines
- I have searched the open bugs and have found no duplicates for this bug report