This repository was archived by the owner on Nov 17, 2023. It is now read-only.
Workaround problem with fusion in CUDA 9 #17028
Merged
ptrendx merged 1 commit into apache:master on Dec 10, 2019
DickJC123 (Contributor) approved these changes on Dec 10, 2019
This looks like a straightforward fix to the nvrtc problem. I'm sure it was tedious to add __device__ in the many places required. What might help the file size and code duplication is the macro:
// Create a fast-math function by appending an 'f' to the canonical name.
#define DEFINE_FASTMATH_FUNC(func) \
  template <typename DType> \
  __device__ inline DType func(const DType val) { \
    return func ## f(val); \
  }
DEFINE_FASTMATH_FUNC(sin)
... etc
This only applies to some of the many functions defined in fused_op-inl.h, so I approve this PR independent of this suggestion.
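As a rough illustration of the suggestion above, here is a minimal sketch of how such a macro could expand for a few functions. It is not the code from fused_op-inl.h: the namespace `fastmath_sketch` and the choice of `sin`, `cos`, and `exp` are assumptions made for the example, and the empty `__device__` stand-in exists only so the snippet also builds with a plain host C++ compiler.

```cpp
#include <math.h>  // declares sinf, cosf, expf in the global namespace

// Assumption for this sketch: outside of a CUDA compiler, make __device__
// expand to nothing so the same text compiles as ordinary host C++.
#ifndef __CUDACC__
#define __device__
#endif

// Create a fast-math function by appending an 'f' to the canonical name.
#define DEFINE_FASTMATH_FUNC(func)                  \
  template <typename DType>                         \
  __device__ inline DType func(const DType val) {   \
    return func##f(val);                            \
  }

// Hypothetical namespace, used here to avoid clashing with the
// global-namespace sin/cos/exp overloads from <math.h>.
namespace fastmath_sketch {
DEFINE_FASTMATH_FUNC(sin)  // defines sin<DType>, which calls sinf
DEFINE_FASTMATH_FUNC(cos)  // defines cos<DType>, which calls cosf
DEFINE_FASTMATH_FUNC(exp)  // defines exp<DType>, which calls expf
}  // namespace fastmath_sketch
```

A call such as `fastmath_sketch::sin(0.5f)` then resolves to `sinf(0.5f)`. Note that the generated function always calls the single-precision variant, so a `double` argument would be narrowed to `float`; that is one reason the macro only fits some of the functions.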
Contributor
Awesome stuff! Thank you @ptrendx and @DickJC123 - this puts CD back on track ^^
ptrendx added a commit to ptrendx/mxnet that referenced this pull request Dec 10, 2019
ptrendx added a commit that referenced this pull request Dec 10, 2019
Description
Fixes #17020
The problem comes from a bug in how NVRTC in CUDA 9 handles the default-device flag. That flag is supposed to mark all the functions in the file as __device__ functions, while leaving functions decorated differently (like kernels decorated with __global__) alone. This is the behavior in CUDA 10+. In CUDA 9, however, the __device__ attribute is applied to every function (including kernels), which is incompatible with the __launch_bounds__() attribute that we use for kernels.
This PR removes the usage of the default-device flag for NVRTC compilation and instead manually decorates all the required functions as __device__.
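To make the change concrete, here is a minimal sketch of what explicitly decorated source for runtime compilation might look like. It is not taken from the PR: the names `kFusedKernelSource`, `kCompileOptions`, `add_one`, and `example_kernel`, the launch bound of 512, and the architecture option are all assumptions chosen for illustration. The source is held as a string, as it would be when handed to NVRTC, so the snippet itself builds without the CUDA toolkit.

```cpp
#include <string>

// Hypothetical kernel source in the style of code passed to NVRTC as text.
// The helper carries an explicit __device__ qualifier, so nothing depends on
// a default execution space; the kernel keeps __global__ and
// __launch_bounds__ without also receiving __device__.
const std::string kFusedKernelSource = R"code(
template <typename DType>
__device__ inline DType add_one(const DType val) {
  return val + 1;
}

__launch_bounds__(512)
__global__ void example_kernel(const float* in, float* out, const int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = add_one(in[i]);
  }
}
)code";

// Assumed compile options for this sketch: an architecture option only,
// with no default-device flag in the list.
const char* const kCompileOptions[] = {"--gpu-architecture=compute_70"};
```

With the qualifiers written out this way, the same source should compile identically under CUDA 9 and CUDA 10+, because the buggy flag handling is never exercised.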