This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Conversation

Member

ptrendx commented Oct 8, 2019

Description

Fixes #16338. Adds launch bounds to the reduce_kernel_M1 kernels.
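For readers unfamiliar with launch bounds, here is a minimal sketch of the general technique, not the actual change in this PR. The kernel `scale_kernel`, the helper `scale_on_gpu`, and the constant `kMaxThreads` are hypothetical names chosen for illustration; the real reduce_kernel_M1 signature is not shown in this conversation. The `__launch_bounds__` qualifier tells the compiler the largest block size the kernel will be launched with, so it can keep per-thread register use low enough for that block size to fit.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical cap on threads per block (an assumption for this sketch).
constexpr int kMaxThreads = 1024;

// __launch_bounds__(kMaxThreads) promises the compiler that this kernel is
// never launched with more than kMaxThreads threads per block, so it limits
// register allocation so that such a launch can succeed.
__global__ void __launch_bounds__(kMaxThreads)
scale_kernel(const float* in, float* out, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] * factor;
}

// Hypothetical host helper: multiplies every element of `in` by `factor`
// on the GPU and returns the first CUDA error encountered, if any.
cudaError_t scale_on_gpu(const std::vector<float>& in,
                         std::vector<float>* out, float factor) {
  const int n = static_cast<int>(in.size());
  out->resize(n);
  if (n == 0) return cudaSuccess;

  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaError_t err = cudaMalloc(&d_in, n * sizeof(float));
  if (err != cudaSuccess) return err;
  err = cudaMalloc(&d_out, n * sizeof(float));
  if (err != cudaSuccess) { cudaFree(d_in); return err; }

  err = cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  if (err == cudaSuccess) {
    const int blocks = (n + kMaxThreads - 1) / kMaxThreads;
    scale_kernel<<<blocks, kMaxThreads>>>(d_in, d_out, n, factor);
    // A launch that exceeds the kernel's resources is reported here.
    err = cudaGetLastError();
  }
  if (err == cudaSuccess) {
    err = cudaMemcpy(out->data(), d_out, n * sizeof(float),
                     cudaMemcpyDeviceToHost);
  }
  cudaFree(d_in);
  cudaFree(d_out);
  return err;
}
```

Calling `scale_on_gpu(in, &out, 1.5f)` from host code and checking that it returns `cudaSuccess` is enough to confirm that the launch at the full block size was accepted.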

@reminisce Please verify the fix.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

ptrendx requested a review from reminisce October 8, 2019 20:14

Member Author

ptrendx commented Oct 22, 2019

@reminisce ping to verify the fix

Member Author

ptrendx commented Oct 24, 2019

@hgt312 Could you help verify that this fixes the issue with "too many resources requested for launch" when building in DEBUG mode?
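As background on this error: debug builds typically disable compiler optimizations, which can raise a kernel's per-thread register use until a large block no longer fits on the device. One way to check this is to ask the runtime for the compiled kernel's resource usage. The sketch below assumes a placeholder kernel `demo_kernel` and a hypothetical helper `block_size_fits`; neither comes from the MXNet sources.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder kernel (an assumption for this sketch, not MXNet code).
__global__ void demo_kernel(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: reports the compiled kernel's resource usage and
// returns true if a launch with `threads_per_block` threads would stay
// within the kernel's own per-block thread limit.
bool block_size_fits(int threads_per_block) {
  cudaFuncAttributes attr;
  if (cudaFuncGetAttributes(&attr, demo_kernel) != cudaSuccess) return false;
  // maxThreadsPerBlock already accounts for the registers and shared
  // memory the compiler assigned to this kernel on the current device.
  std::printf("registers per thread: %d, max threads per block: %d\n",
              attr.numRegs, attr.maxThreadsPerBlock);
  return threads_per_block <= attr.maxThreadsPerBlock;
}
```

Comparing the reported numbers between a release build and a debug build (for example, one compiled with nvcc's `-G` flag) should show whether the intended block size still fits.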

Contributor

hgt312 commented Oct 25, 2019

@ptrendx Yes, it fixed that.

Member Author

ptrendx commented Oct 25, 2019

@hgt312 Thanks :-)!

Contributor

DickJC123 left a comment

For the record, ./mshadow/cuda/tensor_gpu-inl.cuh defines const int kMaxThreadsPerBlock = 1024; that value has been valid since GPU arch 2.0.

LGTM.

DickJC123 merged commit 979e610 into apache:master Oct 31, 2019
yajiedesign pushed a commit to yajiedesign/mxnet that referenced this pull request Nov 6, 2019
* Added launch bounds to the reduce_kernel_M1

* Trigger CI

* Reretrigger the CI
Development

Successfully merging this pull request may close these issues.

Reduce op throws "too many resources requested for launch"

3 participants