Use cudaMemset/hipMemset to setup IndexShuffling kernel. #4016
Conversation
This pull request was exported from Phabricator. Differential Revision: D73602755
✅ Deploy Preview for pytorch-fbgemm-docs ready!
Summary: X-link: facebookresearch/FBGEMM#1104 It is too expensive to launch an ATen kernel to do setup. Use cudaMemset/hipMemset instead. Differential Revision: D73602755
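A minimal sketch of the change described above, assuming the setup step only needs to zero-fill a small tensor before the IndexShuffling kernel runs. The function and tensor names are illustrative, not taken from the PR diff:

```cuda
// Sketch only: replace an ATen fill-kernel dispatch with a raw memset.
// `setup_counters` and `counters` are hypothetical names.
#include <cuda_runtime.h>
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>

void setup_counters(at::Tensor& counters) {
  // Before: counters.zero_() goes through the full ATen dispatcher and
  // launches a fill kernel -- noticeable overhead for a tiny setup step.
  //
  // After: a single driver-level async memset on the tensor's raw storage,
  // queued on the stream ATen is currently using.
  cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  cudaMemsetAsync(counters.data_ptr(),
                  /*value=*/0,
                  counters.numel() * counters.element_size(),
                  stream);
}
```

Because the memset is enqueued on the current ATen stream, it stays correctly ordered with the kernels that follow it, with no extra synchronization.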
Summary: X-link: facebookresearch/FBGEMM#1104 It is too expensive to launch an ATen kernel to do setup. Use cudaMemset/hipMemset instead. Reviewed By: Alkaid-Benetnash Differential Revision: D73602755
Summary: X-link: facebookresearch/FBGEMM#1104 It is too expensive to launch an ATen kernel to do setup. Use cudaMemsetAsync/hipMemsetAsync instead. Note we need to use cudaMemsetAsync/hipMemsetAsync on the current stream to be compatible with CUDAGraph capture. Reviewed By: Alkaid-Benetnash Differential Revision: D73602755
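The current-stream requirement mentioned above can be sketched as follows: during CUDAGraph capture, all work must be enqueued on the stream being captured, so an async memset on that stream is recorded as a memset node in the graph, whereas a synchronous `cudaMemset` (which uses the legacy default stream) would invalidate the capture. All names below are illustrative, not from the PR:

```cuda
// Sketch, assuming `buf` is the setup buffer and `stream` is the stream
// being captured (in PyTorch this would be the current ATen stream).
#include <cuda_runtime.h>

void capture_setup_and_run(void* buf, size_t bytes, cudaStream_t stream) {
  cudaGraph_t graph;
  cudaGraphExec_t graph_exec;

  cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);

  // OK under capture: async memset on the captured stream is recorded
  // as a graph node. A plain cudaMemset here would break the capture.
  cudaMemsetAsync(buf, 0, bytes, stream);
  // ... launch the IndexShuffling kernel on the same stream here ...

  cudaStreamEndCapture(stream, &graph);
  cudaGraphInstantiate(&graph_exec, graph, nullptr, nullptr, 0);
  cudaGraphLaunch(graph_exec, stream);
}
```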
Summary: X-link: facebookresearch/FBGEMM#1104 It is too expensive to launch a ATen kernel to do setup. Use cudaMemsetAsync instead. hipMemsetAsync is somehow more expensive than launching a kernel. Avoid doing so for now. Reviewed By: Alkaid-Benetnash Differential Revision: D73602755
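The platform split described above could be expressed with a compile-time guard: on CUDA, use the cheap async memset; on ROCm, keep a kernel-based setup since hipMemsetAsync was measured to be slower there. This is a hedged sketch — `USE_ROCM` (the macro FBGEMM/PyTorch builds define for HIP) guards the branch, and `setup_kernel` is a hypothetical stand-in for the existing setup kernel:

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the existing ATen/HIP setup kernel.
__global__ void setup_kernel(int* buf, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) buf[i] = 0;
}

void setup(int* buf, int n, cudaStream_t stream) {
#ifdef USE_ROCM
  // On ROCm, hipMemsetAsync was observed to cost more than a kernel
  // launch, so keep the kernel path for now.
  setup_kernel<<<(n + 255) / 256, 256, 0, stream>>>(buf, n);
#else
  // On CUDA, a single async memset on the current stream is cheaper
  // than dispatching a kernel.
  cudaMemsetAsync(buf, 0, n * sizeof(int), stream);
#endif
}
```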
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1104
It is too expensive to launch an ATen kernel to do setup. Use cudaMemset/hipMemset instead.
Differential Revision: D73602755