Use cudaMemset/hipMemset to set up IndexShuffling kernel. #4016

Open
wants to merge 1 commit into base: main

levendlee
Member

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemset/hipMemset instead.

Differential Revision: D73602755

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D73602755

netlify bot commented Apr 24, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 046f3f4
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/680eb7826eb198000827af66
😎 Deploy Preview https://deploy-preview-4016--pytorch-fbgemm-docs.netlify.app

levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 24, 2025
Summary:

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemset/hipMemset instead.

Differential Revision: D73602755
levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 24, 2025
Summary:
Pull Request resolved: pytorch#4016

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemset/hipMemset instead.

Differential Revision: D73602755
@levendlee levendlee force-pushed the export-D73602755 branch 2 times, most recently from a783b2b to ab6f083 Compare April 24, 2025 20:53
@levendlee levendlee force-pushed the export-D73602755 branch 2 times, most recently from 6e35e7b to 6523081 Compare April 25, 2025 22:56
levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 25, 2025
Summary:

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemset/hipMemset instead.

Reviewed By: Alkaid-Benetnash

Differential Revision: D73602755
levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 25, 2025
Summary:
Pull Request resolved: pytorch#4016

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemset/hipMemset instead.

Reviewed By: Alkaid-Benetnash

Differential Revision: D73602755
levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 25, 2025
…ytorch#4016)

Summary:

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemset/hipMemset instead.

Reviewed By: Alkaid-Benetnash

Differential Revision: D73602755
levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 25, 2025
…ytorch#4016)

Summary:

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemsetAsync/hipMemsetAsync instead.
Note that cudaMemsetAsync/hipMemsetAsync must be issued on the current stream to be compatible with CUDAGraph capture.

Reviewed By: Alkaid-Benetnash

Differential Revision: D73602755
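
The stream requirement in the commit message above can be illustrated with a minimal, standalone sketch. It is not the FBGEMM code: `count_kernel`, the buffer sizes, and the explicitly created stream are hypothetical stand-ins (in a PyTorch extension the stream would typically come from `at::cuda::getCurrentCUDAStream()`), and it assumes CUDA 11.4 or newer for `cudaGraphInstantiateWithFlags`. Because the memset is enqueued on the capturing stream, it is recorded as part of the graph and the counters are reset on every replay.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(expr)                                        \
  do {                                                          \
    cudaError_t err_ = (expr);                                  \
    if (err_ != cudaSuccess) {                                  \
      std::fprintf(stderr, "%s failed: %s\n", #expr,            \
                   cudaGetErrorString(err_));                   \
      std::exit(1);                                             \
    }                                                           \
  } while (0)

// Hypothetical stand-in for a kernel that expects zeroed counters.
__global__ void count_kernel(int* counters, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) atomicAdd(&counters[i % 4], 1);
}

int main() {
  constexpr int kCounters = 4;
  constexpr int kItems = 1024;

  int* counters = nullptr;
  CUDA_CHECK(cudaMalloc(&counters, kCounters * sizeof(int)));

  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));

  cudaGraph_t graph;
  cudaGraphExec_t exec;
  CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
  // Stream-ordered memset on the capturing stream: recorded in the graph
  // together with the kernel that follows it.
  CUDA_CHECK(cudaMemsetAsync(counters, 0, kCounters * sizeof(int), stream));
  count_kernel<<<kItems / 256, 256, 0, stream>>>(counters, kItems);
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaStreamEndCapture(stream, &graph));
  CUDA_CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

  // Replay twice; the captured memset resets the counters each time.
  for (int replay = 0; replay < 2; ++replay) {
    CUDA_CHECK(cudaGraphLaunch(exec, stream));
  }

  int host[kCounters];
  CUDA_CHECK(cudaMemcpyAsync(host, counters, sizeof(host),
                             cudaMemcpyDeviceToHost, stream));
  CUDA_CHECK(cudaStreamSynchronize(stream));

  bool ok = true;
  for (int c : host) ok = ok && (c == kItems / kCounters);
  std::printf("%s\n", ok ? "counters were reset on every replay"
                         : "unexpected counter values");

  CUDA_CHECK(cudaGraphExecDestroy(exec));
  CUDA_CHECK(cudaGraphDestroy(graph));
  CUDA_CHECK(cudaStreamDestroy(stream));
  CUDA_CHECK(cudaFree(counters));
  return ok ? 0 : 1;
}
```

If the counters were cleared outside the captured stream instead, the reset would not be part of the graph, and a second replay would accumulate on top of the first.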
levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 25, 2025
…ytorch#4016)

Summary:
Pull Request resolved: pytorch#4016

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemsetAsync/hipMemsetAsync instead.
Note that cudaMemsetAsync/hipMemsetAsync must be issued on the current stream to be compatible with CUDAGraph capture.

Reviewed By: Alkaid-Benetnash

Differential Revision: D73602755
Summary:

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemsetAsync instead.
hipMemsetAsync is, for reasons not yet understood, more expensive than launching a kernel, so avoid it for now.

Reviewed By: Alkaid-Benetnash

Differential Revision: D73602755
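
One way the platform split described in this final summary could look is sketched below. This is an assumption-laden illustration, not the code in the PR: `zero_buffer` and `zero_kernel` are hypothetical names, and `USE_ROCM` is assumed to be the macro that distinguishes HIP builds (where the CUDA calls are also assumed to be translated by hipify). On CUDA builds the helper issues a stream-ordered memset; on ROCm builds it keeps a small kernel.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical fallback kernel, used only when USE_ROCM is defined.
__global__ void zero_kernel(int* buf, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) buf[i] = 0;
}

// Hypothetical helper: zero `n` ints on `stream` using the cheaper path
// for the target platform.
cudaError_t zero_buffer(int* buf, int n, cudaStream_t stream) {
#ifdef USE_ROCM
  // Assumed ROCm path: keep the kernel, as the summary reports that
  // hipMemsetAsync was slower than a kernel launch.
  zero_kernel<<<(n + 255) / 256, 256, 0, stream>>>(buf, n);
  return cudaGetLastError();
#else
  // CUDA path: stream-ordered memset, no kernel launch.
  return cudaMemsetAsync(buf, 0, static_cast<size_t>(n) * sizeof(int),
                         stream);
#endif
}

int main() {
  const int n = 1000;
  const size_t bytes = static_cast<size_t>(n) * sizeof(int);

  int* buf = nullptr;
  if (cudaMalloc(&buf, bytes) != cudaSuccess) return 1;
  // Poison the buffer so the later check is meaningful.
  if (cudaMemset(buf, 0xFF, bytes) != cudaSuccess) return 1;

  cudaStream_t stream;
  if (cudaStreamCreate(&stream) != cudaSuccess) return 1;

  if (zero_buffer(buf, n, stream) != cudaSuccess) return 1;

  std::vector<int> host(n, -1);
  if (cudaMemcpyAsync(host.data(), buf, bytes, cudaMemcpyDeviceToHost,
                      stream) != cudaSuccess) return 1;
  if (cudaStreamSynchronize(stream) != cudaSuccess) return 1;

  bool ok = true;
  for (int v : host) ok = ok && (v == 0);
  std::printf("%s\n", ok ? "buffer zeroed" : "buffer not zeroed");

  cudaStreamDestroy(stream);
  cudaFree(buf);
  return ok ? 0 : 1;
}
```

Keeping the choice behind one helper means the call site stays identical on both platforms, and the ROCm branch can be switched to hipMemsetAsync later if its cost improves.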
2 participants