Use cudaMemset/hipMemset to set up IndexShuffling kernel. #4016

levendlee · 2025-04-24T18:44:04Z

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1104

It is too expensive to launch an ATen kernel to do the setup. Use cudaMemset/hipMemset instead.

Differential Revision: D73602755

facebook-github-bot · 2025-04-24T18:44:13Z

This pull request was exported from Phabricator. Differential Revision: D73602755

netlify · 2025-04-24T18:44:24Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | `046f3f4` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/680eb7826eb198000827af66 |
| 😎 Deploy Preview | https://deploy-preview-4016--pytorch-fbgemm-docs.netlify.app |


…ytorch#4016)

Summary:
X-link: facebookresearch/FBGEMM#1104

It is too expensive to launch an ATen kernel to do the setup. Use cudaMemsetAsync/hipMemsetAsync instead. Note that cudaMemsetAsync/hipMemsetAsync must be issued on the current stream to be compatible with CUDA Graph capture.

Reviewed By: Alkaid-Benetnash

Differential Revision: D73602755
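A minimal sketch of the stream-ordered setup described above, assuming a simplified setup step that only zero-fills a per-expert counter buffer. The zero-fill is enqueued with cudaMemsetAsync on the same stream as the kernel that consumes it, so stream capture records both into one CUDA graph. This is not FBGEMM's IndexShuffling code: `count_tokens_kernel` (a stand-in that histograms expert ids with `atomicAdd`), `count_tokens`, `run_graph_demo` and `CUDA_CHECK` are hypothetical names, and only the CUDA side is shown.

```cuda
#include <cuda_runtime.h>

#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(expr)                                     \
  do {                                                       \
    cudaError_t err_ = (expr);                               \
    if (err_ != cudaSuccess) {                               \
      std::fprintf(stderr, "%s failed: %s\n", #expr,         \
                   cudaGetErrorString(err_));                \
      std::exit(1);                                          \
    }                                                        \
  } while (0)

// Hypothetical stand-in for the IndexShuffling kernel: it only counts how many
// tokens are routed to each expert (ids assumed to be in [0, num_experts)).
// It accumulates with atomicAdd, so `counts` must be zeroed before each launch.
__global__ void count_tokens_kernel(const int* expert_ids, int num_tokens,
                                    int* counts) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < num_tokens) {
    atomicAdd(&counts[expert_ids[i]], 1);
  }
}

// Setup + compute, both enqueued on `stream`. The zero-fill is a memset
// instead of a separate zeroing kernel, and because it is ordered on the same
// stream as the kernel, stream capture records it as part of the graph.
void count_tokens(const int* d_ids, int num_tokens, int* d_counts,
                  int num_experts, cudaStream_t stream) {
  CUDA_CHECK(cudaMemsetAsync(d_counts, 0, num_experts * sizeof(int), stream));
  if (num_tokens == 0) return;
  const int threads = 256;
  const int blocks = (num_tokens + threads - 1) / threads;
  count_tokens_kernel<<<blocks, threads, 0, stream>>>(d_ids, num_tokens,
                                                       d_counts);
  CUDA_CHECK(cudaGetLastError());
}

// Capture setup + kernel into a CUDA graph, replay it `replays` times, and
// return the per-expert counts read back to the host.
std::vector<int> run_graph_demo(const std::vector<int>& expert_ids,
                                int num_experts, int replays) {
  const int n = static_cast<int>(expert_ids.size());
  int* d_ids = nullptr;
  int* d_counts = nullptr;
  CUDA_CHECK(cudaMalloc(&d_ids, n * sizeof(int)));
  CUDA_CHECK(cudaMalloc(&d_counts, num_experts * sizeof(int)));
  CUDA_CHECK(cudaMemcpy(d_ids, expert_ids.data(), n * sizeof(int),
                        cudaMemcpyHostToDevice));

  // Capture cannot be started on the legacy default stream, so use our own.
  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));

  cudaGraph_t graph;
  cudaGraphExec_t exec;
  CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
  count_tokens(d_ids, n, d_counts, num_experts, stream);
  CUDA_CHECK(cudaStreamEndCapture(stream, &graph));
  CUDA_CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

  for (int r = 0; r < replays; ++r) {
    CUDA_CHECK(cudaGraphLaunch(exec, stream));
  }
  CUDA_CHECK(cudaStreamSynchronize(stream));

  std::vector<int> counts(num_experts);
  CUDA_CHECK(cudaMemcpy(counts.data(), d_counts, num_experts * sizeof(int),
                        cudaMemcpyDeviceToHost));

  CUDA_CHECK(cudaGraphExecDestroy(exec));
  CUDA_CHECK(cudaGraphDestroy(graph));
  CUDA_CHECK(cudaStreamDestroy(stream));
  CUDA_CHECK(cudaFree(d_ids));
  CUDA_CHECK(cudaFree(d_counts));
  return counts;
}
```

Under these assumptions, `run_graph_demo({0, 1, 1, 3, 3, 3}, 4, 3)` should return `{1, 2, 0, 3}` even after three replays, because every replay re-runs the captured memset. Zeroing the buffer once before capture instead would leave only the kernel in the graph, and the replays would accumulate into `{3, 6, 0, 9}`.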


Summary:
X-link: facebookresearch/FBGEMM#1104

It is too expensive to launch an ATen kernel to do the setup. Use cudaMemsetAsync instead. hipMemsetAsync is, unexpectedly, more expensive than launching a kernel, so avoid it for now.

Reviewed By: Alkaid-Benetnash

Differential Revision: D73602755


facebook-github-bot added the cla signed label Apr 24, 2025

facebook-github-bot added the fb-exported label Apr 24, 2025

levendlee force-pushed the export-D73602755 branch from c2bbc70 to 282cf97 on April 24, 2025 20:07

levendlee force-pushed the export-D73602755 branch from 282cf97 to 6f57ffb on April 24, 2025 20:07

levendlee force-pushed the export-D73602755 branch from 6f57ffb to fce075c on April 24, 2025 20:07

levendlee force-pushed the export-D73602755 branch from fce075c to 807fd87 on April 24, 2025 20:10

levendlee force-pushed the export-D73602755 branch 2 times, most recently from a783b2b to ab6f083 on April 24, 2025 20:53

levendlee force-pushed the export-D73602755 branch from ab6f083 to 21b0fb0 on April 24, 2025 20:54

levendlee force-pushed the export-D73602755 branch from 21b0fb0 to cf1e7e2 on April 24, 2025 21:00

levendlee force-pushed the export-D73602755 branch from cf1e7e2 to f630cde on April 24, 2025 21:07

levendlee force-pushed the export-D73602755 branch from f630cde to 02cb1da on April 25, 2025 00:58

levendlee force-pushed the export-D73602755 branch from be3d44d to 5cdceb5 on April 25, 2025 14:18

levendlee force-pushed the export-D73602755 branch 2 times, most recently from 6e35e7b to 6523081 on April 25, 2025 22:56

levendlee force-pushed the export-D73602755 branch from 6523081 to 51998cb on April 25, 2025 22:57

levendlee force-pushed the export-D73602755 branch from 51998cb to 3b68df2 on April 25, 2025 23:00

levendlee force-pushed the export-D73602755 branch from 3b68df2 to 4865e6d on April 25, 2025 23:06

levendlee force-pushed the export-D73602755 branch from 4865e6d to 5a56152 on April 25, 2025 23:23

levendlee force-pushed the export-D73602755 branch from 5a56152 to 513eb3a on April 25, 2025 23:24

levendlee force-pushed the export-D73602755 branch from 513eb3a to 2b1606e on April 25, 2025 23:26

levendlee force-pushed the export-D73602755 branch from 2b1606e to 8cb941b on April 25, 2025 23:26

levendlee force-pushed the export-D73602755 branch from 8cb941b to d6a9b6f on April 25, 2025 23:34

levendlee force-pushed the export-D73602755 branch from d6a9b6f to 046f3f4 on April 27, 2025 23:02
