
Fix the TBE cache_precision to fp32 when on ROCm #3672


Closed
wants to merge 1 commit

Conversation

q10 (Contributor) commented Feb 10, 2025

Summary: It was discovered that FP16 cache precision caused a 500x slowdown in the `split_embedding_nobag_backward_codegen_rowwise_adagrad_unweighted_kernel_warp_per_row_1` kernel on ROCm. As a workaround, we fix the cache precision to FP32 whenever running in a ROCm environment.

Differential Revision: D69130978
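
For illustration, here is a minimal sketch of the kind of guard the summary describes. It assumes `SparseType` is importable from `fbgemm_gpu.split_embedding_configs`, uses `torch.version.hip` (a version string on ROCm builds of PyTorch, `None` otherwise) as the environment check, and `resolve_cache_precision` is a hypothetical helper, not necessarily the code path this diff actually touches:

```python
import torch

# Assumed import path for FBGEMM's precision enum; may differ across versions.
from fbgemm_gpu.split_embedding_configs import SparseType


def resolve_cache_precision(requested: SparseType) -> SparseType:
    # Hypothetical helper sketching the workaround: torch.version.hip is a
    # version string on ROCm builds of PyTorch and None otherwise. On ROCm,
    # force the TBE LXU cache to FP32 to sidestep the FP16 backward-kernel
    # slowdown this PR describes; everywhere else, honor the request.
    if torch.version.hip is not None:
        return SparseType.FP32
    return requested
```

A caller constructing a TBE module (e.g. `SplitTableBatchedEmbeddingBagsCodegen`, whose constructor accepts a `cache_precision` argument) would pass the resolved value rather than the raw request.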


pytorch-bot (bot) commented Feb 10, 2025

No ciflow labels are configured for this repo.
For information on how to enable the CIFlow bot, see this wiki.

facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D69130978

netlify (bot) commented Feb 10, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | c6513f9 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67aa47884cc58600085c5f1b |
| 😎 Deploy Preview | https://deploy-preview-3672--pytorch-fbgemm-docs.netlify.app |

facebook-github-bot (Contributor)

This pull request has been merged in fc718cf.

q10 added a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
Summary:
X-link: pytorch#3672

It was discovered that FP16 cache precision caused a 500x slowdown in the `split_embedding_nobag_backward_codegen_rowwise_adagrad_unweighted_kernel_warp_per_row_1` kernel on ROCm. As a workaround, we fix the cache precision to FP32 whenever running in a ROCm environment.

Reviewed By: sryap

Differential Revision: D69130978

fbshipit-source-id: ca904e54fc8446f5517fba5486d7497f49730fa4