Add Preshuffled FP8 x INT4 Grouped Gemm Kernel #3800

jwfromm · 2025-03-11T18:02:55Z

Summary: Working on adding support for stacked mixed dtype grouped gemm with preshuffling.

Differential Revision: D70870933

facebook-github-bot · 2025-03-11T18:03:07Z

This pull request was exported from Phabricator. Differential Revision: D70870933

netlify · 2025-03-11T18:03:14Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`46acfbc`
🔍 Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67d48d40fe71510008d0e03f
😎 Deploy Preview	https://deploy-preview-3800--pytorch-fbgemm-docs.netlify.app

Summary: Working on adding support for stacked mixed dtype grouped gemm with preshuffling. Differential Revision: D70870933

Summary: Efficient FP8xINT4 grouped gemm with preshuffling and scale packing. This implementation uses the "stacked" API where inputs and outputs are single contiguous tensors and the group boundaries are indicated with an `M_sizes` tensor that contains the number of rows in each group. Differential Revision: D70870933
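As a reading aid, the "stacked" calling convention described above can be sketched in plain Python. This is only a reference for the grouping semantics, not the kernel from this PR: it ignores FP8/INT4 quantization, preshuffling, and scale packing, and the function names, the list-of-lists representation, and the `(K x N)` weight layout are assumptions made for illustration.

```python
from itertools import accumulate


def matmul(a, b):
    """Plain row-major product of a (m x k) and b (k x n), as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def stacked_grouped_gemm(x, weights, m_sizes):
    """Reference semantics only (hypothetical helper, not the FBGEMM API).

    x        : one stacked (sum(m_sizes) x K) matrix holding every group's rows
    weights  : weights[g] is the (K x N) matrix for group g (assumed layout)
    m_sizes  : m_sizes[g] is the number of rows of x that belong to group g
    """
    assert sum(m_sizes) == len(x), "m_sizes must account for every row of x"
    assert len(weights) == len(m_sizes), "one weight matrix per group"
    ends = list(accumulate(m_sizes))  # exclusive end row of each group
    out = []
    for g, end in enumerate(ends):
        start = end - m_sizes[g]
        # Rows start..end-1 are multiplied by this group's weights only.
        out.extend(matmul(x[start:end], weights[g]))
    return out  # one stacked (sum(m_sizes) x N) output


x = [[1, 0], [0, 1], [1, 1]]
weights = [[[1, 2], [3, 4]], [[10, 0], [0, 10]]]
print(stacked_grouped_gemm(x, weights, m_sizes=[2, 1]))
# prints [[1, 2], [3, 4], [10, 10]]
```

A group with zero rows simply contributes nothing to the output, which is one reason a per-group row count is a convenient way to mark boundaries.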

facebook-github-bot · 2025-03-13T22:52:48Z

This pull request was exported from Phabricator. Differential Revision: D70870933

Summary: X-link: facebookresearch/FBGEMM#897 Efficient FP8xINT4 grouped gemm with preshuffling and scale packing. This implementation uses the "stacked" API where inputs and outputs are single contiguous tensors and the group boundaries are indicated with an `M_sizes` tensor that contains the number of rows in each group. Differential Revision: D70870933

facebook-github-bot · 2025-03-13T22:55:44Z

This pull request was exported from Phabricator. Differential Revision: D70870933

Summary: X-link: facebookresearch/FBGEMM#897 Pull Request resolved: pytorch#3800 Efficient FP8xINT4 grouped gemm with preshuffling and scale packing. This implementation uses the "stacked" API where inputs and outputs are single contiguous tensors and the group boundaries are indicated with an `M_sizes` tensor that contains the number of rows in each group. Differential Revision: D70870933

facebook-github-bot · 2025-03-13T23:00:21Z

This pull request was exported from Phabricator. Differential Revision: D70870933

facebook-github-bot · 2025-03-13T23:02:51Z

This pull request was exported from Phabricator. Differential Revision: D70870933

Summary: Pull Request resolved: pytorch#3800 Working on adding support for stacked mixed dtype grouped gemm with preshuffling. Differential Revision: D70870933

facebook-github-bot · 2025-03-13T23:12:51Z

This pull request was exported from Phabricator. Differential Revision: D70870933

Summary: X-link: facebookresearch/FBGEMM#847 Pull Request resolved: pytorch#3766 One of the new interesting changes in the preshuffled F8I4 kernel is that group scales are downcast to FP8. This has the risk of running into dynamic range issues and impacting accuracy. We can mitigate this risk by adding FP32 columnwise scaling to the output. Fortunately, we can do this using EVT so the performance impact is negligible. Differential Revision: D70587477
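The dynamic-range concern can be illustrated with a small sketch. It assumes one particular way of choosing the column scale (the per-column maximum of the group scales); the summary above does not say how the kernel derives it, so `split_scales` and its normalization rule are hypothetical. The point is only that factoring a full-precision per-column multiplier out of the group scales leaves a much narrower range for the low-precision values to cover.

```python
import math


def split_scales(group_scales):
    """Factor group scales into a narrow-range part and a per-column part.

    group_scales[g][n] is the (positive) scale for K-group g of output column n.
    Returns (normalized, col_scale) such that
        group_scales[g][n] == normalized[g][n] * col_scale[n]
    The per-column maximum used here is an assumption for illustration only.
    """
    n_cols = len(group_scales[0])
    col_scale = [max(row[n] for row in group_scales) for n in range(n_cols)]
    normalized = [[row[n] / col_scale[n] for n in range(n_cols)] for row in group_scales]
    return normalized, col_scale


def dynamic_range(rows):
    """Ratio of the largest to the smallest value across all entries."""
    flat = [v for row in rows for v in row]
    return max(flat) / min(flat)


scales = [[0.001, 200.0], [0.004, 50.0]]
normalized, col_scale = split_scales(scales)

# The normalized scales would be the ones stored in low precision;
# col_scale stays in full precision and is applied to the output columns.
print(dynamic_range(scales), dynamic_range(normalized))
assert all(
    math.isclose(normalized[g][n] * col_scale[n], scales[g][n])
    for g in range(2)
    for n in range(2)
)
```

In this sketch every normalized scale lies in (0, 1], so columns with very different magnitudes no longer compete for the same narrow range.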

Summary: X-link: facebookresearch/FBGEMM#855 Pull Request resolved: pytorch#3775 This diff introduces a set of quantization helper functions to fbgemm_gpu/experimental/gen_ai to make it easier to apply the new Int4 packing and preshuffling to weights. Differential Revision: D70643388 Reviewed By: summerdengfb
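For readers unfamiliar with Int4 packing, here is a minimal sketch of the general idea: two signed 4-bit values share one byte. The nibble order (low nibble first) and the function names are assumptions; the helpers added in that diff may use a different layout, and the preshuffling step is not shown.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first.

    Hypothetical helper for illustration; not the FBGEMM packing routine.
    """
    assert len(values) % 2 == 0, "need an even number of values"
    assert all(-8 <= v <= 7 for v in values), "values must fit in 4 signed bits"
    return bytes(
        (lo & 0xF) | ((hi & 0xF) << 4)
        for lo, hi in zip(values[0::2], values[1::2])
    )


def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed 4-bit values."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Nibbles 8..15 encode the negative values -8..-1 (two's complement).
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


packed = pack_int4([-8, 7, 0, -1])
print(packed.hex())          # prints 78f0
print(unpack_int4(packed))   # prints [-8, 7, 0, -1]
```

Packing halves the storage for the weights, which is why the layout has to be agreed on between the quantization helpers and the kernel that consumes them.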

Summary: X-link: facebookresearch/FBGEMM#897 Efficient FP8xINT4 grouped gemm with preshuffling and scale packing. This implementation uses the "stacked" API where inputs and outputs are single contiguous tensors and the group boundaries are indicated with an `M_sizes` tensor that contains the number of rows in each group. Reviewed By: jiawenliu64 Differential Revision: D70870933

facebook-github-bot · 2025-03-14T20:06:53Z

This pull request was exported from Phabricator. Differential Revision: D70870933

Summary: X-link: facebookresearch/FBGEMM#897 Pull Request resolved: pytorch#3800 Efficient FP8xINT4 grouped gemm with preshuffling and scale packing. This implementation uses the "stacked" API where inputs and outputs are single contiguous tensors and the group boundaries are indicated with an `M_sizes` tensor that contains the number of rows in each group. Reviewed By: jiawenliu64 Differential Revision: D70870933

facebook-github-bot · 2025-03-14T20:10:28Z

This pull request was exported from Phabricator. Differential Revision: D70870933

facebook-github-bot · 2025-03-17T21:05:43Z

This pull request has been merged in a39d2cc.

Summary: X-link: https://github.com/facebookresearch/FBGEMM/pull/897 Pull Request resolved: pytorch#3800 Efficient FP8xINT4 grouped gemm with preshuffling and scale packing. This implementation uses the "stacked" API where inputs and outputs are single contiguous tensors and the group boundaries are indicated with an `M_sizes` tensor that contains the number of rows in each group. Reviewed By: jiawenliu64 Differential Revision: D70870933 fbshipit-source-id: 195fb9feb993ffa7efe27b038173bd70a1db57ed

robertgshaw2-redhat · 2025-03-28T05:57:56Z

hey @jwfromm - thanks for the PR

What exactly is the intended use of `m_offsets`?

Summary: Pull Request resolved: facebookresearch/FBGEMM#897 X-link: pytorch#3800 Efficient FP8xINT4 grouped gemm with preshuffling and scale packing. This implementation uses the "stacked" API where inputs and outputs are single contiguous tensors and the group boundaries are indicated with an `M_sizes` tensor that contains the number of rows in each group. Reviewed By: jiawenliu64 Differential Revision: D70870933 fbshipit-source-id: 195fb9feb993ffa7efe27b038173bd70a1db57ed

facebook-github-bot added the cla signed label Mar 11, 2025

facebook-github-bot added the fb-exported label Mar 11, 2025

jwfromm force-pushed the export-D70870933 branch from 8c4ef51 to 071a184 Compare March 13, 2025 22:51

jwfromm added a commit to jwfromm/FBGEMM that referenced this pull request Mar 13, 2025

Add Preshuffled FP8 x INT4 Grouped Gemm Kernel (pytorch#3800)

071a184

Summary: Working on adding support for stacked mixed dtype grouped gemm with preshuffling. Differential Revision: D70870933

jwfromm force-pushed the export-D70870933 branch from 071a184 to e86a553 Compare March 13, 2025 22:52

jwfromm force-pushed the export-D70870933 branch from e86a553 to 9391323 Compare March 13, 2025 22:52

jwfromm added a commit to jwfromm/FBGEMM that referenced this pull request Mar 13, 2025

Add Preshuffled FP8 x INT4 Grouped Gemm Kernel (pytorch#3800)

9391323

Summary: Working on adding support for stacked mixed dtype grouped gemm with preshuffling. Differential Revision: D70870933

jwfromm force-pushed the export-D70870933 branch from 9391323 to ad12177 Compare March 13, 2025 22:55

jwfromm force-pushed the export-D70870933 branch from ad12177 to d39247d Compare March 13, 2025 23:00

jwfromm force-pushed the export-D70870933 branch from d39247d to d9c8ad4 Compare March 13, 2025 23:00

jwfromm force-pushed the export-D70870933 branch from d9c8ad4 to 3fa97dc Compare March 13, 2025 23:02

jwfromm force-pushed the export-D70870933 branch from 3fa97dc to 7e88324 Compare March 13, 2025 23:12

jwfromm force-pushed the export-D70870933 branch from 7e88324 to 107e9f6 Compare March 14, 2025 20:06

jwfromm force-pushed the export-D70870933 branch from 107e9f6 to 864fb11 Compare March 14, 2025 20:07

jwfromm force-pushed the export-D70870933 branch from 864fb11 to 46acfbc Compare March 14, 2025 20:10

facebook-github-bot closed this in a39d2cc Mar 17, 2025

facebook-github-bot added the Merged label Mar 17, 2025

q10 added category:new feature:quantize feature:genai labels Mar 21, 2025
