avoid propagation of NaN #3723
Conversation
This pull request was exported from Phabricator. Differential Revision: D69522001
Summary:
X-link: facebookresearch/FBGEMM#806

as title

Introduce padding in the dequantization kernel to avoid passing NaNs to the output of FA3 in the prefill stage.

Reviewed By: jianyuh

Differential Revision: D69522001
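For context, a minimal CUDA sketch of the idea (the kernel name, arguments, and layout below are illustrative assumptions, not the actual FBGEMM code): dequantize the valid timesteps of a row and explicitly zero-fill the padded tail, so the FA3 prefill pass never reads uninitialized memory that may hold NaN bit patterns.

#include <cuda_bf16.h>

// Hypothetical sketch, not the FBGEMM kernel itself: valid positions are
// dequantized, padded positions are written as explicit zeros.
__global__ void dequantize_row_with_tail_padding(
    const float* src,        // stand-in for the dequantized FP8 values
    __nv_bfloat16* row_k_dq, // output row later consumed by FA3
    int valid_t,             // number of valid timesteps in this row
    int padded_t) {          // row length rounded up to the tile size
  int t = blockIdx.x * blockDim.x + threadIdx.x;
  if (t < valid_t) {
    row_k_dq[t] = __float2bfloat16(src[t]);
  } else if (t < padded_t) {
    // Padding: store zero instead of leaving whatever garbage (possibly NaN
    // bit patterns) the buffer already contained.
    row_k_dq[t] = __float2bfloat16(0.0f);
  }
}

The real kernel operates on FP8 KV-cache blocks and uses vectorized stores, but the key point is the same: the tail of the last tile is written explicitly rather than left uninitialized.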
Force-pushed from 4afe692 to 6984077.
Force-pushed from 6984077 to 481519e.
Force-pushed from 481519e to 9e31670.
@@ -1919,23 +1928,27 @@ std::tuple<at::Tensor, at::Tensor> dequantize_fp8_cache(
     block_tables_b_stride = block_tables.value().stride(0);
   }

-  constexpr int32_t kMaxBlocks = 256;
+  constexpr int32_t kMaxBlocks = 512;
is this change for better performance (increased parallelism)?
      // each thread writes 4 elements of type bf16
      *reinterpret_cast<uint2*>(&row_k_dq[4 * threadIdx.x]) =
          *reinterpret_cast<uint2*>(&kv_dq.vals[0]);
      *reinterpret_cast<uint2*>(&row_v_dq[4 * threadIdx.x]) =
          *reinterpret_cast<uint2*>(&kv_dq.vals[2]);
    }
nit, maybe add some comments to explain why we do the padding of the last tile?
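To make the nit concrete, a hedged sketch of how the padded slots of the last tile could be zeroed with the same vectorized uint2 stores shown above; the guard t_is_padding is a hypothetical name, and the actual condition in the kernel may differ. This continues the snippet above rather than standing alone.

// Sketch only: for padded positions of the last tile, store packed BF16 zeros
// (bit pattern 0x0000 per element) so FA3 never consumes NaNs from the tail.
if (t_is_padding) {                  // hypothetical guard for padded slots
  uint2 zeros = make_uint2(0u, 0u);  // four bf16 zeros packed into 8 bytes
  *reinterpret_cast<uint2*>(&row_k_dq[4 * threadIdx.x]) = zeros;
  *reinterpret_cast<uint2*>(&row_v_dq[4 * threadIdx.x]) = zeros;
}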
         or not HAS_XFORMERS,
-        "Skip when H100 is not available or MI300 is not available",
+        "Skip when H100 is not available",
     )
     def test_fp8_kv_cache(self, MAX_T: int, N_KVH_L: int) -> None:
nit, update the test to check the padding logic?
This pull request has been merged in e97b388.
Summary:
X-link: pytorch#3723
Pull Request resolved: https://github.com/facebookresearch/FBGEMM/pull/806

as title

Introduce padding in the dequantization kernel to avoid passing NaNs to the output of FA3 in the prefill stage.

Reviewed By: jianyuh

Differential Revision: D69522001

fbshipit-source-id: 9ce8c1840be75c78727e952feb1fbb962c57543a