
avoid propagation of NaN #3723

Closed
wants to merge 1 commit

Conversation

Aya-ZIbra
Contributor

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/806

as title
Introduce padding in the dequantization kernel to avoid passing NaNs to the output of FA3 in the prefill stage.

Reviewed By: jianyuh

Differential Revision: D69522001
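
For illustration, a minimal sketch of the padding idea (assumed names; round_up, alloc_dequant_buffer, and the tile parameter are illustrative, not FBGEMM's API): size the dequantized K/V buffer up to a multiple of the attention kernel's tile and zero the allocation, so the padded tail holds 0.0 rather than uninitialized bits that may decode to NaN.

#include <cuda_runtime.h>

// Round a length up to the next multiple of the kernel's tile size.
inline int round_up(int x, int tile) { return ((x + tile - 1) / tile) * tile; }

// Allocate a dequantized K/V buffer padded to a tile multiple and zero it,
// so reads past the valid length see 0.0 instead of garbage (possibly NaN).
void alloc_dequant_buffer(void** buf, int valid_elems, int elem_bytes, int tile) {
  const size_t padded =
      static_cast<size_t>(round_up(valid_elems, tile)) * elem_bytes;
  cudaMalloc(buf, padded);
  cudaMemset(*buf, 0, padded);
}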

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D69522001

netlify bot commented Feb 22, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 9e31670
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67ba912b472bdf0008abf688
😎 Deploy Preview https://deploy-preview-3723--pytorch-fbgemm-docs.netlify.app

Aya-ZIbra added a commit to Aya-ZIbra/FBGEMM that referenced this pull request Feb 22, 2025 (same summary as above; Differential Revision: D69522001).

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D69522001

Aya-ZIbra added a commit to Aya-ZIbra/FBGEMM that referenced this pull request Feb 22, 2025 (same summary; Differential Revision: D69522001).

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D69522001

@@ -1919,23 +1928,27 @@ std::tuple<at::Tensor, at::Tensor> dequantize_fp8_cache(
block_tables_b_stride = block_tables.value().stride(0);
}

- constexpr int32_t kMaxBlocks = 256;
+ constexpr int32_t kMaxBlocks = 512;
Contributor

is this change for better performance (increased parallelism)?
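
A generic sketch of how a cap like kMaxBlocks is commonly used (an assumption about intent, not the FBGEMM launch code): the constant bounds the launch grid, so raising it from 256 to 512 lets more thread blocks run concurrently when many cache blocks need dequantizing, with any remainder covered by a grid-stride loop inside the kernel.

#include <algorithm>
#include <cstdint>
#include <cuda_runtime.h>

constexpr int32_t kMaxBlocks = 512;

// One thread block per cache block, capped at kMaxBlocks; the kernel is
// expected to loop over any rows beyond the cap (grid-stride pattern).
inline dim3 dequant_grid(int32_t num_cache_blocks) {
  return dim3(static_cast<unsigned int>(std::min(num_cache_blocks, kMaxBlocks)));
}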

// each thread writes 4 elements of type bf16
*reinterpret_cast<uint2*>(&row_k_dq[4 * threadIdx.x]) =
    *reinterpret_cast<uint2*>(&kv_dq.vals[0]);
*reinterpret_cast<uint2*>(&row_v_dq[4 * threadIdx.x]) =
    *reinterpret_cast<uint2*>(&kv_dq.vals[2]);
}

Contributor

nit, maybe add some comments to explain why we do the padding of the last tile?
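
An illustrative sketch of such a padded last-tile store (assumed shapes and names, not the PR's code): each thread packs four BF16 values into 8 bytes and issues a single uint2 store; lanes past the valid length write 0.0 so the padded region never exposes uninitialized (possibly NaN) memory to FA3.

#include <cuda_bf16.h>
#include <cuda_runtime.h>

__device__ void store4_bf16_padded(
    __nv_bfloat16* row_dq,        // destination row, 8-byte aligned, padded to a multiple of 4
    const __nv_bfloat16 vals[4],  // dequantized values held by this thread
    int valid_len) {              // number of real elements in the row
  alignas(8) __nv_bfloat16 out[4];
  const int base = 4 * threadIdx.x;
  for (int j = 0; j < 4; ++j) {
    // Zero-fill positions in the last tile instead of copying garbage.
    out[j] = (base + j < valid_len) ? vals[j] : __float2bfloat16(0.0f);
  }
  // Vectorized 8-byte store: 4 x bf16 == sizeof(uint2).
  *reinterpret_cast<uint2*>(&row_dq[base]) = *reinterpret_cast<const uint2*>(out);
}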

or not HAS_XFORMERS,
"Skip when H100 is not available or MI300 is not available",
"Skip when H100 is not available",
)
def test_fp8_kv_cache(self, MAX_T: int, N_KVH_L: int) -> None:
Contributor

nit, update the test to check the padding logic?
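
One way such a check could look, sketched in C++/CUDA rather than the PR's Python test (check_padding, valid_T, and padded_T are hypothetical names): copy the dequantized row back to the host and assert the padded tail is exactly zero and NaN-free.

#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_bf16.h>
#include <cuda_runtime.h>

void check_padding(const __nv_bfloat16* d_row, int valid_T, int padded_T) {
  std::vector<__nv_bfloat16> h_row(padded_T);
  cudaMemcpy(h_row.data(), d_row, padded_T * sizeof(__nv_bfloat16),
             cudaMemcpyDeviceToHost);
  for (int i = valid_T; i < padded_T; ++i) {
    const float v = __bfloat162float(h_row[i]);
    // The padded tail must never contain NaN; with zero-padding it is 0.0.
    assert(!std::isnan(v) && v == 0.0f);
  }
}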

@facebook-github-bot
Contributor

This pull request has been merged in e97b388.

q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025 (same summary; X-link: pytorch#3723, Pull Request resolved: facebookresearch/FBGEMM#806, fbshipit-source-id: 9ce8c1840be75c78727e952feb1fbb962c57543a).

4 participants