Wrong attention mask implementation in BiMultiHeadAttention #12351

Open
xiuqhou opened this issue Apr 29, 2025 · 0 comments
xiuqhou commented Apr 29, 2025

Describe the bug
In the implementation of BiMultiHeadAttention, a boolean attention_mask is filled with a large negative value so that unused positions are ignored:

https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/utils/vlfuse_helper.py#L200-L201

            attention_mask = attention_mask.masked_fill(
                attention_mask == 0, -9e15)

            if attention_mask.size() != (bsz, 1, tgt_len, src_len):
                raise ValueError('Attention mask should be of '
                                 f'size {(bsz, 1, tgt_len, src_len)}')
            attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len,
                                             src_len) + attention_mask
            attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len,
                                             src_len)

However, there are two mistakes:

  1. masked_fill does not promote a boolean tensor to torch.float when given a non-zero float fill value; it still returns a boolean tensor, with the filled positions set to True. When this attention_mask is later added to attn_weights, the positions that should be ignored only receive +1 (True is cast to 1) instead of -9e15, so they are never actually masked out (see the snippet after this list).
  2. In attention_mask, a True value indicates that the corresponding position should not be attended. Therefore, the positions to fill should be selected with attention_mask == True (or simply attention_mask), not attention_mask == 0.
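
As a quick illustration, here is a standalone PyTorch snippet (not mmdetection code; the toy mask assumes the True-means-masked convention described in point 2):

    import torch

    # Toy boolean mask: True marks positions that should NOT be attended to.
    attention_mask = torch.tensor([[False, False, True, True]])
    attn_weights = torch.zeros(1, 4)

    # What the current code does: masked_fill on a bool tensor keeps dtype torch.bool,
    # so the -9e15 fill value is silently converted to True.
    filled = attention_mask.masked_fill(attention_mask == 0, -9e15)
    print(filled.dtype)           # torch.bool
    print(attn_weights + filled)  # tensor([[1., 1., 1., 1.]]) -- every position only gets +1

    # What is presumably intended: build a float mask and fill the True positions.
    float_mask = torch.zeros_like(attention_mask, dtype=attn_weights.dtype)
    float_mask = float_mask.masked_fill(attention_mask, -9e15)
    print(attn_weights + float_mask)  # masked positions now get -9e15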

Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
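
For reference, a minimal sketch of one possible fix, assuming the True-means-masked convention above (variable names follow the quoted snippet; the exact mask convention should be confirmed against the callers of BiMultiHeadAttention):

    # Sketch only: turn the boolean mask into an additive float mask before adding it
    # to attn_weights, so that masked (True) positions receive a large negative bias.
    if attention_mask.dtype == torch.bool:
        additive_mask = torch.zeros_like(attention_mask,
                                         dtype=attn_weights.dtype)
        additive_mask = additive_mask.masked_fill(attention_mask, -9e15)
    else:
        additive_mask = attention_mask

    if additive_mask.size() != (bsz, 1, tgt_len, src_len):
        raise ValueError('Attention mask should be of '
                         f'size {(bsz, 1, tgt_len, src_len)}')
    attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len,
                                     src_len) + additive_mask
    attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)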
