Qwen2 MoE manual `head_dim` #36659

yunju63 · 2025-03-12T07:43:54Z

Line 317 in 81aa9b2

self.head_dim = self.hidden_size // self.num_heads

For Qwen2 MoE, `head_dim` is currently hard-coded to `hidden_size // num_heads` and cannot be set manually.

Manual `head_dim` setting is already supported in the Llama, Mistral, and Mixtral modeling code.
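A minimal sketch of the kind of fallback being requested, assuming the config may carry an optional `head_dim` attribute. `resolve_head_dim` and the `SimpleNamespace` configs are hypothetical stand-ins for illustration, not the library's actual API.

```python
from types import SimpleNamespace


def resolve_head_dim(config):
    """Return an explicit head_dim if the config provides one, else derive it."""
    # Derived default: the behaviour that is currently hard-coded.
    derived = config.hidden_size // config.num_attention_heads
    # Hypothetical optional attribute; a missing or None value falls back to the default.
    head_dim = getattr(config, "head_dim", None)
    return derived if head_dim is None else head_dim


# Illustrative configs only; the sizes are made up for the example.
default_cfg = SimpleNamespace(hidden_size=2048, num_attention_heads=16)
custom_cfg = SimpleNamespace(hidden_size=2048, num_attention_heads=16, head_dim=256)

print(resolve_head_dim(default_cfg))  # derived from hidden_size // num_attention_heads
print(resolve_head_dim(custom_cfg))   # taken from the explicit head_dim
```

With this pattern, existing configs that do not define `head_dim` keep their current behaviour.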

PR


Rocketknight1 · 2025-03-12T12:28:11Z

hi @yunju63, won't the reshape in the attention layer fail if we set self.head_dim to any value other than self.hidden_size // self.num_heads?
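To make the shape question concrete, here is a small sketch of the arithmetic involved. It assumes the usual layout in which the query projection output is viewed as `(batch, seq_len, num_heads, head_dim)`; the helper names and sizes are hypothetical and chosen only for illustration.

```python
def attention_shapes(hidden_size, num_heads, head_dim=None):
    """Return (out_features, in_features) for q_proj and o_proj under a given head_dim."""
    if head_dim is None:
        head_dim = hidden_size // num_heads
    inner = num_heads * head_dim  # width of the concatenated heads
    return {
        "head_dim": head_dim,
        "q_proj": (inner, hidden_size),
        "o_proj": (hidden_size, inner),
    }


def view_is_valid(proj_out_features, num_heads, head_dim):
    """A (..., proj_out_features) tensor can be viewed as (..., num_heads, head_dim)
    only if the last dimension factors exactly."""
    return proj_out_features == num_heads * head_dim


hidden_size, num_heads = 2048, 16

# Projection still sized to hidden_size while head_dim is overridden: the view fails.
print(view_is_valid(hidden_size, num_heads, head_dim=256))

# Projection sized to num_heads * head_dim: the view works.
shapes = attention_shapes(hidden_size, num_heads, head_dim=256)
print(shapes["q_proj"], view_is_valid(shapes["q_proj"][0], num_heads, 256))
```

So under these assumptions the reshape only fails if the projections stay sized to `hidden_size`; a custom `head_dim` would require the q/k/v output width and the `o_proj` input width to become `num_heads * head_dim`.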

yunju63 added the Feature request label on Mar 12, 2025
