
Qwen2 MoE manual head_dim #36659

Open
yunju63 opened this issue Mar 12, 2025 · 1 comment
Labels
Feature request Request for a new feature

Comments

@yunju63 commented Mar 12, 2025

Feature request

self.head_dim = self.hidden_size // self.num_heads

For Qwen2 MoE, head_dim is currently forced to equal hidden_size // num_heads; there is no way to set it manually in the config.
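For reference, a minimal sketch of the shape bookkeeping involved. The function and field names here are illustrative, not the actual Transformers config fields:

```python
# Sketch: how a manually set head_dim changes attention projection shapes.
# Names are illustrative, not the actual Qwen2MoeConfig attributes.

def attention_shapes(hidden_size, num_heads, head_dim=None):
    # Fall back to the current hard-coded behaviour when head_dim is unset.
    if head_dim is None:
        head_dim = hidden_size // num_heads
    # With a manual head_dim, the q/k/v projections must output
    # num_heads * head_dim, which need not equal hidden_size.
    qkv_out = num_heads * head_dim
    # The output projection maps back: (num_heads * head_dim) -> hidden_size.
    return {
        "head_dim": head_dim,
        "q_proj": (hidden_size, qkv_out),
        "o_proj": (qkv_out, hidden_size),
    }

# Default behaviour: head_dim = 4096 // 32 = 128
print(attention_shapes(4096, 32))
# Manual override: head_dim = 64, so q_proj outputs 32 * 64 = 2048
print(attention_shapes(4096, 32, head_dim=64))
```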

Motivation

Manual head_dim configuration is already supported in the Llama, Mistral, and Mixtral modeling code.

Your contribution

I can open a PR.

yunju63 added the Feature request label Mar 12, 2025
@Rocketknight1 (Member) commented

Hi @yunju63, won't the reshape in the attention layer fail if we set self.head_dim to any value other than self.hidden_size // self.num_heads?
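A small sketch of the constraint in question, assuming (as in the Llama-style implementations) that the q/k/v projections are sized from num_heads * head_dim rather than from hidden_size. The function below is illustrative, not the actual modeling code:

```python
# Sketch: when the attention reshape succeeds with a decoupled head_dim.
# Illustrative only; not the actual Qwen2 MoE modeling code.

def can_reshape(proj_out, num_heads, head_dim):
    # The attention layer reshapes (batch, seq, proj_out) into
    # (batch, seq, num_heads, head_dim); this works iff
    # proj_out == num_heads * head_dim.
    return proj_out == num_heads * head_dim

hidden_size, num_heads = 4096, 32

# Fails if q_proj still outputs hidden_size while head_dim is overridden:
print(can_reshape(hidden_size, num_heads, 64))       # False
# Works once q_proj is sized num_heads * head_dim instead:
print(can_reshape(num_heads * 64, num_heads, 64))    # True
```

So the reshape only fails if the projection layers keep their current hidden_size output; resizing them alongside head_dim (the approach used in the Llama-style code) keeps the shapes consistent.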
