[falcon] Fix Falcon for rw-1b model #2887
Conversation
I'm taking a wild guess here and thinking that [...]

After updating gguf to account for an extra post-attention layer in the tensor map for falcon, I'm successfully able to convert the model. Let me try running it now.
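The exact diff isn't visible in this thread, but here is a minimal sketch of what the tensor-map change could look like. The function name and structure are illustrative, not the actual gguf-py code; the gguf-side names (`blk.N.attn_norm`, `blk.N.ffn_norm`, etc.) follow the GGUF naming convention, with the rw models' separate `post_attention_layernorm` routed into the ffn-norm slot:

```python
import re

# Hypothetical sketch (not the real gguf code): map falcon-rw's per-block HF
# tensor names to GGUF names, including the extra post-attention layernorm
# that the multiquery falcon checkpoints don't have.
def falcon_rw_tensor_name(hf_name: str) -> str | None:
    m = re.match(r"transformer\.h\.(\d+)\.(.+)", hf_name)
    if m is None:
        return None  # non-block tensors (embeddings, final norm, ...) handled elsewhere
    bid, rest = m.groups()
    mapping = {
        "input_layernorm.weight":                f"blk.{bid}.attn_norm.weight",
        "input_layernorm.bias":                  f"blk.{bid}.attn_norm.bias",
        # the extra post-attention layer present in the rw checkpoints:
        "post_attention_layernorm.weight":       f"blk.{bid}.ffn_norm.weight",
        "post_attention_layernorm.bias":         f"blk.{bid}.ffn_norm.bias",
        "self_attention.query_key_value.weight": f"blk.{bid}.attn_qkv.weight",
        "self_attention.dense.weight":           f"blk.{bid}.attn_output.weight",
        "mlp.dense_h_to_4h.weight":              f"blk.{bid}.ffn_up.weight",
        "mlp.dense_4h_to_h.weight":              f"blk.{bid}.ffn_down.weight",
    }
    return mapping.get(rest)
```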
Here is the output I see after quantizing + running the model:

[output elided]
I have no idea why, but it looks like the tokenizer is missing some tokens. This model also seems to use alibi, so it will probably require some changes to the computation graph as well.
This is really bizarre. In the conversion script, I wonder if taking the stated vocab size would at least fix this issue. I can look into the alibi stuff next.
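A minimal sketch of that proposal, assuming the conversion script builds its token list from `tokenizer.json`: treat the `vocab_size` stated in `config.json` as authoritative (the embedding matrix has that many rows) and pad the shortfall with placeholder tokens. The function and the `<dummy...>` naming are illustrative:

```python
def pad_vocab(tokens: list[str], stated_size: int) -> list[str]:
    # stated_size comes from config.json's vocab_size; tokens comes from
    # tokenizer.json. Pad the gap with tokens the model should never emit
    # (the placeholder naming here is an assumption, not the script's).
    if len(tokens) > stated_size:
        raise ValueError(
            f"tokenizer has {len(tokens)} tokens but config.json claims {stated_size}"
        )
    return tokens + [f"<dummy{i:05}>" for i in range(len(tokens), stated_size)]
```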
That fixed the vocab issue (the config.json vocab size does not match the contents of tokenizer.json - going with the stated size and padding with the missing tokens). Now:

error loading model: create_tensor: tensor 'blk.0.attn_qkv.weight' has wrong shape; expected 2048, 2176, got 2048, 6144
I think this is probably due to how we expected to have reshaped the qkv tensor, which we skip for this model. Let me play around with this.
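For what it's worth, the wrong-shape error above is consistent with the loader assuming the multiquery fused-qkv layout. A quick arithmetic check, with the config values assumed from falcon-rw-1b (hidden_size = 2048, n_head = 32, so head_dim = 64):

```python
n_embd, n_head = 2048, 32
head_dim = n_embd // n_head

# What the loader expected (multiquery layout: one shared k/v head):
n_head_kv = 1
print(n_embd + 2 * n_head_kv * head_dim)  # 2176 rows

# What the rw-1b checkpoint actually has (multi_query = false, full MHA):
n_head_kv = n_head
print(n_embd + 2 * n_head_kv * head_dim)  # 6144 rows
```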
Alright, latest update. I actually got the model running and outputting something (nonsense for now):

[sample output elided]

This model has a bunch of extra tensors (the extra post-attention norm in each block maps to the ffn-norm layer), so I updated convert-falcon-hf-to-gguf.py for rw models.
It might be more difficult to support these models than I initially imagined. Here is the reference implementation: [link elided]. Here is how [...]: [link elided]. As @slaren mentioned, these models also use alibi.
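This isn't llama.cpp's actual code, but as a reference for what alibi has to compute, here is a numpy sketch of the per-head slopes and the positional bias added to the attention scores (power-of-two head counts assumed for brevity):

```python
import numpy as np

def alibi_slopes(n_head: int) -> np.ndarray:
    # Geometric sequence 2^(-8/n), 2^(-16/n), ... from the alibi paper
    # ("Train Short, Test Long"); assumes n_head is a power of two.
    base = 2.0 ** (-8.0 / n_head)
    return base ** np.arange(1, n_head + 1)

def alibi_bias(n_head: int, n_tok: int) -> np.ndarray:
    # Bias added to attention scores before the softmax:
    # slope * (key_pos - query_pos), one slope per head.
    pos = np.arange(n_tok)[None, :] - np.arange(n_tok)[:, None]   # (n_q, n_kv)
    return alibi_slopes(n_head)[:, None, None] * pos[None, :, :]  # (n_head, n_q, n_kv)
```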
@ggerganov is there generally some kind of mapping of the operations in the python implementation to what we have available in ggml? Do you see anything missing?
The needed operators (like alibi) should already be available in ggml. It's mostly a matter of correctly building the graphs depending on the config parameters. See #2868
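As an illustration of "depending on the config parameters", these are the architecture switches the HF falcon configs expose. The field names follow the HF configs; the values in the comments are what I would expect for falcon-rw-1b and should be verified against its actual config.json:

```python
import json

with open("config.json") as f:
    cfg = json.load(f)

use_alibi     = cfg.get("alibi", False)         # rw-1b: true
parallel_attn = cfg.get("parallel_attn", True)  # rw-1b: false (sequential attn + mlp)
multi_query   = cfg.get("multi_query", True)    # rw-1b: false (full multi-head)
n_head_kv     = 1 if multi_query else cfg["n_head"]
```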