Commit ebb4746: Use stated vocab size
1 parent 71d6975

File tree: 1 file changed (+3, -1 lines)

convert-falcon-hf-to-gguf.py

```diff
@@ -131,7 +131,9 @@ def parse_args() -> argparse.Namespace:
 
     print("gguf: get gpt2 tokenizer vocab")
 
-    vocab_size = len(tokenizer_json["model"]["vocab"])
+    # The number of tokens in tokenizer.json can differ from the expected vocab size.
+    # This causes downstream issues with mismatched tensor sizes when running the inference
+    vocab_size = hparams["vocab_size"]
 
     # ref: https://github.com/cmp-nct/ggllm.cpp/blob/master/falcon_convert.py
     tokenizer = AutoTokenizer.from_pretrained(dir_model)
```
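The point of the change can be illustrated with a minimal sketch. The dicts below are illustrative stand-ins for the model's config.json (`hparams`) and tokenizer.json (`tokenizer_json`); the specific sizes are hypothetical, chosen only to show how counting the entries in tokenizer.json can undershoot the vocab size the checkpoint's tensors were built with:

```python
# Stand-in for hparams loaded from the model's config.json;
# "vocab_size" is the size the token-embedding tensor actually has.
hparams = {"vocab_size": 65024}

# Stand-in for tokenizer.json: its vocab map may hold fewer entries
# than the stated vocab size (e.g. reserved/padding token slots).
tokenizer_json = {"model": {"vocab": {f"tok{i}": i for i in range(64784)}}}

# Old behaviour: derive the size by counting tokenizer.json entries.
old_vocab_size = len(tokenizer_json["model"]["vocab"])

# New behaviour: trust the size stated in the hyperparameters, so the
# GGUF metadata matches the embedding tensor at inference time.
new_vocab_size = hparams["vocab_size"]

print(old_vocab_size, new_vocab_size)
```

With a mismatch like this, the old code would write a vocab count smaller than the embedding tensor's first dimension, which surfaces later as a tensor-size error during inference; using the stated `vocab_size` keeps the two consistent.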
