
Unable to convert Smaug 72B #5807
Closed

schmorp opened this issue Mar 1, 2024 · 11 comments

schmorp commented Mar 1, 2024

I am unable to convert https://huggingface.co/abacusai/Smaug-72B-v0.1 (and others) to GGUF with either convert.py or convert-hf-to-gguf.py.

With the former, I get:

RuntimeError: Internal: ./src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

"internal" feels like a bug. When I add --vocab-type hfft (and then --pad-vocab because it tells me to), I get a nonfunctional model:

llm_load_vocab: SPM vocabulary, but newline token not found: unordered_map::at! Using special_pad_id instead.
llm_load_vocab: mismatch in special tokens definition ( 421/152064 vs 214/152064 ).

and convert-hf-to-gguf.py does not support "LlamaForCausalLM".
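
For reference, roughly the invocations in question (output file names below are placeholders, not the exact command lines used):

python convert.py ./Smaug-72B-v0.1 --outfile ./Smaug-72B-v0.1.gguf
python convert.py ./Smaug-72B-v0.1 --vocab-type hfft --pad-vocab --outfile ./Smaug-72B-v0.1.gguf
python convert-hf-to-gguf.py ./Smaug-72B-v0.1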

schmorp commented Mar 1, 2024

llama.cpp release b2291 btw.

dranger003 (Contributor) commented

I think it works by changing this line:
https://github.com/ggerganov/llama.cpp/blob/e7433867288d2f142cffe596f3751bda5d7ee2c7/convert-hf-to-gguf.py#L262
to:

if arch in ("MixtralForCausalLM", "LlamaForCausalLM"):
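
For context, the architecture name the converter dispatches on comes from the model's config.json; a quick, purely illustrative way to check it (the path is a placeholder):

import json

# Hypothetical check: print the architecture list from the model's config.json;
# for Smaug-72B-v0.1 this is expected to include "LlamaForCausalLM".
with open("./Smaug-72B-v0.1/config.json") as f:
    print(json.load(f)["architectures"])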


schmorp commented Mar 2, 2024

Thanks a lot, but that just fails further down the line:

Error: Missing Smaug-72B-v0.1/tokenizer.model

I notice that quite a lot of models on Hugging Face were apparently convertible a few weeks or months ago but no longer are (e.g. TheBloke has GGUFs for them, but when I try converting with current versions of llama.cpp to make imatrix quants, they fail in lots of different ways). Is this considered a regression, and should issues be created for those?


dranger003 commented Mar 2, 2024

Ah yes, I think I recreated it by loading the model with HF transformers and calling save_pretrained(), although I also think this is just about to get fixed, looking at PR #5821.
You could also use the tokenizer model from the model it was trained from, if that is available.
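
A minimal sketch of the save_pretrained() approach (repo id and local path are placeholders, and whether a tokenizer.model file actually gets written depends on the tokenizer type):

from transformers import AutoTokenizer

# Hypothetical: re-save the tokenizer files into the local checkpoint directory
# so convert.py can pick them up; trust_remote_code may or may not be needed.
tok = AutoTokenizer.from_pretrained("abacusai/Smaug-72B-v0.1", trust_remote_code=True)
tok.save_pretrained("./Smaug-72B-v0.1")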


schmorp commented Mar 5, 2024

Most of the llama-2-derived models seem to have this vocabulary mismatch problem, and #5821 does not appear to help.

christiandaley commented

Any updates or workarounds on this? I just tried to convert Smaug 72B, and when I try to run it with llama.cpp I get: error loading model: _map_base::at


countzero commented Mar 12, 2024

@schmorp, @dranger003 & @christiandaley

I am using the latest version of https://huggingface.co/abacusai/Smaug-72B-v0.1 and llama.cpp release b2405.

Solution:

Smaug-72B-v0.1 is based on a model that uses a Byte Pair Encoding (BPE) vocabulary type. The conversion can be done with:

python convert.py --vocab-type "bpe" --pad-vocab --outfile ./models/Smaug-72B-v0.1.gguf /path/to/repository
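
(The command above produces an unquantized GGUF; the Q5_K_M file in the server log below implies an additional quantization step, shown here with illustrative file names.)

./quantize ./models/Smaug-72B-v0.1.gguf ./models/Smaug-72B-v0.1.Q5_K_M.gguf Q5_K_M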

Server Log:

ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2060 SUPER, compute capability 7.5, VMM: yes
{"build":2405,"commit":"5cdb3717","function":"main","level":"INFO","line":2732,"msg":"build info","tid":"85172","timestamp":1710235396}
{"function":"main","level":"INFO","line":2739,"msg":"system info","n_threads":16,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | ","tid":"85172","timestamp":1710235396,"total_threads":32}
llama_model_loader: loaded meta data with 22 key-value pairs and 1043 tensors from .\vendor\llama.cpp\models\Smaug-72B-v0.1.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = R:\AI\LLM\source
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 8192
llama_model_loader: - kv 4: llama.block_count u32 = 80
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 24576
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 64
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 64
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 11: general.file_type u32 = 17
llama_model_loader: - kv 12: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,152064] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 151643
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 481 tensors
llama_model_loader: - type q5_K: 481 tensors
llama_model_loader: - type q6_K: 81 tensors
llm_load_vocab: mismatch in special tokens definition ( 421/152064 vs 213/152064 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 152064
llm_load_print_meta: n_merges = 151387
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 64
llm_load_print_meta: n_layer = 80
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 8192
llm_load_print_meta: n_embd_v_gqa = 8192
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 24576
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attm = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 65B
llm_load_print_meta: model ftype = Q5_K - Medium
llm_load_print_meta: model params = 72.29 B
llm_load_print_meta: model size = 47.78 GiB (5.68 BPW)
llm_load_print_meta: general.name = R:\AI\LLM\source
llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token = 151643 '<|endoftext|>'
llm_load_print_meta: UNK token = 151643 '<|endoftext|>'
llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
llm_load_print_meta: LF token = 30 '?'
llm_load_tensors: ggml ctx size = 0.40 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/81 layers to GPU
llm_load_tensors: CPU buffer size = 48926.31 MiB
...................................................................................................
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA_Host KV buffer size = 20480.00 MiB
llama_new_context_with_model: KV self size = 20480.00 MiB, K (f16): 10240.00 MiB, V (f16): 10240.00 MiB
llama_new_context_with_model: CUDA_Host input buffer size = 33.07 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 1088.00 MiB
llama_new_context_with_model: graph splits (measure): 1
[1710235412] warming up the model with an empty run

Open Question:

Is the warning something to worry about?

llm_load_vocab: mismatch in special tokens definition ( 421/152064 vs 213/152064 ).


schmorp commented Mar 12, 2024

I thought I had already tried that, but maybe I hadn't. Thanks a lot for this tip!

As for the special token definition warning, I have seen it with a few other models and they seemed to work, but YMMV.


schmorp commented Mar 12, 2024

It works, so this was user error. Sorry for the noise.

schmorp closed this as completed Mar 12, 2024

schmorp commented Mar 17, 2024

When converting with the command from @countzero, the resulting model crashes main/imatrix with:

terminate called after throwing an instance of 'std::out_of_range'
what(): unordered_map::at
Aborted

Sorry, @christiandaley, you actually reported this a week ago and I overlooked it.

schmorp reopened this Mar 17, 2024
github-actions bot added the stale label Apr 17, 2024

github-actions bot commented May 1, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed May 1, 2024