LLaMA.cpp returns just some weirdo texts with any model size #291
Might be the same issue as this: #280
Also, do not use `\n` escape sequences in the prompt. Either pass the prompt from a file, or do it like this:
make -j && ./main -m models/7B/ggml-model-q4_0.bin -t 8 -n 1024 -s 2 -p "SQL code to create a table, that will keep CD albums data, such as album name and track:
\begin{code}
"
I llama.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX: Apple clang version 14.0.0 (clang-1400.0.29.202)
make: Nothing to be done for `default'.
main: seed = 2
llama_model_load: loading model from 'models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from 'models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: prompt: ' SQL code to create a table, that will keep CD albums data, such as album name and track:
\begin{code}
'
main: number of tokens in prompt = 29
1 -> ''
3758 -> ' SQL'
775 -> ' code'
304 -> ' to'
1653 -> ' create'
263 -> ' a'
1591 -> ' table'
29892 -> ','
393 -> ' that'
674 -> ' will'
3013 -> ' keep'
7307 -> ' CD'
20618 -> ' albums'
848 -> ' data'
29892 -> ','
1316 -> ' such'
408 -> ' as'
3769 -> ' album'
1024 -> ' name'
322 -> ' and'
5702 -> ' track'
29901 -> ':'
13 -> '
'
29905 -> '\'
463 -> 'begin'
29912 -> '{'
401 -> 'code'
29913 -> '}'
13 -> '
'
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
SQL code to create a table, that will keep CD albums data, such as album name and track:
\begin{code}
CREATE TABLE AlBums (album_id INT NOT NULL PRIMARY KEY AUTOINCREMENT ,artist VARCHAR(250) ) ;
INSERT INTO Albums VALUES('13', 'AC/DC'), ('486','Joe Cocker'); /* etc */;// The data goes here. As you can see it's quite simple, and you don't need to specify the columns of course - that is done by designers during development phase.
\end{code} [end of text]
main: mem per token = 14434244 bytes
main: load time = 945.16 ms
main: sample time = 75.87 ms
main: predict time = 6337.46 ms / 48.38 ms per token
main: total time = 7742.50 ms

Please reopen if the issue persists.
I'm experimenting with LLaMA.cpp on an M1 laptop with 32 GB RAM. Somehow the inference is broken for me.
I'm expecting something reasonable for a simple prompt I took from the original LLaMA examples:
SQL code to create a table, that will keep CD albums data, such as album name and track\n\\begin{code}\n
And LLaMA.cpp returns just some weirdo texts with any model size (7B, 13B, 30B, quantised down to 4-bit).
What's the reason here?
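The likely culprit can be demonstrated in isolation: in POSIX shells, `\n` inside double quotes is not a newline but the two literal characters `\` and `n`, so the model receives a mangled prompt. A small sketch contrasting the two quoting styles (the prompt fragment is abbreviated for illustration):

```shell
# Double quotes keep the backslash: this prints the 4 characters a \ n b.
printf '%s' "a\nb"; echo

# bash/zsh ANSI-C quoting $'...' expands \n to a real newline:
# this prints 'a', a newline, then 'b' (3 characters total).
printf '%s' $'a\nb'; echo
```

So a prompt written as `...track\n\\begin{code}\n` on the command line is tokenized with stray `\` and `n` characters rather than line breaks, which is enough to derail generation.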