13b model on M1 with 8gb RAM very slow #1500
Unanswered · joseph6377 asked this question in Q&A
Replies: 2 comments 2 replies
-
You see, the 13B model is larger than your physical RAM, so it gets paged through virtual memory, which is why it's so slow. You should use a 7B model for higher speed.
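As a rough sanity check (the figures below are taken straight from the llama.cpp log further down in this thread; the 8 GB figure is the machine's advertised RAM, so treat this as an estimate, not an exact accounting):

    mem required (reported by llama.cpp):  9807.48 MB (+ 1608.00 MB per state)
    physical RAM on an 8 GB M1:            8 * 1024 = 8192 MB
    shortfall:                             roughly 1.6 GB, before the per-state buffers are counted

You can confirm the installed RAM from a terminal with the standard macOS command sysctl hw.memsize, which prints the total physical memory in bytes.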
-
Two things:
-
Hello, I am a bit of a noob here.
I'm running 4-bit quantized models on an M1 with 8 GB of RAM. When I run the 13B model it is very slow. I have already tried setting mlock to true. Are there any other parameters I need to tweak?
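(For context: with the stock llama.cpp command-line binary, mlock is enabled via the --mlock flag. The exact command used here isn't shown, so the invocation below is only an illustrative guess, with the prompt left as a placeholder.)

    ./main -m /Users/jo/Documents/llama.cpp/models/wizard-mega-13B.ggml.q4_0.bin --mlock -n 128 -p "..."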
llama.cpp: loading model from /Users/jo/Documents/llama.cpp/models/wizard-mega-13B.ggml.q4_0.bin
llama_model_load_internal: format = ggjt v2 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 90.75 KB
llama_model_load_internal: mem required = 9807.48 MB (+ 1608.00 MB per state)
..............................................................warning: failed to mlock 44236800-byte buffer (after previously locking 5073518592 bytes): Resource temporarily unavailable
......................................
llama_init_from_file: kv self size = 400.00 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
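(Reading the mlock warning above: the process managed to lock 5,073,518,592 bytes, about 4.7 GiB, before the kernel refused the next ~42 MiB buffer with "Resource temporarily unavailable". That is consistent with the reported ~9.8 GB footprint simply not fitting into 8 GB of physical memory; mlock can only pin memory the machine actually has, so the flag does not help here.)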