
[User] Unable to Finetune Llama 2 70B #3644

Closed
4 tasks done
mrroll opened this issue Oct 16, 2023 · 14 comments

@mrroll

mrroll commented Oct 16, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Finetuning Llama 2 70B should succeed.

Current Behavior

Finetuning Llama 2 70B fails with:

ggml_allocr_alloc: not enough space in the buffer (needed 1048576000, largest block available 939524096)
GGML_ASSERT: ggml-alloc.c:148: !"not enough space in the buffer"
Aborted (core dumped)

I should add that finetuning Llama 2 13B works.
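For scale, the sizes quoted by the assertion work out to a 1000 MiB request against an 896 MiB largest free block, a 104 MiB shortfall:

```python
# Sizes taken from the allocator error message, converted to MiB.
needed = 1048576000   # bytes requested by ggml_allocr_alloc
largest = 939524096   # largest contiguous free block available
needed_mib = needed // 2**20
largest_mib = largest // 2**20
shortfall_mib = (needed - largest) // 2**20
print(needed_mib, largest_mib, shortfall_mib)  # 1000 896 104
```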

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

System Memory
(base) user@server:/srv/shared$ free -h
               total        used        free      shared  buff/cache   available
Mem:           377Gi       2.1Gi       8.4Gi       1.0Mi       367Gi       373Gi
Swap:             0B          0B          0B
Physical (or virtual) hardware you are using, e.g. for Linux:
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  96
  On-line CPU(s) list:   0-95
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU @ 2.20GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  24
    Socket(s):           2
    Stepping:            7
    BogoMIPS:            4400.36
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflu
                         sh mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good
                          nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16
                          pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor 
                         lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced f
                         sgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx51
                         2dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsave
                         c xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   1.5 MiB (48 instances)
  L1i:                   1.5 MiB (48 instances)
  L2:                    48 MiB (48 instances)
  L3:                    77 MiB (2 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-23,48-71
  NUMA node1 CPU(s):     24-47,72-95
Vulnerabilities:         
  Gather data sampling:  Unknown: Dependent on hypervisor status
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Mitigation; Clear CPU buffers; SMT Host state unknown
  Meltdown:              Not affected
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:              Mitigation; Enhanced IBRS
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequ
                         ence
  Srbds:                 Not affected
  Tsx async abort:       Mitigation; Clear CPU buffers; SMT Host state unknown
Operating System, e.g. for Linux:
Linux 6.2.0-1016-gcp #18~22.04.1-Ubuntu SMP Fri Sep 29 04:56:44 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
SDK version, e.g. for Linux:
Python 3.10.9
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Failure Information (for bugs)

ggml_allocr_alloc: not enough space in the buffer (needed 1048576000, largest block available 939524096)
GGML_ASSERT: ggml-alloc.c:148: !"not enough space in the buffer"
Aborted (core dumped)

Steps to Reproduce

The following steps assume that:

  • You have miniconda installed and the base environment is loaded.
  • You have access to the Llama 2 base model from Meta.
  • In addition, you have already downloaded the Llama 2 70B model and placed it in ./models/llama-2-70b.
  • You have downloaded shakespeare.txt and placed it in the root of the git repository.
  1. Clone llama.cpp.
  2. cd into the directory where llama.cpp was cloned.
  3. Run make.
  4. Create a miniconda environment called llama:
conda create -yn llama python=3.10.9
  5. Switch to the miniconda environment you just created:
conda activate llama
  6. Install dependencies:
pip install --upgrade --requirement requirements.txt
pip install --upgrade torch transformers
  7. Convert the Llama 2 70B model into the GGUF format:
python convert.py ./models/llama-2-70b
  8. Quantize the converted model:
./quantize ./models/llama-2-70b/ggml-model-f16.gguf ./models/llama-2-70b/ggml-model-q8_0.gguf q8_0
  9. Attempt to finetune the model. For testing, I used this text file:
./finetune \
    --model-base ./models/llama-2-70b/ggml-model-q8_0.gguf \
    --checkpoint-in llama-2-70b-shakespeare-LATEST.gguf \
    --checkpoint-out llama-2-70b-shakespeare-ITERATION.gguf \
    --lora-out llama-2-70b-shakespeare-ITERATION.bin \
    --train-data "./shakespeare.txt" \
    --save-every 10 \
    --threads 48 --adam-iter 150 --batch 4 --ctx 64 \
    --use-checkpointing
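The conversion, quantization, and finetune steps above can be sketched as a single driver script. This is only an illustration of the sequence; it prints the commands rather than running them (subprocess.run(cmd, check=True) would execute each one), and the paths simply mirror the steps above.

```python
# Sketch of the convert -> quantize -> finetune sequence from the steps
# above. Printing only; swap print for subprocess.run(cmd, check=True)
# to actually execute each stage.
import shlex

model_dir = "./models/llama-2-70b"
steps = [
    ["python", "convert.py", model_dir],
    ["./quantize", f"{model_dir}/ggml-model-f16.gguf",
     f"{model_dir}/ggml-model-q8_0.gguf", "q8_0"],
    ["./finetune", "--model-base", f"{model_dir}/ggml-model-q8_0.gguf",
     "--train-data", "./shakespeare.txt",
     "--threads", "48", "--adam-iter", "150", "--batch", "4", "--ctx", "64",
     "--use-checkpointing"],
]
for cmd in steps:
    print(" ".join(shlex.quote(c) for c in cmd))
```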

Failure Logs

The logs are too long to include as a comment, so I am attaching them here instead. You'll also find that I ran a finetune on Llama 2 13B just to demonstrate that it works.

error.log

@RedAndr

RedAndr commented Oct 16, 2023

How much memory do you have?

@mrroll
Author

mrroll commented Oct 16, 2023

How much memory do you have?

@RedAndr Sorry, I completely forgot to include this. I'm editing it into the original issue as well:

(base) user@server:/srv/shared$ free -h
               total        used        free      shared  buff/cache   available
Mem:           377Gi       2.1Gi       8.4Gi       1.0Mi       367Gi       373Gi
Swap:             0B          0B          0B

@QueryType

QueryType commented Oct 25, 2023

I am getting a similar error on a Mac mini M2 with 24 GB of memory. Base model: openllama-3b-v2.

main: lora_size = 54798560 bytes (52.3 MB)
main: opt_size  = 81693904 bytes (77.9 MB)
main: opt iter 0
main: input_size = 32769056 bytes (31.3 MB)
main: compute_size = 2785785440 bytes (2656.7 MB)
main: evaluation order = RIGHT_TO_LEFT
ggml_allocr_alloc: not enough space in the buffer (needed 409600000, largest block available 339804192)
GGML_ASSERT: ggml-alloc.c:148: !"not enough space in the buffer"
./train.sh: line 1:  1108 Abort trap: 6           /Volumes/d/apps/llama.cpp/llama.cpp/finetune --model-base /Volumes/d/apps/aimodels/others/openllama-3b-v2/openllama-3b-v2.q8_0.gguf --train-data shakespeare.txt --lora-out lora.gguf --save-every 0 --threads 8 --ctx 256 --rope-freq-base 10000 --rope-freq-scale 1.0 --batch 1 --grad-acc 1 --adam-iter 256 --adam-alpha 0.001 --lora-r 4 --lora-alpha 4 --use-checkpointing --use-flash --sample-start "\n" --escape --include-sample-start --seed 1

@KerfuffleV2
Collaborator

This doesn't directly help you, but the error isn't related to how much memory you have available. I believe it means the maximum context size wasn't calculated correctly, so it's not a user error.
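As a toy illustration of why system RAM doesn't matter here (this is not ggml's actual allocator code): the compute buffer's size is measured up front, so a request can fail inside it even though the machine has plenty of free memory, and even though the buffer's total free space would cover the request, if no single contiguous block is large enough.

```python
# Toy model, not ggml's actual code: a fixed-size buffer with a
# fragmented free list. The request fails because no single contiguous
# block fits, even though total free space would cover it.
free_blocks = [939524096, 200 * 2**20]   # free block sizes in bytes
needed = 1048576000                      # request from the error message
total_free_ok = sum(free_blocks) >= needed
single_block_ok = any(b >= needed for b in free_blocks)
print(total_free_ok, single_block_ok)    # True False
```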

@QueryType


Thanks for the response, @KerfuffleV2. So there's some bug to be fixed, but I'm curious whether it affects everyone trying to use finetune or only certain combinations of inputs.

@mrroll
Author

mrroll commented Oct 26, 2023

Hey folks! On 6961c4b, I'm getting a new error:

GGML_ASSERT: ggml-alloc.c:240: alloc->n_free_blocks < MAX_FREE_BLOCKS && "out of free blocks"

@QueryType

This issue occurs when I use --batch 1; any other value works fine.

@mrroll
Author

mrroll commented Oct 27, 2023


@QueryType So you got it working? If so, what specific arguments did you use?

@QueryType

@QueryType So you got it working? If so, what specific arguments did you use?

finetune --model-base /Volumes/d/apps/aimodels/others/openllama-3b-v2/openllama-3b-v2.q8_0.gguf --train-data shakespeare.txt --lora-out lora.gguf --save-every 0 --threads 10 --ctx 256 --batch 32 --use-checkpointing --use-flash --sample-start "\n" --escape

Yes, but it is too slow and pushes the CPU to 100 °C. Swap also got activated.

@mrroll
Author

mrroll commented Oct 27, 2023

Nice, thanks! I myself do not have an issue with lower-parameter models like 13B, but 70B just doesn't want to start at all, failing with:

GGML_ASSERT: ggml-alloc.c:240: alloc->n_free_blocks < MAX_FREE_BLOCKS && "out of free blocks"

@slaren
Member

slaren commented Oct 27, 2023

You can probably work around that problem by increasing MAX_FREE_BLOCKS in ggml-alloc.c, but we need a better long-term solution; the value is already too big as it is.
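A minimal sketch of what that workaround would look like as a textual patch. The current define value (256) is an assumption; check the actual `#define MAX_FREE_BLOCKS` line in your checkout of ggml-alloc.c, then rebuild with make.

```python
# Hypothetical sketch: rewrite the MAX_FREE_BLOCKS define before
# rebuilding. Applied here to a string literal; in practice you would
# read/write ggml-alloc.c (or just edit it by hand). The value 256 is
# an assumed current value.
import re

line = "#define MAX_FREE_BLOCKS 256"
patched = re.sub(r"(#define MAX_FREE_BLOCKS )\d+", r"\g<1>512", line)
print(patched)  # #define MAX_FREE_BLOCKS 512
```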

@mrroll
Author

mrroll commented Oct 30, 2023

I actually tried that previously, increasing it to 512. I have now tried 1024, 2048, and 4096 as well. On every change, I ran a finetune and ended up with the error message: terminate called after throwing an instance of 'std::bad_alloc'  what():  std::bad_alloc

Log

/srv/shared/llama.cpp/finetune --model-base /srv/shared/llama.cpp/models/llama-2/llama-2-70b/ggml-model-q8_0.gguf --checkpoint-in llama-2-70b-finetune-data-LATEST.gguf --checkpoint-out llama-2-70b-finetune-data-ITERATION.gguf --lora-out llama-2-70b-finetune-data-ITERATION.bin --train-data /srv/shared/data.txt --save-every 50 --threads 48 --batch 2 --grad-acc 2 --adam-alpha 0.0003 --ctx 32 --sample-start '### Instruction:' --include-sample-start --no-checkpointing
main: seed: 1698665668
main: model base = '/srv/shared/llama.cpp/models/llama-2/llama-2-70b/ggml-model-q8_0.gguf'
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 4 CUDA devices:
Device 0: NVIDIA L4, compute capability 8.9
Device 1: NVIDIA L4, compute capability 8.9
Device 2: NVIDIA L4, compute capability 8.9
Device 3: NVIDIA L4, compute capability 8.9
llama_model_loader: loaded meta data with 16 key-value pairs and 723 tensors from /srv/shared/llama.cpp/models/llama-2/llama-2-70b/ggml-model-q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor 0: token_embd.weight q8_0 [ 8192, 32000, 1, 1 ]
llama_model_loader: - tensor 1: output_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 2: output.weight q8_0 [ 8192, 32000, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 10: blk.0.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 11: blk.0.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 12: blk.1.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 19: blk.1.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 20: blk.1.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 21: blk.2.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 22: blk.2.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 23: blk.2.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 24: blk.2.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 25: blk.2.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 26: blk.2.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 27: blk.2.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 28: blk.2.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 29: blk.2.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 30: blk.3.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 31: blk.3.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 32: blk.3.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 33: blk.3.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 34: blk.3.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 35: blk.3.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 36: blk.3.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 37: blk.3.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 38: blk.3.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 39: blk.4.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 40: blk.4.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 41: blk.4.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 42: blk.4.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 43: blk.4.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 44: blk.4.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 45: blk.4.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 46: blk.4.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 47: blk.4.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 48: blk.5.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 49: blk.5.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 50: blk.5.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 51: blk.5.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 52: blk.5.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 53: blk.5.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 54: blk.5.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 55: blk.5.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 56: blk.5.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 57: blk.6.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 58: blk.6.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 59: blk.6.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 60: blk.6.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 61: blk.6.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 62: blk.6.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 63: blk.6.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 64: blk.6.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 65: blk.6.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 66: blk.7.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 67: blk.7.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 68: blk.7.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 69: blk.7.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 70: blk.7.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 71: blk.7.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 72: blk.7.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 73: blk.7.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 74: blk.7.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 75: blk.8.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 76: blk.8.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 77: blk.8.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 78: blk.8.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 79: blk.8.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 80: blk.8.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 81: blk.8.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 82: blk.8.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 83: blk.8.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 84: blk.9.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 85: blk.9.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 86: blk.9.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 87: blk.9.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 88: blk.9.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 89: blk.9.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 90: blk.9.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 91: blk.9.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 92: blk.9.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 93: blk.10.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 94: blk.10.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 95: blk.10.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 96: blk.10.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 97: blk.10.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 98: blk.10.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 99: blk.10.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 100: blk.10.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 101: blk.10.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 102: blk.11.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 103: blk.11.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 104: blk.11.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 105: blk.11.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 106: blk.11.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 107: blk.11.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 108: blk.11.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 109: blk.11.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 110: blk.11.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 111: blk.12.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 112: blk.12.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 113: blk.12.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 114: blk.12.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 115: blk.12.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 116: blk.12.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 117: blk.12.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 118: blk.12.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 119: blk.12.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 120: blk.13.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 121: blk.13.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 122: blk.13.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 123: blk.13.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 124: blk.13.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 125: blk.13.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 126: blk.13.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 127: blk.13.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 128: blk.13.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 129: blk.14.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 130: blk.14.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 131: blk.14.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 132: blk.14.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 133: blk.14.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 134: blk.14.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 135: blk.14.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 136: blk.14.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 137: blk.14.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 138: blk.15.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 139: blk.15.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 140: blk.15.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 141: blk.15.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 142: blk.15.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 143: blk.15.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 144: blk.15.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 145: blk.15.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 146: blk.15.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 147: blk.16.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 148: blk.16.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 149: blk.16.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 150: blk.16.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 151: blk.16.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 152: blk.16.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 153: blk.16.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 154: blk.16.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 155: blk.16.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 156: blk.17.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 157: blk.17.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 158: blk.17.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 159: blk.17.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 160: blk.17.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 161: blk.17.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 162: blk.17.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 163: blk.17.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 164: blk.17.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 165: blk.18.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 166: blk.18.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 167: blk.18.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 168: blk.18.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 169: blk.18.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 170: blk.18.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 171: blk.18.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 172: blk.18.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 173: blk.18.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 174: blk.19.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 175: blk.19.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 176: blk.19.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 177: blk.19.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 178: blk.19.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 179: blk.19.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 180: blk.19.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 181: blk.19.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 182: blk.19.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 183: blk.20.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 184: blk.20.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 185: blk.20.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 186: blk.20.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 187: blk.20.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 188: blk.20.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 189: blk.20.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 190: blk.20.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 191: blk.20.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 192: blk.21.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 193: blk.21.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 194: blk.21.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 195: blk.21.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 196: blk.21.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 197: blk.21.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 198: blk.21.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 199: blk.21.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 200: blk.21.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 201: blk.22.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 202: blk.22.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 203: blk.22.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 204: blk.22.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 205: blk.22.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 206: blk.22.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 207: blk.22.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 208: blk.22.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 209: blk.22.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 210: blk.23.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 211: blk.23.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 212: blk.23.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 213: blk.23.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 214: blk.23.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 215: blk.23.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 216: blk.23.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 217: blk.23.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 218: blk.23.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 219: blk.24.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 220: blk.24.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 221: blk.24.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 222: blk.24.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 223: blk.24.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 224: blk.24.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 225: blk.24.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 226: blk.24.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 227: blk.24.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 228: blk.25.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 229: blk.25.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 230: blk.25.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 231: blk.25.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 232: blk.25.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 233: blk.25.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 234: blk.25.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 235: blk.25.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 236: blk.25.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 237: blk.26.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 238: blk.26.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 239: blk.26.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 240: blk.26.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 241: blk.26.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 242: blk.26.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 243: blk.26.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 244: blk.26.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 245: blk.26.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 246: blk.27.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 247: blk.27.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 248: blk.27.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 249: blk.27.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 250: blk.27.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 251: blk.27.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 252: blk.27.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 253: blk.27.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 254: blk.27.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 255: blk.28.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 256: blk.28.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 257: blk.28.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 258: blk.28.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 259: blk.28.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 260: blk.28.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 261: blk.28.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 262: blk.28.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 263: blk.28.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 264: blk.29.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 265: blk.29.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 266: blk.29.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 267: blk.29.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 268: blk.29.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 269: blk.29.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 270: blk.29.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 271: blk.29.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 272: blk.29.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 273: blk.30.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 274: blk.30.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 275: blk.30.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 276: blk.30.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 277: blk.30.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 278: blk.30.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 279: blk.30.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 280: blk.30.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 281: blk.30.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 282: blk.31.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 283: blk.31.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 284: blk.31.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 285: blk.31.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 286: blk.31.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 287: blk.31.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 288: blk.31.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 289: blk.31.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 290: blk.31.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 291: blk.32.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 292: blk.32.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 293: blk.32.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 294: blk.32.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 295: blk.32.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 296: blk.32.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 297: blk.32.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 298: blk.32.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 299: blk.32.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 300: blk.33.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 301: blk.33.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 302: blk.33.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 303: blk.33.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 304: blk.33.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 305: blk.33.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 306: blk.33.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 307: blk.33.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 308: blk.33.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 309: blk.34.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 310: blk.34.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 311: blk.34.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 312: blk.34.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 313: blk.34.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 314: blk.34.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 315: blk.34.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 316: blk.34.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 317: blk.34.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 318: blk.35.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 319: blk.35.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 320: blk.35.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 321: blk.35.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 322: blk.35.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 323: blk.35.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 324: blk.35.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 325: blk.35.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 326: blk.35.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 327: blk.36.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 328: blk.36.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 329: blk.36.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 330: blk.36.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 331: blk.36.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 332: blk.36.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 333: blk.36.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 334: blk.36.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 335: blk.36.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 336: blk.37.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 337: blk.37.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 338: blk.37.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 339: blk.37.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 340: blk.37.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 341: blk.37.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 342: blk.37.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 343: blk.37.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 344: blk.37.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 345: blk.38.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 346: blk.38.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 347: blk.38.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 348: blk.38.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 349: blk.38.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 350: blk.38.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 351: blk.38.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 352: blk.38.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 353: blk.38.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 354: blk.39.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 355: blk.39.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 356: blk.39.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 357: blk.39.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 358: blk.39.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 359: blk.39.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 360: blk.39.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 361: blk.39.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 362: blk.39.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 363: blk.40.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 364: blk.40.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 365: blk.40.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 366: blk.40.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 367: blk.40.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 368: blk.40.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 369: blk.40.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 370: blk.40.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 371: blk.40.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 372: blk.41.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 373: blk.41.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 374: blk.41.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 375: blk.41.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 376: blk.41.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 377: blk.41.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 378: blk.41.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 379: blk.41.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 380: blk.41.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 381: blk.42.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 382: blk.42.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 383: blk.42.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 384: blk.42.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 385: blk.42.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 386: blk.42.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 387: blk.42.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 388: blk.42.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 389: blk.42.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 390: blk.43.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 391: blk.43.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 392: blk.43.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 393: blk.43.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 394: blk.43.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 395: blk.43.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 396: blk.43.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 397: blk.43.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 398: blk.43.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 399: blk.44.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 400: blk.44.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 401: blk.44.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 402: blk.44.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 403: blk.44.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 404: blk.44.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 405: blk.44.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 406: blk.44.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 407: blk.44.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 408: blk.45.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 409: blk.45.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 410: blk.45.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 411: blk.45.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 412: blk.45.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 413: blk.45.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 414: blk.45.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 415: blk.45.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 416: blk.45.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 417: blk.46.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 418: blk.46.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 419: blk.46.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 420: blk.46.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 421: blk.46.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 422: blk.46.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 423: blk.46.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 424: blk.46.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 425: blk.46.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 426: blk.47.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 427: blk.47.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 428: blk.47.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 429: blk.47.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 430: blk.47.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 431: blk.47.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 432: blk.47.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 433: blk.47.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 434: blk.47.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 435: blk.48.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 436: blk.48.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 437: blk.48.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 438: blk.48.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 439: blk.48.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 440: blk.48.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 441: blk.48.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 442: blk.48.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 443: blk.48.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 444: blk.49.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 445: blk.49.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 446: blk.49.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 447: blk.49.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 448: blk.49.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 449: blk.49.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 450: blk.49.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 451: blk.49.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 452: blk.49.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 453: blk.50.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 454: blk.50.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 455: blk.50.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 456: blk.50.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 457: blk.50.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 458: blk.50.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 459: blk.50.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 460: blk.50.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 461: blk.50.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 462: blk.51.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 463: blk.51.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 464: blk.51.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 465: blk.51.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 466: blk.51.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 467: blk.51.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 468: blk.51.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 469: blk.51.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 470: blk.51.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 471: blk.52.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 472: blk.52.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 473: blk.52.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 474: blk.52.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 475: blk.52.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 476: blk.52.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 477: blk.52.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 478: blk.52.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 479: blk.52.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 480: blk.53.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 481: blk.53.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 482: blk.53.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 483: blk.53.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 484: blk.53.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 485: blk.53.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 486: blk.53.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 487: blk.53.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 488: blk.53.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 489: blk.54.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 490: blk.54.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 491: blk.54.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 492: blk.54.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 493: blk.54.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 494: blk.54.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 495: blk.54.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 496: blk.54.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 497: blk.54.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 498: blk.55.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 499: blk.55.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 500: blk.55.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 501: blk.55.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 502: blk.55.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 503: blk.55.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 504: blk.55.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 505: blk.55.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 506: blk.55.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 507: blk.56.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 508: blk.56.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 509: blk.56.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 510: blk.56.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 511: blk.56.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 512: blk.56.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 513: blk.56.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 514: blk.56.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 515: blk.56.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 516: blk.57.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 517: blk.57.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 518: blk.57.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 519: blk.57.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 520: blk.57.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 521: blk.57.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 522: blk.57.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 523: blk.57.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 524: blk.57.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 525: blk.58.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 526: blk.58.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 527: blk.58.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 528: blk.58.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 529: blk.58.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 530: blk.58.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 531: blk.58.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 532: blk.58.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 533: blk.58.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 534: blk.59.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 535: blk.59.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 536: blk.59.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 537: blk.59.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 538: blk.59.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 539: blk.59.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 540: blk.59.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 541: blk.59.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 542: blk.59.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 543: blk.60.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 544: blk.60.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 545: blk.60.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 546: blk.60.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 547: blk.60.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 548: blk.60.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 549: blk.60.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 550: blk.60.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 551: blk.60.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 552: blk.61.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 553: blk.61.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 554: blk.61.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 555: blk.61.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 556: blk.61.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 557: blk.61.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 558: blk.61.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 559: blk.61.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 560: blk.61.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 561: blk.62.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 562: blk.62.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 563: blk.62.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 564: blk.62.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 565: blk.62.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 566: blk.62.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 567: blk.62.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 568: blk.62.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 569: blk.62.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 570: blk.63.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 571: blk.63.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 572: blk.63.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 573: blk.63.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 574: blk.63.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 575: blk.63.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 576: blk.63.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 577: blk.63.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 578: blk.63.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 579: blk.64.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 580: blk.64.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 581: blk.64.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 582: blk.64.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 583: blk.64.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 584: blk.64.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 585: blk.64.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 586: blk.64.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 587: blk.64.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 588: blk.65.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 589: blk.65.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 590: blk.65.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 591: blk.65.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 592: blk.65.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 593: blk.65.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 594: blk.65.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 595: blk.65.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 596: blk.65.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 597: blk.66.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 598: blk.66.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 599: blk.66.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 600: blk.66.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 601: blk.66.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 602: blk.66.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 603: blk.66.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 604: blk.66.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 605: blk.66.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 606: blk.67.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 607: blk.67.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 608: blk.67.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 609: blk.67.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 610: blk.67.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 611: blk.67.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 612: blk.67.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 613: blk.67.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 614: blk.67.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 615: blk.68.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 616: blk.68.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 617: blk.68.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 618: blk.68.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 619: blk.68.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 620: blk.68.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 621: blk.68.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 622: blk.68.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 623: blk.68.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 624: blk.69.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 625: blk.69.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 626: blk.69.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 627: blk.69.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 628: blk.69.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 629: blk.69.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 630: blk.69.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 631: blk.69.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 632: blk.69.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 633: blk.70.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 634: blk.70.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 635: blk.70.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 636: blk.70.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 637: blk.70.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 638: blk.70.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 639: blk.70.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 640: blk.70.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 641: blk.70.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 642: blk.71.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 643: blk.71.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 644: blk.71.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 645: blk.71.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 646: blk.71.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 647: blk.71.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 648: blk.71.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 649: blk.71.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 650: blk.71.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 651: blk.72.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 652: blk.72.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 653: blk.72.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 654: blk.72.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 655: blk.72.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 656: blk.72.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 657: blk.72.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 658: blk.72.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 659: blk.72.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 660: blk.73.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 661: blk.73.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 662: blk.73.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 663: blk.73.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 664: blk.73.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 665: blk.73.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 666: blk.73.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 667: blk.73.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 668: blk.73.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 669: blk.74.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 670: blk.74.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 671: blk.74.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 672: blk.74.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 673: blk.74.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 674: blk.74.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 675: blk.74.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 676: blk.74.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 677: blk.74.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 678: blk.75.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 679: blk.75.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 680: blk.75.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 681: blk.75.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 682: blk.75.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 683: blk.75.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 684: blk.75.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 685: blk.75.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 686: blk.75.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 687: blk.76.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 688: blk.76.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 689: blk.76.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 690: blk.76.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 691: blk.76.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 692: blk.76.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 693: blk.76.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 694: blk.76.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 695: blk.76.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 696: blk.77.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 697: blk.77.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 698: blk.77.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 699: blk.77.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 700: blk.77.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 701: blk.77.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 702: blk.77.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 703: blk.77.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 704: blk.77.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 705: blk.78.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 706: blk.78.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 707: blk.78.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 708: blk.78.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 709: blk.78.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 710: blk.78.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 711: blk.78.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 712: blk.78.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 713: blk.78.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 714: blk.79.attn_q.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 715: blk.79.attn_k.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 716: blk.79.attn_v.weight q8_0 [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 717: blk.79.attn_output.weight q8_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 718: blk.79.ffn_gate.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 719: blk.79.ffn_down.weight q8_0 [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 720: blk.79.ffn_up.weight q8_0 [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 721: blk.79.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 722: blk.79.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: llama.context_length u32
llama_model_loader: - kv 3: llama.embedding_length u32
llama_model_loader: - kv 4: llama.block_count u32
llama_model_loader: - kv 5: llama.feed_forward_length u32
llama_model_loader: - kv 6: llama.rope.dimension_count u32
llama_model_loader: - kv 7: llama.attention.head_count u32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 10: general.file_type u32
llama_model_loader: - kv 11: tokenizer.ggml.model str
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr
llama_model_loader: - kv 13: tokenizer.ggml.scores arr
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr
llama_model_loader: - kv 15: general.quantization_version u32
llama_model_loader: - type f32: 161 tensors
llama_model_loader: - type q8_0: 562 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 80
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 28672
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: model type = 70B
llm_load_print_meta: model ftype = mostly Q8_0
llm_load_print_meta: model params = 68.98 B
llm_load_print_meta: model size = 68.26 GiB (8.50 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '&lt;s&gt;'
llm_load_print_meta: EOS token = 2 '&lt;/s&gt;'
llm_load_print_meta: UNK token = 0 '&lt;unk&gt;'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.24 MB
llm_load_tensors: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA L4) as main device
llm_load_tensors: mem required = 69896.52 MB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/83 layers to GPU
llm_load_tensors: VRAM used: 0.00 MB
....................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 160.00 MB
llama_new_context_with_model: compute buffer total size = 151.13 MB
llama_new_context_with_model: VRAM scratch buffer: 145.00 MB
llama_new_context_with_model: total VRAM used: 145.00 MB (model: 0.00 MB, context: 145.00 MB)
main: init model
print_params: n_vocab: 32000
print_params: n_ctx: 32
print_params: n_embd: 8192
print_params: n_ff: 28672
print_params: n_head: 64
print_params: n_head_kv: 8
print_params: n_layer: 80
print_params: norm_rms_eps : 0.000010
print_params: rope_freq_base : 10000.000000
print_params: rope_freq_scale : 1.000000
print_lora_params: n_rank_attention_norm : 1
print_lora_params: n_rank_wq : 4
print_lora_params: n_rank_wk : 4
print_lora_params: n_rank_wv : 4
print_lora_params: n_rank_wo : 4
print_lora_params: n_rank_ffn_norm : 1
print_lora_params: n_rank_w1 : 4
print_lora_params: n_rank_w2 : 4
print_lora_params: n_rank_w3 : 4
print_lora_params: n_rank_tok_embeddings : 4
print_lora_params: n_rank_norm : 1
print_lora_params: n_rank_output : 4
main: total train_iterations 0
main: seen train_samples 0
main: seen train_tokens 0
main: completed train_epochs 0
main: lora_size = 428339424 bytes (408.5 MB)
main: opt_size = 640969696 bytes (611.3 MB)
main: opt iter 0
main: input_size = 8192288 bytes (7.8 MB)
main: compute_size = 278948348096 bytes (266025.9 MB)
main: evaluation order = RIGHT_TO_LEFT
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

@AndrewGodfrey (Contributor)

I'm guessing the call stack where it asserts is init_lora > alloc_lora > ggml_allocr_alloc.
Could you maybe confirm that, e.g. in a debugger (breakpoint on abort()) or by adding logging?

If so, it could mean the "// measure data size" code in init_lora is coming up with a value for 'size' that is too low.
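If that is the case, the failure mode can be sketched in a few lines (Python pseudocode, not the actual ggml-alloc API; alignment padding is just one hypothetical way a measure pass can come up short): the measure pass records a peak offset without padding, the real pass rounds each allocation up, and the buffer runs out partway through.

```python
def measure_pass(tensor_sizes):
    # Naive measure pass: bump-allocate and record the peak offset,
    # without accounting for per-tensor alignment padding.
    peak = offset = 0
    for size in tensor_sizes:
        offset += size
        peak = max(peak, offset)
    return peak

def alloc_pass(tensor_sizes, buffer_size, alignment=32):
    # Real pass: every allocation is rounded up to the alignment,
    # so it needs more room than the measure pass reported.
    offset = 0
    for size in tensor_sizes:
        padded = (size + alignment - 1) // alignment * alignment
        if offset + padded > buffer_size:
            raise MemoryError(f"not enough space in the buffer "
                              f"(needed {padded}, largest block available "
                              f"{buffer_size - offset})")
        offset += padded

sizes = [1000, 1000, 1000]
buf = measure_pass(sizes)      # reports 3000 bytes
try:
    alloc_pass(sizes, buf)     # real pass pads each tensor to 1024 bytes
except MemoryError as e:
    print(e)
```

If logging shows the measured size is smaller than what alloc_lora later requests, that would point at the measurement code rather than at actual memory exhaustion.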

github-actions bot commented Apr 4, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 4, 2024