[User] Unable to Finetune Llama 2 70B #3644
Comments
How much memory do you have?
@RedAndr Sorry, I completely forgot to add this. I'm editing it into the original issue as well.
I am getting a similar error on a Mac mini M2 with 24 GB of memory. Base model: openllama-3b-v2.
This doesn't directly help you, but the error isn't related to how much memory you have available. I believe it means that the maximum context size didn't get calculated correctly, so it's not a user error.
Thanks for the response, @KerfuffleV2. So I believe there's some bug to be fixed, but I'm curious whether it impacts everyone trying to use finetune or just some combinations of input.
Hey folks! On 6961c4b, I'm getting a new error:
This issue occurs when I use --batch 1; any other value works fine.
@QueryType So you got it working? If so, what specific arguments did you use?
Yes, but it is too slow and heats the CPU up to 100 °C! Swap also got activated.
Nice! Thanks! I myself do not have an issue with lower-parameter models like 13B, but 70B just doesn't want to start at all, failing with the error:
You can probably work around that problem by increasing
I actually tried that previously -- increasing it to a larger value. Log:

```shell
/srv/shared/llama.cpp/finetune \
  --model-base /srv/shared/llama.cpp/models/llama-2/llama-2-70b/ggml-model-q8_0.gguf \
  --checkpoint-in llama-2-70b-finetune-data-LATEST.gguf \
  --checkpoint-out llama-2-70b-finetune-data-ITERATION.gguf \
  --lora-out llama-2-70b-finetune-data-ITERATION.bin \
  --train-data /srv/shared/data.txt \
  --save-every 50 \
  --threads 48 --batch 2 --grad-acc 2 \
  --adam-alpha 0.0003 --ctx 32 \
  --sample-start '### Instruction:' --include-sample-start \
  --no-checkpointing
```
I'm guessing the call stack where it asserts is `init_lora` > `alloc_lora` > `ggml_allocr_alloc`. If so, it could mean the `// measure data size` code in `init_lora` is computing a value for `size` that is too low.
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Finetuning Llama 2 70B should succeed.
Current Behavior
Finetuning Llama 2 70B fails; see the attached log. I should add that finetuning Llama 2 13B works.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
System Memory
Physical (or virtual) hardware you are using, e.g. for Linux:
Operating System, e.g. for Linux:
SDK version, e.g. for Linux:
Failure Information (for bugs)
Steps to Reproduce
The following steps assume that:
1. The converted model is located at ./models/llama-2-70b.
2. You have cd'd into the directory where llama.cpp was cloned.
3. You have built the binaries with make.
Failure Logs
The logs are too long to include as a comment. Instead, I am attaching them here. You'll also find that I ran a finetune on Llama 13B just to demonstrate that it's working.
error.log