When I try to do finetuning I get a GGML_ASSERT: ggml.c:16911: np < GGML_MAX_PARAMS error. #4342
Comments
I will post the failure logs in parts due to the character limit.
I am running into the same issue with the same device and a 70B quantized model.
Increasing `GGML_MAX_PARAMS` should work.
@slaren what does `GGML_MAX_PARAMS` do?
From what I can tell, it is the maximum number of trainable tensors in a graph, but I don't know a lot about the training code. @ggerganov and @xaedes would know more about this.
Yep, it is the maximum number of trainable tensors. Since there are two trainable LoRA tensors for each matrix, the required number is roughly twice the number of base model tensors. According to the log you posted there are 723 tensors, so you probably need at least 1446 for `GGML_MAX_PARAMS`.
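For context, here is a minimal, self-contained C sketch of the failure mode described above. It is a toy paraphrase, not the actual ggml.c code: the optimizer walks the compute graph, collects every tensor flagged as a trainable parameter into a fixed-size local array, and asserts that the count stays below `GGML_MAX_PARAMS`. The names `toy_tensor` and `N_BASE` are hypothetical, and the default limit of 1024 is inferred from "10 times higher" being 10240 below; the 723 / 1446 tensor counts come from this thread.

```c
// Toy illustration of the check that fails (not the real ggml.c source).
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define GGML_MAX_PARAMS 1024           // assumed default at the time of this issue

struct toy_tensor { bool is_param; };  // stand-in for struct ggml_tensor

int main(void) {
    // 723 base-model matrices, each with two trainable LoRA tensors (A and B)
    // -> 1446 trainable parameters, which overflows the default limit.
    enum { N_BASE = 723, N_NODES = 2 * N_BASE };
    static struct toy_tensor nodes[N_NODES];
    for (int i = 0; i < N_NODES; ++i) nodes[i].is_param = true;

    struct toy_tensor * ps[GGML_MAX_PARAMS]; // fixed-size local array on the stack
    int np = 0;
    for (int i = 0; i < N_NODES; ++i) {
        if (nodes[i].is_param) {
            assert(np < GGML_MAX_PARAMS);    // this is what fires as GGML_ASSERT
            ps[np++] = &nodes[i];
        }
    }
    printf("collected %d params (last = %p)\n", np, (void *) ps[np - 1]);
    return 0;
}
```

Running this aborts once `np` reaches 1024, mirroring the reported `GGML_ASSERT: np < GGML_MAX_PARAMS`.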
For now, I set it to 10240, which is 10 times higher, and compiled it.
It's used in a local array, so it may cause a stack overflow if it is too big.
So, why not try setting it to 2048 (2^11)?
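As a rough check of the stack-overflow concern, assuming the local array holds one `struct ggml_tensor *` pointer per trainable parameter (8 bytes each on a 64-bit build), the stack cost scales linearly with the limit. The snippet below just prints the arithmetic for the values discussed here:

```c
// Rough stack cost of a local pointer array for a few candidate limits.
#include <stdio.h>

int main(void) {
    const int limits[] = { 1024, 2048, 10240 };
    for (int i = 0; i < 3; ++i) {
        printf("GGML_MAX_PARAMS = %5d -> %6zu bytes (%.1f KiB) of stack\n",
               limits[i],
               limits[i] * sizeof(void *),
               limits[i] * sizeof(void *) / 1024.0);
    }
    return 0;
}
```

Even 10240 entries is only about 80 KiB, which usually fits within a default main-thread stack, but 2048 keeps the array small while still covering the ~1446 tensors the 70B LoRA setup needs.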
I've submitted a pull request.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Finetuning Llama 2 70B should succeed.
Current Behavior
Finetuning Llama 2 70B fails with `GGML_ASSERT: ggml.c:16911: np < GGML_MAX_PARAMS`.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
$ uname -a
Darwin xxxxxxxxx-MacStudio 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct 9 21:28:45 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6020 arm64
Failure Information (for bugs)
Please help provide information about the failure / bug.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
Running the finetune command on the Llama 2 70B model results in the error above.
It worked fine with the 7B model, so the error seems to occur as the model scale increases.
What does this limit mean?
What happens if you increase the number?
I thought this problem was related, but it seems like it's a different error.
Thank you!