Bug: MiniCPM-V-2.6 commit d565bb2fd5a2a58b9924a7a34e77a87c78c52137 causing crash in moondream #9066
I think that was the commit before MiniCPM-V-2.6 got merged. So it might be something else.
Version 3597 works and version 3598 bombs; I narrowed it down. It should be easy enough for someone to reproduce this.
Can confirm it's broken for llava. It seems to work intermittently, probably some out-of-bounds memory access.
It crashes in this assert, located in GGML's get_rows operation:
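(The quoted snippet didn't survive; for reference, the bounds check in ggml's get_rows implementation is along these lines — a sketch, not a verbatim quote:)

```c
// inside ggml_compute_forward_get_rows_f32 (sketch): read the row index
// from the index tensor, then assert it is within the source tensor's rows
const int64_t i01 = *(int32_t *) ((char *) src1->data + i10*nb10 + i11*nb11 + i12*nb12);
GGML_ASSERT(i01 >= 0 && i01 < ne01); // fires when the index exceeds the row count
```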
The direct cause is that an index passed to the get_rows operation is outside the valid range. I noticed that the patches index tensor is filled with i + 1 rather than i (note the off-by-one), so I dropped the + 1.
And it no longer crashes:
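(Presumably the change looked something like this — a sketch, with variable names assumed from clip.cpp:)

```c
// filling the "patches" row-index tensor in clip.cpp (sketch)
int * patches_data = (int *) malloc(ggml_nbytes(patches));
for (int i = 0; i < num_patches; i++) {
    patches_data[i] = i; // was: i + 1
}
ggml_backend_tensor_set(patches, patches_data, 0, ggml_nbytes(patches));
free(patches_data);
```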
I guess the question remains: why did it work before and not now? I have no idea yet :/
The crash was very inconsistent, probably because sometimes this off-by-one access wasn't actually out-of-bounds memory (maybe due to padding?). It was extra weird because adding a simple print statement seemed to change whether it crashed.
Edit: nope, I think this does not solve the issue. I am still getting intermittent segfaults.
@LostRuins I just tried release builds; in my case only the debug builds (LLAMA_DEBUG=1) crashed on this assert, release builds worked without problems. So this may be an entirely unrelated problem after all. I can't reproduce crashes in release builds.
I'm not hitting any assert; I am getting a segmentation fault. Adding the abovementioned print statements before every call to ggml_get_rows …
It finally crashed. I guess the important part is LLAMA_CUDA=1.
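(The abovementioned prints would look roughly like this — a sketch, with the exact variables depending on the call site:)

```c
#include <inttypes.h>
#include <stdio.h>

// sketch: log bounds before each ggml_get_rows(ctx0, embeddings, patches) call,
// so an out-of-range index can be spotted before the kernel runs
fprintf(stderr, "get_rows: src rows = %" PRId64 ", indices = %" PRId64 "\n",
        embeddings->ne[1], patches->ne[0]);
```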
This assert was added fairly recently (in #6210), so previously this wouldn't be noticed even in debug builds. It would cause wrong data to be returned, but since more tensors are allocated in the same buffer, it is not likely to cause a crash with an invalid access. It looks like a logic error in the clip implementation, and it may have affected the quality of the generation.
Try this. I noticed that the default constructor of clip_ctx didn't initialize its fields, so they were basically all filled with garbage. After the change (initializing the fields), it no longer crashes; a sketch follows.
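(A sketch of the change, showing a few representative fields — the actual struct in clip.cpp has more members, and the exact names are assumed here:)

```cpp
struct clip_ctx {
    // before the fix these were left uninitialized, so a default-constructed
    // clip_ctx held garbage; default member initializers fix that
    bool has_text_encoder       = false;
    bool has_vision_encoder     = false;
    bool has_llava_projector    = false;
    bool has_minicpmv_projector = false;
    int  minicpmv_version       = 2;
    // ... remaining fields initialized likewise
};
```

With garbage in a flag like has_minicpmv_projector, the llava path could take MiniCPM-V-specific branches (or vice versa), which would fit the intermittent behavior described above.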
@fairydreaming ggml_cuda_init: found 1 CUDA devices: …
@saket424 yeah, I didn't use the -ngl option, so it didn't offload any layers.
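(For reference, offloading can be forced with the -ngl flag; e.g., reusing the repro command from below, with 99 as an arbitrary "offload everything" value:)

```shell
./llama-llava-cli -m ./m2/moondream2-text-model-f16.gguf --mmproj ./m2/moondream2-mmproj-f16.gguf \
    --image ./assets/demo-2.jpg -p "describe the image" --temp 0.1 -c 2048 -ngl 99
```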
@monatis can you take a look at this code? I think it's a rewritten form of your original llava 1.5 code. Do you remember what the purpose of i + 1 is? Is it related to the vision feature select strategy? I found that in the transformers library the default vision_feature_select_strategy drops the first hidden state (selected_image_feature[:, 1:]), i.e. the class embedding.
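(For context, the code in question builds the row-index tensor starting at 1, so row 0 of the vision tower output is skipped — a sketch, names assumed:)

```c
// llava 1.5: row 0 is the class (CLS) embedding, rows 1..num_patches are patch embeddings
for (int i = 0; i < num_patches; i++) {
    patches_data[i] = i + 1; // mirrors transformers' selected_image_feature[:, 1:]
}
```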
What happened?

```shell
export LLAMA_CUDA=1 # only for NVIDIA CUDA
export CUDA_DOCKER_ARCH=compute_86
make -j$(nproc) NVCC=/usr/local/cuda/bin/nvcc
./llama-llava-cli -m ./m2/moondream2-text-model-f16.gguf --mmproj ./m2/moondream2-mmproj-f16.gguf --image ./assets/demo-2.jpg -p "describe the image" --temp 0.1 -c 2048
```

Core dump. Before this commit there was no crash. Since MiniCPM-V-2.6 has a completely separate CLI, I did not expect it to affect llama-llava-cli, which moondream uses. The crash is only observed on Linux with CUDA, not on Mac.
Name and Version

Crash with version 3598. No crash with:

```shell
./llama-cli --version
version: 3597 (ee2984b)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
```
What operating system are you seeing the problem on?
Linux
Relevant log output