[Bug]: Short prompts -> !!!!!!! output from Qwen2.5-32B-Instruct-GPTQ-Int4 w/ROCm #14715
Open
Labels: bug
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
Using this exact model: https://huggingface.co/Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4
On ROCm, generation with short prompts produces a never-ending string of exclamation marks:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
If the prompt is longer (e.g. tools are provided, or the user simply asks a longer question), generation is fine. The same model works fine on CUDA.
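For anyone trying to reproduce this across prompt lengths, a rough sketch of how the runaway output could be flagged automatically is below. It is not part of the original report: the helper names `longest_run` and `stream_is_degenerate` and the threshold of 20 are illustrative assumptions, and the code works on plain text chunks rather than on any particular client library.

```python
from typing import Iterable


def longest_run(text: str, char: str = "!") -> int:
    """Return the length of the longest consecutive run of `char` in `text`."""
    best = current = 0
    for c in text:
        current = current + 1 if c == char else 0
        best = max(best, current)
    return best


def stream_is_degenerate(
    chunks: Iterable[str], char: str = "!", threshold: int = 20
) -> bool:
    """Consume streamed text chunks and report runaway repetition.

    Returns True as soon as `threshold` consecutive copies of `char` have
    been seen, counting across chunk boundaries, so a never-ending stream
    can be cut off early. `threshold` is an arbitrary, hypothetical cutoff.
    """
    run = 0
    for chunk in chunks:
        for c in chunk:
            run = run + 1 if c == char else 0
            if run >= threshold:
                return True
    return False
```

Feeding the text deltas of a streamed response into `stream_is_degenerate` should let a test script stop reading once the output has clearly degenerated, while ordinary replies that merely contain a few exclamation marks pass through.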
Command:
Successful generation:
Result:
To demonstrate the bug, turn on streaming so you can see it happening:
Result:
Startup messages for CUDA
Startup messages for ROCm