
[GGML] Added RISC-V Vector Intrinsics Support #2929

Merged: 2 commits into ggml-org:master on Sep 1, 2023

Conversation

Tameem-10xE (Contributor) commented Aug 31, 2023

Hi,

In this PR, we have added RISC-V vector (RVV) intrinsics for the following vector dot-product functions (a minimal sketch of the vectorization pattern follows the list):

     ggml_vec_dot_q4_0_q8_0
     ggml_vec_dot_q4_1_q8_1
     ggml_vec_dot_q5_0_q8_0
     ggml_vec_dot_q5_1_q8_1
     ggml_vec_dot_q8_0_q8_0

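For comparison with the AVX and NEON paths, here is a minimal sketch of the strip-mining pattern these RVV kernels follow, written as a plain f32 dot product. It is illustrative only: the function name is ours, and the real ggml_vec_dot_* kernels also decode the 4/5/8-bit quantized blocks before multiplying.

    #include <riscv_vector.h>
    #include <stddef.h>

    // Strip-mined f32 dot product: vsetvl picks how many elements fit in a
    // vector register each pass, vfmacc accumulates, vfredusum reduces.
    static float dot_f32_rvv(const float *x, const float *y, size_t n) {
        size_t vlmax = __riscv_vsetvlmax_e32m1();
        vfloat32m1_t acc = __riscv_vfmv_v_f_f32m1(0.0f, vlmax);
        for (size_t i = 0; i < n; ) {
            size_t vl = __riscv_vsetvl_e32m1(n - i);   // elements this pass
            vfloat32m1_t vx = __riscv_vle32_v_f32m1(x + i, vl);
            vfloat32m1_t vy = __riscv_vle32_v_f32m1(y + i, vl);
            // tail-undisturbed FMA keeps untouched accumulator lanes intact
            // on a short final pass (requires the v0.12 policy intrinsics)
            acc = __riscv_vfmacc_vv_f32m1_tu(acc, vx, vy, vl);
            i += vl;
        }
        vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, vlmax);
        vfloat32m1_t sum  = __riscv_vfredusum_vs_f32m1_f32m1(acc, zero, vlmax);
        return __riscv_vfmv_f_s_f32m1_f32(sum);
    }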

In the future, this will enable GGML to run efficiently on RISC-V hardware with vector support, and it opens the way to comparing its performance against other vector/SIMD implementations such as Intel AVX and Arm NEON. It should also bring speedups for applications using GGML on RISC-V hardware with a vector unit.

The output has been tested and verified for each of the legacy GGML 7B quantized models (q4_0, q4_1, q5_0, q5_1, and q8_0) using the qemu-riscv64 emulator.

Edit: llama.cpp stopped using the GGML format after August 22, 2023, and switched to the new GGUF format, so these functions will no longer be exercised by llama.cpp. We will soon submit a new PR with GGUF support.


[Cross Compiling Environment]
Ubuntu: 22.10
riscv-toolchain: 2023.07.05 riscv64 linux glibc

On actual hardware there is no issue, but running under QEMU requires slightly modifying the Makefile: we just overwrite the CC and CXX variables with the toolchain compilers and also pass the architecture flags to make.

CC  := riscv64-unknown-linux-gnu-gcc
CXX := riscv64-unknown-linux-gnu-g++

Then run make:

make   RISCV_CROSS_COMPILE=1  RISCV=1  
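Since make lets command-line assignments override Makefile variables, the same build can also be done without editing the Makefile at all (assuming the toolchain binaries are on PATH):

make CC=riscv64-unknown-linux-gnu-gcc CXX=riscv64-unknown-linux-gnu-g++ RISCV_CROSS_COMPILE=1 RISCV=1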


[QEMU]

$   qemu-riscv64 -L /path/to/sysroot/  -cpu rv64,v=true,vlen=256,elen=64,vext_spec=v1.0 ./main -m ./path/to/model.gguf -p "Anything" -n 9
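Here -L points qemu-riscv64 at the sysroot containing the RISC-V dynamic linker and libraries, v=true enables the vector extension, vlen=256 and elen=64 set the vector register length and maximum element width, and vext_spec=v1.0 selects the ratified RVV 1.0 specification.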


[Output]
(screenshot of the model output under qemu-riscv64 omitted)


If you'd like to test these changes, we've set up a Cloud-V pipeline on our fork repository (main branch), which you can use to run and verify the code on the QEMU RISC-V emulator.

Any feedback is welcome; if you have suggestions or improvements, please share them.

Added RVV intrinsics for the following:
   ggml_vec_dot_q4_0_q8_0
   ggml_vec_dot_q4_1_q8_1
   ggml_vec_dot_q5_0_q8_0
   ggml_vec_dot_q5_1_q8_1
   ggml_vec_dot_q8_0_q8_0

Co-authored-by: Sharafat <[email protected]>
Signed-off-by: Ahmad Tameem <[email protected]>
@ggerganov ggerganov merged commit 5aec2cf into ggml-org:master Sep 1, 2023
camel-cdr (Contributor) commented Sep 3, 2023

The code structure of ggml.c doesn't work very well with a scalable vector architecture. If this course is continued, I'd try to detect the optimal vtype at startup and use that instead of hoping that LMUL=1 will work.

Also, temp_1 and temp_2 can be easily synthesised with viota, no need for loads.
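A hedged sketch of that suggestion (the helper name is ours, not from the PR): a single-bit mask table such as {1, 2, 4, 8, ...} can be materialized in registers instead of loaded from memory. vid yields the lane indices {0, 1, 2, ...} (viota over an all-ones mask produces the same sequence), and shifting 1 left by those indices gives the bit masks:

    #include <riscv_vector.h>
    #include <stddef.h>

    // Build {1, 2, 4, 8, ...} in registers: lane index i -> (1u << i).
    static inline vuint32m1_t bit_select_vector(size_t vl) {
        vuint32m1_t idx  = __riscv_vid_v_u32m1(vl);       // {0, 1, 2, ...}
        vuint32m1_t ones = __riscv_vmv_v_x_u32m1(1, vl);  // splat 1
        return __riscv_vsll_vv_u32m1(ones, idx, vl);      // {1, 2, 4, ...}
    }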

Tameem-10xE (Contributor, Author) commented Sep 3, 2023

Thank you for your feedback @camel-cdr.
Did you test it with the old GGML weights (e.g. ggml_q4_0.bin) or the new GGUF weights (e.g. ggml_q4_k.gguf)?
Due to the recent format change I misjudged this: the optimization will not affect performance for the new GGUF-type weights.
I am currently working on this and writing new functions that also cover those weights.
Thanks again!

SiriEmb commented Feb 25, 2024

Hi, does the RISC-V port support Q2?

Tameem-10xE (Contributor, Author) commented Mar 4, 2024

Hi, for the legacy weights I don't think there were any Q2 weights (the only ones were Q4, Q5, and Q8, if I remember correctly).
The newer GGUF weights do support Q2 (you can check in this PR: #3453).

Thank you
