perf: Investigate performance discrepancy with llama-rs - 1.5x-2x slower #932
I think I made a regression in c3ac702. Can you check whether reverting it solves the issue?
I checked out 9d634ef. There was no improvement; in fact, there was a regression.
Oops, I found the issue: hyperthreading (#34 (comment)). Using 8 threads instead of 16:
I think we should make the default the number of real CPU cores. @ggerganov
Pretty sure the default version of the code uses something like 4, or at least the initial examples do.
On Linux, the default is the number of logical threads: https://github.com/ggerganov/llama.cpp/blob/e7f6997f897a18b6372a6460e25c5f89e1469f1d/examples/common.cpp#L35
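For illustration, here is a minimal sketch (not from the thread, and not llama.cpp's actual code) of how a default could prefer physical cores over logical threads on Linux. `count_physical_cores` is a hypothetical helper, and the `/proc/cpuinfo` parsing is an assumption about one possible approach:

```cpp
// Hypothetical sketch: derive a thread-count default from physical cores
// rather than logical threads on Linux. Illustration only.
#include <cstdio>
#include <fstream>
#include <set>
#include <string>
#include <thread>
#include <utility>

// Count unique (physical id, core id) pairs in /proc/cpuinfo.
// Falls back to the logical-thread count if parsing yields nothing.
static unsigned count_physical_cores() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::set<std::pair<int, int>> cores;
    int physical_id = -1;
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.rfind("physical id", 0) == 0) {
            physical_id = std::stoi(line.substr(line.find(':') + 1));
        } else if (line.rfind("core id", 0) == 0) {
            int core_id = std::stoi(line.substr(line.find(':') + 1));
            cores.insert({physical_id, core_id});
        }
    }
    if (!cores.empty()) {
        return (unsigned) cores.size();
    }
    // Fallback: logical threads, which is what std::thread reports.
    unsigned n = std::thread::hardware_concurrency();
    return n > 0 ? n : 4;
}

int main() {
    printf("logical threads : %u\n", std::thread::hardware_concurrency());
    printf("physical cores  : %u\n", count_physical_cores());
    return 0;
}
```

On a hyperthreaded machine this would report, for example, 16 logical threads but 8 physical cores, matching the "8 threads instead of 16" observation above.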
Oh damn. That's why people are complaining when they use all their threads
😂 I guess that's the one bonus Windows has in this case.
Preliminary results show that llama.cpp is 1.5x-2x slower than llama-rs. Both were checked to compile with the same arch flags and to use the same GNU toolchain.

Summary (on Vicuna 13B, 2048 ctx size, 256 predict tokens):
- llama.cpp: 430.44 ms per run
- llama-rs: per_token_duration: 272.793 ms

Detailed results

An interesting observation is that CPU utilization is lower on llama-rs.
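As a rough illustration of what a "per token" number like the ones above can mean, here is a minimal timing sketch using std::chrono. It is not the timing code used by either project; `eval_one_token` is a hypothetical stand-in for a single decode step:

```cpp
// Hypothetical sketch of per-token latency measurement. Illustration only;
// not the actual benchmark harness from llama.cpp or llama-rs.
#include <chrono>
#include <cstdio>
#include <vector>

// Stand-in for one decode step; a real run would call the model's eval here.
static void eval_one_token() { /* ... */ }

int main() {
    const int n_predict = 256;  // matches the 256 predict tokens above
    std::vector<double> per_token_ms;
    per_token_ms.reserve(n_predict);

    for (int i = 0; i < n_predict; ++i) {
        auto t0 = std::chrono::high_resolution_clock::now();
        eval_one_token();
        auto t1 = std::chrono::high_resolution_clock::now();
        per_token_ms.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    double total = 0.0;
    for (double ms : per_token_ms) total += ms;
    printf("avg per-token latency: %.3f ms\n", total / per_token_ms.size());
    return 0;
}
```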
System info:
- llama.cpp
- llama-rs

No BLAS.

Notes: the llama-rs bench runs on my branch.