Run several single-thread operators in parallel #850
In my testing, this gives a noticeable difference when running with a high number of threads:
Before
After
} else {
    int start = i;
    int end = i + 1;
    while (end < cgraph->n_nodes && dispath_threads < n_threads && (end - start) < 4)
The magic number 4 needs some tuning.
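For illustration, the batching condition quoted in the diff can be sketched as a standalone helper with the hard-coded 4 lifted into a named constant. This is only a sketch under assumptions, not the PR's code: `struct node`, `MAX_BATCH`, `batch_end`, the `n_tasks == 1` check, and the way the thread counter is incremented are all hypothetical, since the loop body is not shown in the diff. It also ignores data dependencies between consecutive nodes.

```c
#include <assert.h>

/* Hypothetical stand-in for a graph node; only the task count matters here. */
struct node {
    int n_tasks;
};

/* Hypothetical tunable replacing the hard-coded 4 from the diff. */
#define MAX_BATCH 4

/*
 * Returns the exclusive end index of a batch that begins at `start`.
 * The batch grows while all of the following hold:
 *   - there are nodes left,
 *   - fewer than n_threads threads have been handed out,
 *   - the batch holds fewer than MAX_BATCH nodes,
 *   - the next node is single-task (an assumption, not shown in the diff).
 */
static int batch_end(const struct node *nodes, int n_nodes, int start, int n_threads) {
    assert(start >= 0 && start < n_nodes);

    int end = start + 1;
    int dispatched = 1; /* the node at `start` occupies one thread */

    while (end < n_nodes && dispatched < n_threads &&
           (end - start) < MAX_BATCH && nodes[end].n_tasks == 1) {
        end++;
        dispatched++;
    }
    return end;
}
```

With a named constant, the batch limit can be varied per machine while benchmarking instead of being edited in place.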
Eval time:
18 threads on a machine with how many cores/threads?
A Windows 10 box with 10 cores and 20 threads.
Getting a segfault.
To support this properly would require deeper changes, at least:
* Add low-level batching notebook
* fix: tokenization of special characters (ggml-org#850): it should behave like llama.cpp, where most out-of-the-box usages treat special characters accordingly
* Update CHANGELOG
* Cleanup
* Fix runner label
* Update notebook
* Use llama_decode and batch api
* Support logits_all parameter
Co-authored-by: Antoine Lizee
In testing with 18 threads, it shows about a 5% performance gain.