batch inference #347
Unanswered
liuxiaohao-xn asked this question in Q&A
-
Same question here; it seems llama.cpp already supports batched inference upstream. @abetlen, this use case seems to be very popular, what do you think? (A client-side workaround is sketched after this thread.)
-
What kind of work would need to be done, either here or upstream in llama.cpp, to get batch inference fully working? Is there a roadmap anywhere already? I have some code from work that needs to run against Vicuna, and the only remaining obstacle is getting batch processing working.
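In the meantime, one way to get several prompts in flight without waiting for true batched decoding is to run the bundled OpenAI-compatible server (`python -m llama_cpp.server --model <path>`) and submit the prompts concurrently from the client. This is only a sketch under that assumption: the model still decodes the requests one at a time, so it does not give the throughput of real batch inference, and the host, port, and sampling parameters below are placeholders.

```python
# Sketch: fan out several prompts to a locally running llama-cpp-python server.
# Assumes the server was started separately, e.g.:
#   python -m llama_cpp.server --model ./models/7B/ggml-model-q4_0.bin
# The endpoint and payload follow the OpenAI completions format; this is NOT
# true batched decoding -- the server still processes requests sequentially.
from concurrent.futures import ThreadPoolExecutor

import requests

SERVER_URL = "http://localhost:8000/v1/completions"  # default port assumed

def complete(prompt: str) -> str:
    response = requests.post(
        SERVER_URL,
        json={"prompt": prompt, "max_tokens": 64, "temperature": 0.7},
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["text"]

prompts = [
    "Summarize the plot of Hamlet in one sentence.",
    "Translate 'good morning' into French.",
    "List three uses for a paperclip.",
]

# Submit all prompts at once; results come back as each request finishes.
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for prompt, text in zip(prompts, pool.map(complete, prompts)):
        print(f"{prompt!r} -> {text.strip()!r}")
```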
-
I have multiple prompts and I want to feed them all to the model at once to generate outputs. Can you tell me how to achieve this?
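Not an official answer, but until batched decoding is exposed through the high-level API, the simplest approach I know of is to loop over the prompts with a single `Llama` instance. A minimal sketch follows; the model path, context size, and stop tokens are placeholders for your own setup, and each prompt is still generated one after another rather than in a single batch.

```python
from llama_cpp import Llama

# Placeholder model path and parameters; adjust to your setup.
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=2048)

prompts = [
    "Q: Name the planets in the solar system. A:",
    "Q: What is the capital of France? A:",
    "Q: Who wrote 'Pride and Prejudice'? A:",
]

outputs = []
for prompt in prompts:
    # Each call is an independent, non-batched generation for one prompt.
    completion = llm(prompt, max_tokens=64, stop=["Q:", "\n"], echo=False)
    outputs.append(completion["choices"][0]["text"].strip())

for prompt, text in zip(prompts, outputs):
    print(f"{prompt} {text}")
```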