Edit: It looks like I can get the control I need out of the high-level wrapper, so this is not a pressing issue. But I would still like to use the low-level API, so an answer would be helpful.
First off, thanks to all the contributors; this library has made querying local LLMs for small projects a breeze.
I've been having a lot of fun with this, and recently I've been trying to use the low-level API. It works well, but I would like to speed up generation by offloading the model to the GPU, just as I do from the high-level API. Unfortunately, I can't see any way to do this from the provided low-level API examples.
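For context, this is the kind of offloading I mean on the high-level side (the model path and layer count here are just placeholders):

```python
from llama_cpp import Llama

# High-level wrapper: n_gpu_layers is a plain constructor argument.
llm = Llama(model_path="./models/7B/ggml-model.bin", n_gpu_layers=32)
```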
I took a look at the llama_cpp.py reference, which includes an n_gpu_layers field in the llama_context_params structure, but I cannot figure out how to actually pass that value; I can't see anywhere that it is actually set. I'm not familiar with C++, so I might be missing something obvious.
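For what it's worth, here is my best guess at what should work, adapted from the low-level example in the README. Since llama_context_params is a ctypes structure, I assume the field can be set by plain attribute assignment before loading the model; the path and layer count are placeholders, and the exact field/function names may differ by version:

```python
import llama_cpp

params = llama_cpp.llama_context_default_params()
# Assumption: n_gpu_layers can be assigned like any other ctypes field.
params.n_gpu_layers = 32
# The low-level API takes bytes for char * arguments.
ctx = llama_cpp.llama_init_from_file(b"./models/7B/ggml-model.bin", params)
```

(I understand this would only take effect if the underlying llama.cpp was built with GPU support, e.g. cuBLAS or Metal.)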
Additionally, are there any other resources I should be referencing for the low-level API?
Thanks!