Is it possible to preprocess a set of prompts, save them to disk, and then reload them during inference, so the cached prompts don't have to be regenerated every time? For instance, if I have a 2000-token prompt that I use daily in a memory-intensive Python program, is there a way to preprocess and save it to avoid the delay of ingesting the prompt each time I start the program? What are the options in this scenario?

Replies: 1 comment
Not sure whether you're still interested in this, but LlamaDiskCache does the job, even though it currently has a minor bug.
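For reference, a minimal sketch of wiring up `LlamaDiskCache` in llama-cpp-python. The model path, cache directory, and prompt file are placeholders, and the exact import path can vary between versions (on some it lives in `llama_cpp.llama_cache`):

```python
from llama_cpp import Llama, LlamaDiskCache

# Placeholder paths -- point these at your model and a writable cache location.
llm = Llama(model_path="./models/model.gguf", n_ctx=4096)
llm.set_cache(LlamaDiskCache(cache_dir="./prompt_cache"))

# The long static prompt you reuse every day (~2000 tokens in the scenario above).
system_prompt = open("daily_prompt.txt").read()

# The first run ingests the prompt and persists its KV state under ./prompt_cache;
# later runs that start with the same prompt prefix reuse that state instead of
# re-ingesting the 2000 tokens.
output = llm(system_prompt + "\nToday's question: ...", max_tokens=256)
print(output["choices"][0]["text"])
```

The cache is keyed on the tokenized prompt prefix, so the long static part of the prompt needs to stay byte-identical across runs for the saved state to be reused.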