Is it possible to preprocess a set of prompts, save them to disk, and then reload them during inference, so the cached prompts don't have to be regenerated every time? For instance, if I have a 2000-token prompt that I use daily in a memory-intensive Python program, is there a way to preprocess and save it to avoid the delay of ingesting the prompt each time I start the program? What are the options in this scenario?

Replies: 1 comment
Not sure whether you're still interested in this, but LlamaDiskCache does the job, even though it currently has a minor bug.
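For reference, a minimal sketch of wiring up `LlamaDiskCache` in llama-cpp-python. The model path, cache directory, and prompt file are placeholders, and the exact import path can vary between versions (on some it lives in `llama_cpp.llama_cache`):

```python
from llama_cpp import Llama, LlamaDiskCache

# Placeholder paths -- point these at your model and a writable cache location.
llm = Llama(model_path="./models/model.gguf", n_ctx=4096)
llm.set_cache(LlamaDiskCache(cache_dir="./prompt_cache"))

# The long static prompt you reuse every day (~2000 tokens in the scenario above).
system_prompt = open("daily_prompt.txt").read()

# The first run ingests the prompt and persists its KV state under ./prompt_cache;
# later runs that start with the same prompt prefix reuse that state instead of
# re-ingesting the 2000 tokens.
output = llm(system_prompt + "\nToday's question: ...", max_tokens=256)
print(output["choices"][0]["text"])
```

The cache is keyed on the tokenized prompt prefix, so the long static part of the prompt needs to stay byte-identical across runs for the saved state to be reused.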