Google Colab crashing #699
developer234 asked this question in Q&A (unanswered)
Replies: 1 comment
-
Hey, long shot, but did you find a solution?
-
Hello All,
I am using llama-cpp-python to run inference with the TheBloke/Llama-2-13B-chat-GGUF model (file llama-2-13b-chat.Q4_K_M.gguf) on a GPU.
My code was working fine, but my Colab session has suddenly started crashing. I have also tried a shorter context length, but it still crashes.
I used these commands for the pip install:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.84 --force-reinstall --upgrade --no-cache-dir --verbose
!pip install -q huggingface_hub
I have the following code for model inference:
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_name_or_path = "TheBloke/Llama-2-13B-chat-GGUF"
model_basename = "llama-2-13b-chat.Q4_K_M.gguf"
# Download the GGUF file (or reuse the cached copy) and get its local path
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2,      # CPU cores
    n_batch=2048,     # Should be between 1 and n_ctx; consider the amount of VRAM in your GPU.
    n_gpu_layers=50,  # Change this value based on your model and your GPU VRAM pool.
    n_ctx=2048,       # Context window
)
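For anyone trying to narrow this down, here is a minimal sketch of the constraint stated in the comments above (n_batch between 1 and n_ctx). The helper name build_llama_kwargs and its default values are hypothetical, not part of llama-cpp-python. It only assembles keyword arguments, so it runs without a GPU. Whether a smaller n_batch or n_gpu_layers actually avoids the crash is an assumption to test, not a confirmed fix.

```python
def build_llama_kwargs(model_path, n_ctx=2048, n_batch=512,
                       n_gpu_layers=50, n_threads=2):
    """Hypothetical helper: assemble Llama(...) keyword arguments,
    keeping n_batch inside the range 1..n_ctx."""
    if n_ctx < 1:
        raise ValueError("n_ctx must be at least 1")
    if n_gpu_layers < 0:
        raise ValueError("n_gpu_layers must not be negative")
    # Clamp n_batch so it never exceeds the context window or drops below 1
    n_batch = max(1, min(n_batch, n_ctx))
    return {
        "model_path": model_path,
        "n_threads": n_threads,
        "n_batch": n_batch,
        "n_gpu_layers": n_gpu_layers,
        "n_ctx": n_ctx,
    }


# Example: a more conservative configuration to try when the session crashes.
# The specific values are guesses for illustration, not recommendations.
kwargs = build_llama_kwargs("llama-2-13b-chat.Q4_K_M.gguf",
                            n_ctx=1024, n_batch=4096, n_gpu_layers=20)
print(kwargs)
```

The resulting dictionary can be passed on as Llama(**kwargs), with model_path replaced by the path returned from hf_hub_download.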