Google Colab crashing #699

developer234 · 2023-09-12T11:52:28Z

Hello all,

I am using llama-cpp-python to run inference with the llama-2-13b-chat.Q4_K_M.gguf model from TheBloke/Llama-2-13B-chat-GGUF on a GPU.
My code used to work without problems, but now my Colab session suddenly crashes. I have also tried a shorter context length, but it still crashes.

I used these commands to install the packages:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.84 --force-reinstall --upgrade --no-cache-dir --verbose
!pip install -q huggingface_hub

I have the following code for model inference:
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_name_or_path = "TheBloke/Llama-2-13B-chat-GGUF"
model_basename = "llama-2-13b-chat.Q4_K_M.gguf"
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2,      # number of CPU threads
    n_batch=2048,     # should be between 1 and n_ctx; consider the amount of VRAM in your GPU
    n_gpu_layers=50,  # change this value based on your model and your GPU VRAM pool
    n_ctx=2048,       # context window
)

# prompt_template is the prompt string (not shown in this post)
response = lcpp_llm(
    prompt=prompt_template,
    max_tokens=2048,
    temperature=1.0,
    top_p=0.95,
    repeat_penalty=1.2,
    top_k=10,
    echo=False,  # do not echo the prompt in the output
)
              
Colab crashes as soon as inference starts.

Please help me solve this.
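One thing that may be worth ruling out (it is not confirmed as the cause of the crash) is that the call above asks for max_tokens=2048 while n_ctx is also 2048, so the prompt and the completion together cannot fit in the context window. A minimal sketch of a budget check follows; clamp_max_tokens is a hypothetical helper, not part of llama-cpp-python, and the 300-token prompt length is an assumed number for illustration. With llama-cpp-python, the real prompt length could be measured with len(lcpp_llm.tokenize(prompt_template.encode("utf-8"))).

```python
def clamp_max_tokens(n_ctx: int, n_prompt_tokens: int, requested: int) -> int:
    """Return the largest max_tokens that keeps prompt + completion within n_ctx.

    Hypothetical helper for illustration; not part of llama-cpp-python.
    """
    if n_prompt_tokens >= n_ctx:
        # The prompt alone already fills (or overflows) the context window.
        raise ValueError(
            f"prompt uses {n_prompt_tokens} tokens, but n_ctx is only {n_ctx}"
        )
    # Leave room for the prompt; never return more than was requested.
    return min(requested, n_ctx - n_prompt_tokens)


# Assumed numbers: a 300-token prompt with the settings from the post.
budget = clamp_max_tokens(n_ctx=2048, n_prompt_tokens=300, requested=2048)
print(budget)  # 1748
```

The result could then be passed as max_tokens in the lcpp_llm(...) call instead of the fixed 2048.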

Anshul261 · 2024-06-14T15:09:17Z

Hey, long shot, but did you find a solution?
