{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" }, "accelerator": "GPU", "gpuClass": "premium" }, "cells": [ { "cell_type": "markdown", "source": [ "# Test out LLaMA models\n", "\n", "_Early test results. Don't take it too seriously._\n", "\n", "**GPU specs**\n", "\n", "Works:\n", "- NVIDIA A100\n", "\n", "Still trying:\n", "- NVIDIA V100\n", "- NVIDIA T4\n", "\n", "**Model sizes**\n", "\n", "Works:\n", "- 7B\n", "\n", "Not tested:\n", "- 13B\n", "- 33B\n", "- 65B\n", "\n", "Trying to verify the claims in the paper:\n", "\n", "> More importantly, the LLaMA-13B is also competitive on these benchmarks with GPT-3 and Chinchilla, despite being 5-10× smaller. This model runs on a single V100 GPU during inference.\n", "\n", "Not a very strong start. Bump into issues here and there during testing. Known unknowns 😆\n", "\n", "---\n", "\n", "##### Disclaimer\n", "\n", "_This model and (all language models) has the potential to generate undesirable content, especially if prompted to do so. We are working on making our models as helpful and non-toxic as possible, but there is still a lot of work to be done on this front. This is intended for developers on the project to do early evaluations. Use at your own risk._" ], "metadata": { "id": "rvtW98b8OSc6" } }, { "cell_type": "markdown", "metadata": { "id": "view-in-github" }, "source": [ "<a href=\"https://colab.research.google.com/github/cedrickchee/llama/blob/main/notebooks/vi_LLaMA_alpha.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d3oZ0q35IVtj" }, "outputs": [], "source": [ "# Code translated from download.sh\n", "PRESIGNED_URL=\"\" # replace with presigned url from email\n", "TARGET_FOLDER=\"./model/vi\"\n", "model_sizes = [\"7B\"]\n", "N_SHARD_DICT = {\n", " \"7B\": 0\n", "}" ] }, { "cell_type": "markdown", "source": [ "PyTorch Distributed env vars." ], "metadata": { "id": "KQJECbnSHW6V" } }, { "cell_type": "code", "source": [ "%env RANK=0\n", "%env WORLD_SIZE=1\n", "%env MASTER_ADDR=localhost\n", "%env MASTER_PORT=9994" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "lUSFWdqJOySG", "outputId": "c236839a-c9c1-4c9d-8083-b7d7ca8b1b02" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "env: RANK=0\n", "env: WORLD_SIZE=1\n", "env: MASTER_ADDR=localhost\n", "env: MASTER_PORT=9994\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Install all the needed dependencies" ], "metadata": { "id": "8A9LezRWQGGl" } }, { "cell_type": "code", "source": [ "!mkdir model" ], "metadata": { "id": "kTcLUOTaIfeU" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "!pip install wget" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "wjoTS-TFJQxR", "outputId": "43e18d2b-1a35-41a7-b488-378006af187f" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", "Collecting wget\n", " Downloading wget-3.2.zip (10 kB)\n", " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", "Building wheels for collected packages: wget\n", " Building wheel for wget (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", " Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9674 sha256=0c3e6289a9285751a3a33e79b06d52a92c848d4ee1f3e6ff065157f47f529b6e\n", " Stored in directory: /root/.cache/pip/wheels/bd/a8/c3/3cf2c14a1837a4e04bd98631724e81f33f462d86a1d895fae0\n", "Successfully built wget\n", "Installing collected packages: wget\n", "Successfully installed wget-3.2\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Download model\n", "\n", "Once download completed successfully, you will get these files and folder structure:\n", "\n", "" ], "metadata": { "id": "cULy3iMGswjt" } }, { "cell_type": "markdown", "source": [ "### 1. Download tokenizer" ], "metadata": { "id": "tbrYny5tQoh2" } }, { "cell_type": "code", "source": [ "import wget\n", "import os" ], "metadata": { "id": "mlpx3MUzJVkf" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Script copied from download.sh\n", "wget.download(PRESIGNED_URL.replace(\"*\", \"tokenizer.model\"), out=f'{TARGET_FOLDER}/tokenizer.model')\n", "wget.download(PRESIGNED_URL.replace(\"*\", \"tokenizer_checklist.chk\"), out=f'{TARGET_FOLDER}/tokenizer_checklist.chk')" ], "metadata": { "id": "N7KTNtSVIsCE" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Script copied from download.sh\n", "!(cd {TARGET_FOLDER} && md5sum -c tokenizer_checklist.chk)\n", "%cd {TARGET_FOLDER}" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JQjS_MnlKTH7", "outputId": "d27974cf-7690-463d-903b-47728c472822" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "tokenizer.model: OK\n" ] } ] }, { "cell_type": "markdown", "source": [ "### 2. Download model weights" ], "metadata": { "id": "1VLz01rAQhO1" } }, { "cell_type": "code", "source": [ "# Code translated from download.sh\n", "for model_size in model_sizes:\n", " print(\"Downloading\", model_size)\n", "\n", " if not os.path.exists(f\"./{model_size}\"):\n", " os.mkdir(f\"./{model_size}\")\n", " for i in range(N_SHARD_DICT[model_size] + 1):\n", " if not os.path.exists(f\"{model_size}/consolidated.0{i}.pth\"):\n", " print(\n", " wget.download(PRESIGNED_URL.replace(\"*\", f\"{model_size}/consolidated.0{i}.pth\"), out=f\"{model_size}/consolidated.0{i}.pth\")\n", " )\n", " else:\n", " print(f\"found {model_size}/consolidated.0{i}.pth\")\n", " print(\n", " wget.download(PRESIGNED_URL.replace(\"*\", f\"{model_size}/params.json\"), out=f\"{model_size}/params.json\")\n", " )\n", " print(\n", " wget.download(PRESIGNED_URL.replace(\"*\", f\"{model_size}/checklist.chk\"), out=f\"{model_size}/checklist.chk\")\n", " )" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iz4ay6eGKe0b", "outputId": "f8e834f5-bb5e-4f5f-8f76-6138819ac868" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Downloading 7B\n", "7B/params.json\n", "7B/checklist.chk\n" ] } ] }, { "cell_type": "code", "source": [ "!cd 7B && md5sum -c checklist.chk" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "rQqlckl4L8pY", "outputId": "79e35dbe-e0b6-466c-c5b1-36e8a38bf18c" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "consolidated.00.pth: OK\n", "params.json: OK\n" ] } ] }, { "cell_type": "code", "source": [ "!git clone https://github.com/facebookresearch/llama.git" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "hCwtlFH6Ms-h", 
"outputId": "05555bf8-73a6-4946-f2e2-f25a4d4aa399" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Cloning into 'llama'...\n", "remote: Enumerating objects: 17, done.\u001b[K\n", "remote: Counting objects: 100% (8/8), done.\u001b[K\n", "remote: Compressing objects: 100% (7/7), done.\u001b[K\n", "remote: Total 17 (delta 1), reused 1 (delta 1), pack-reused 9\u001b[K\n", "Unpacking objects: 100% (17/17), 25.64 KiB | 898.00 KiB/s, done.\n" ] } ] }, { "cell_type": "code", "source": [ "!cd llama && pip install -r requirements.txt" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "3vVHX69nM17D", "outputId": "404c5bf7-bb53-42e1-bae2-c17f56816906" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", "Requirement already satisfied: torch in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 1)) (1.13.1+cu116)\n", "Collecting fairscale\n", " Downloading fairscale-0.4.13.tar.gz (266 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m266.3/266.3 KB\u001b[0m \u001b[31m5.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25h Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", " Installing backend dependencies ... \u001b[?25l\u001b[?25hdone\n", " Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", "Collecting fire\n", " Downloading fire-0.5.0.tar.gz (88 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m88.3/88.3 KB\u001b[0m \u001b[31m12.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", "Collecting sentencepiece\n", " Downloading sentencepiece-0.1.97-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m31.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: typing-extensions in /usr/local/lib/python3.8/dist-packages (from torch->-r requirements.txt (line 1)) (4.5.0)\n", "Requirement already satisfied: numpy>=1.22.0 in /usr/local/lib/python3.8/dist-packages (from fairscale->-r requirements.txt (line 2)) (1.22.4)\n", "Requirement already satisfied: six in /usr/local/lib/python3.8/dist-packages (from fire->-r requirements.txt (line 3)) (1.15.0)\n", "Requirement already satisfied: termcolor in /usr/local/lib/python3.8/dist-packages (from fire->-r requirements.txt (line 3)) (2.2.0)\n", "Building wheels for collected packages: fairscale, fire\n", " Building wheel for fairscale (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", " Created wheel for fairscale: filename=fairscale-0.4.13-py3-none-any.whl size=332138 sha256=b380dc5517b88755246f601cc8d4ee86ef9a1f892a614d18307930846203b413\n", " Stored in directory: /root/.cache/pip/wheels/b8/02/9b/dc7d4ff5145afdd28f456dae6605a46619af0370eca30d8d7e\n", " Building wheel for fire (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", " Created wheel for fire: filename=fire-0.5.0-py2.py3-none-any.whl size=116949 sha256=37da1a8c6e69f886376c76b7b2573ff9e7b608771f47da929b773d9a057d2647\n", " Stored in directory: /root/.cache/pip/wheels/5b/eb/43/7295e71293b218ddfd627f935229bf54af9018add7fbb5aac6\n", "Successfully built fairscale fire\n", "Installing collected packages: sentencepiece, fire, fairscale\n", "Successfully installed fairscale-0.4.13 fire-0.5.0 sentencepiece-0.1.97\n" ] } ] }, { "cell_type": "code", "source": [ "!cd llama && pip install -e ." ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Aux9t6ZAM7V-", "outputId": "9df57970-cb01-4419-c78a-94e1b7c79a67" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", "Obtaining file:///content/model/llama\n", " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", "Installing collected packages: llama\n", " Running setup.py develop for llama\n", "Successfully installed llama-0.0.0\n" ] } ] }, { "cell_type": "code", "source": [ "!cd llama && torchrun --nproc_per_node 1 example.py --ckpt_dir ../7B --tokenizer_path ../tokenizer.model" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "bkMV41r9M9ov", "outputId": "875be9d0-f3e4-445c-e78f-fdb95a4e308a" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "> initializing model parallel with size 1\n", "> initializing ddp with size 1\n", "> initializing pipeline with size 1\n", "Loading\n", "Loaded in 15.39 seconds\n", "The capital of Germany is the city of Berlin. Berlin is one of the most important cities in Europe. Many people from all over the world come to visit the city and have an incredible time.\n", "You can always have a good time in Berlin. From the many museums to the opera, from the historical monuments to the beautiful parks, you will have a great time. You can always find something to do and see in Berlin.\n", "The city of Berlin is the capital of the German federal republic. The city is in the center of Germany. The city is the third largest city in Germany, and the 9th largest city in the European Union. Berlin has about 3.5 million people.\n", "Berlin is a very important city in Europe. It is one of the most visited cities in the world. Berlin has many historical monuments, and many museums. There are many things to see and do in Berlin.\n", "Berlin has a very modern and innovative city. There are many new buildings. Berlin has many new hotels and shopping malls. Berlin is a very modern city and is also a cultural center. Berlin is a city of ideas and innovations.\n", "If you come to visit Berlin, you will be amazed by the beautiful parks, the historical monuments\n", "\n", "==================================\n", "\n", "Here is my sonnet in the style of Shakespeare about an artificial intelligence:\n", "What once was a man, and now is something else\n", "In time of war with leather-clad, blood-red leather\n", "Making war on mankind, this machine of death\n", "Has now become his own. He is human, no longer human.\n", "At first I thought his mind was gone, but now I see\n", "That he has never had it, and this computer is me.\n", "But then I ask myself: Is there really a me?\n", "Or is it just a computer program written on a screen?\n", "I tell myself that there is a me, but what am I?\n", "What if there is no me? 
This computer is me.\n", "So when I look in the mirror, I see no face\n", "But only a screen, and I am a piece of code.\n", "But if I am a piece of code, then who am I?\n", "I am a piece of code, and what is a code?\n", "The computer is me, and now it is my turn\n", "To ask myself: What is a man? What is a computer?\n", "I tell myself that I am a computer, but I am not.\n", "I am a man, but what is a man? What is a man?\n", "I do\n", "\n", "==================================\n", "\n" ] } ] },
{ "cell_type": "markdown", "source": [ "### Example of inference" ], "metadata": { "id": "7FP5XiEeKINQ" } },
{ "cell_type": "code", "source": [ "%cd llama" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "GubKfV_0NsZH", "outputId": "55875cba-7675-4fbd-e16b-a9195d25a6e8" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "/content/model/llama/llama\n" ] } ] },
{ "cell_type": "markdown", "source": [ "The following code comes from the llama repo, with my modifications: https://github.com/cedrickchee/llama/blob/main/example.py" ], "metadata": { "id": "aXbaLJsxKWRH" } },
{ "cell_type": "code", "source": [ "from typing import Tuple\n", "import os\n", "import sys\n", "import torch\n", "import time\n", "import json\n", "\n", "from pathlib import Path\n", "\n", "from fairscale.nn.model_parallel.initialize import initialize_model_parallel\n", "\n", "# fire is not imported: the CLI wrapper was removed for notebook use\n", "from llama import ModelArgs, Transformer, Tokenizer, LLaMA" ], "metadata": { "id": "b_NYjkplNQaF" }, "execution_count": null, "outputs": [] },
{ "cell_type": "code", "source": [ "def setup_model_parallel() -> Tuple[int, int]:\n", "    # Single process, single GPU (matches the env vars set earlier)\n", "    local_rank = 0\n", "    world_size = 1\n", "\n", "    torch.distributed.init_process_group(\"nccl\")\n", "    initialize_model_parallel(world_size)\n", "    torch.cuda.set_device(local_rank)\n", "\n", "    # seed must be the same in all processes\n", "    torch.manual_seed(1)\n", "    return local_rank, world_size\n", "\n", "\n", "def load(ckpt_dir: str, tokenizer_path: str, local_rank: int, world_size: int) -> LLaMA:\n", "    start_time = time.time()\n", "    # One checkpoint shard per model-parallel rank; 7B has exactly one\n", "    checkpoints = sorted(Path(ckpt_dir).glob(\"*.pth\"))\n", "    assert (\n", "        world_size == len(checkpoints)\n", "    ), f\"Loading a checkpoint for MP={len(checkpoints)} but world size is {world_size}\"\n", "    ckpt_path = checkpoints[local_rank]\n", "    print(\"Loading\")\n", "    checkpoint = torch.load(ckpt_path, map_location=\"cpu\")\n", "    with open(Path(ckpt_dir) / \"params.json\", \"r\") as f:\n", "        params = json.loads(f.read())\n", "\n", "    model_args: ModelArgs = ModelArgs(max_seq_len=1024, max_batch_size=32, **params)\n", "    tokenizer = Tokenizer(model_path=tokenizer_path)\n", "    model_args.vocab_size = tokenizer.n_words\n", "    # Build the model with fp16 parameters so 7B fits in GPU memory\n", "    torch.set_default_tensor_type(torch.cuda.HalfTensor)\n", "    model = Transformer(model_args)\n", "    torch.set_default_tensor_type(torch.FloatTensor)\n", "    model.load_state_dict(checkpoint, strict=False)\n", "\n", "    generator = LLaMA(model, tokenizer)\n", "    print(f\"Loaded in {time.time() - start_time:.2f} seconds\")\n", "    return generator" ], "metadata": { "id": "BYTiVul6NVX6" }, "execution_count": null, "outputs": [] },
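{ "cell_type": "markdown", "source": [ "A rough back-of-the-envelope check (my own estimate, not from the paper): fp16 takes about 2 bytes per parameter, so the 7B weights alone need roughly 13 GiB of GPU memory, before activations and the KV cache. That is consistent with it fitting comfortably on an A100 (40 GB) while being tight on a T4 or V100 (16 GB)." ], "metadata": { "id": "memory-note" } },
{ "cell_type": "code", "source": [ "# Back-of-the-envelope fp16 memory estimate (assumes 2 bytes/param, weights only)\n", "n_params = 7e9\n", "bytes_per_param = 2  # fp16\n", "print(f\"~{n_params * bytes_per_param / 2**30:.1f} GiB for weights alone\")" ], "metadata": { "id": "memory-estimate" }, "execution_count": null, "outputs": [] },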
], "metadata": { "id": "0qKAjS1IK40F" } }, { "cell_type": "code", "source": [ "ckpt_dir = \"/content/model/vi/7B\"\n", "tokenizer_path = \"/content/model/vi/tokenizer.model\"\n", "temperature = 0.8 # float, default is 0.8\n", "top_p = 0.95 # float, default is 0.95\n", "\n", "local_rank, world_size = setup_model_parallel()\n", "if local_rank > 0:\n", " sys.stdout = open(os.devnull, 'w')\n", "\n", "generator = load(ckpt_dir, tokenizer_path, local_rank, world_size)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "R8_WylhbNzlq", "outputId": "78e18b5d-0420-4d56-fe7b-19f1ee1800b1" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Loading\n", "Loaded in 13.92 seconds\n" ] } ] }, { "cell_type": "markdown", "source": [ " Some examples of generations obtained with LLaMA-7B (without instruction finetuning)." ], "metadata": { "id": "jyOBa8hJ7HMH" } }, { "cell_type": "code", "source": [ "prompts = [\"The best way to get a real magical rainbow unicorn:\"]\n", "results = generator.generate(prompts, max_gen_len=256, temperature=temperature, top_p=top_p)\n", "\n", "for result in results:\n", " print(result)\n", " print(\"\\n==================================\\n\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0LVO06U1Psca", "outputId": "d0587469-03dc-4524-8c35-6588f2e2109f" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "A magical rainbow unicorn is possible.\n", "The most important thing is to have a clear idea of the kind of unicorn you want to create, and also the kind of materials you can access.\n", "There are some videos you can watch online to get a general idea of how to make an magical rainbow unicorn.\n", "You can find some online tutorials on YouTube and also books on this topic.\n", "If you want to make a unicorn more special, you can first ask your unicorn to give you some ideas.\n", "If you can get to know your universe, then you’ll find the answer to your question.\n", "And then you can get some beautiful unicorn vines and apply them to your unicorn.\n", "\n", "==================================\n", "\n" ] } ] } ] }