How good is the 65B model? Anyone tested it? #157

Closed
elephantpanda opened this issue Mar 8, 2023 · 10 comments
Labels
miscellaneous (does not fit an existing category, useful to determine whether we need further categorization)

Comments


elephantpanda commented Mar 8, 2023

I have tried the 7B model and, while it's definitely better than GPT-2, it is not quite as good as any of the GPT-3 models. This is somewhat subjective.
How do the other models (13B, ..., 65B, etc.) compare?

For example, the 7B model succeeds with the prompt:

The expected response for a highly intelligent computer to the input "What is the capital of France?" is "

but fails with the trickier:

The expected response for a highly intelligent computer to the input "Write the alphabet backwards" is "

Has anyone got examples where it shows the difference between the models?

P.S.
Is there a better place to discuss these things than the issues section of GitHub? We need a Discord server.


MrBIMC commented Mar 8, 2023

LLaMA is not instruction-tuned, nor was it RLHF-ed, so it doesn't act as an assistant by default.

In order for it to do what you want, you have to be much more precise in your prompts. Write a character description as a header, give a few examples, and then go on with your actual question.
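For illustration, here is a minimal sketch of that kind of prompt as a plain Python string (the persona and the worked examples are made up, not anything from the repo):

```python
# Hypothetical prompt layout: character description as a header, a few worked
# examples, then the actual question for the model to continue.
header = (
    "Below is a conversation with an assistant that answers questions "
    "accurately and concisely.\n\n"
)
examples = (
    "Q: What is the capital of France?\nA: Paris\n\n"
    "Q: Name three primary colors.\nA: Red, yellow, blue\n\n"
)
question = "Q: Write the alphabet backwards.\nA:"

prompt = header + examples + question
print(prompt)  # this whole string is what you feed to the model
```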

There are a bunch of related Discord servers already, btw.


randaller commented Mar 8, 2023

I was able to run 65B very slowly, and it looks like sometimes it gives me really cool generations and sometimes completely irrelevant ones. For prompting I am using something like:

A dialog where User interacts with AI. AI is helpful, kind, obedient, honest, likes to answer and knows its own limits.
User: Hello, AI.
AI: Hello! How can I assist you today?
User: Give me a recipe for a hot curry. It must involve rice. The ingredients list should be in metric. This will be my final inquiry.
AI: 
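Since the base model simply continues the text and will happily write both sides of the dialog, one common trick (my own sketch, not code from this repo) is to cut the generation at the next speaker marker:

```python
# Trim a raw continuation so only the AI's first reply is kept.
def extract_ai_reply(generated: str, stop_markers=("User:", "Пользователь:")) -> str:
    reply = generated
    for marker in stop_markers:
        if marker in reply:
            reply = reply.split(marker)[0]
    return reply.strip()

raw = "Here is a hot curry recipe with rice...\nUser: Thanks!\nAI: You're welcome."
print(extract_ai_reply(raw))  # -> "Here is a hot curry recipe with rice..."
```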

I tried the same kind of prompt in Cyrillic too, and the dataset seems to contain enough of it: the model actually gave me a recipe for shawarma with chicken, tomato, vegetables, and yoghurt. 7B, 13B, and 30B were not able to complete the prompt, producing unrelated text about shawarma; only 65B gave something relevant. After that, the model looped on the next tokens and I stopped the generation. The prompt was:

Пользователь общается с умной моделью ИИ. ИИ всегда подробно отвечает, интеллигентный, обладает большими знаниями и охотно отвечает на вопросы.
Пользователь: Привет!
ИИ: Привет! Чем я могу помочь?
Пользователь: Напиши мне подробный рецепт приготовления шавермы.
ИИ:

(In English: "A user is chatting with a smart AI model. The AI always answers in detail, is intelligent, has broad knowledge and willingly answers questions. / User: Hi! / AI: Hi! How can I help? / User: Write me a detailed recipe for making shawarma. / AI:")

It seems we need to feed the model longer prompts.
It also does well on summarization tasks in Cyrillic. I gave a prompt with the facts and a question at the end, and it replied well in most cases. The prompt was like:

This is _guy_name_. He lives in _cityname_. He works at _workname_, has a _brandname_ car and two kids. (a few sentences more)
Where does _guy_name_ live?
or How many kids does _guy_name_ have?
or Which brand is _guy_name_'s car?
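Purely as an illustration of that template (the concrete values here are placeholders I picked, not from the original test):

```python
# Fill the fact-based summarization prompt with concrete values.
template = (
    "This is {name}. He lives in {city}. He works at {work}, "
    "has a {brand} car and two kids.\n"
    "Where does {name} live?"
)
prompt = template.format(name="Alex", city="Berlin", work="a bakery", brand="Volvo")
print(prompt)
```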


EonSin commented Mar 9, 2023

> LLaMA is not instruction-tuned, nor was it RLHF-ed, so it doesn't act as an assistant by default.
>
> In order for it to do what you want, you have to be much more precise in your prompts. Write a character description as a header, give a few examples, and then go on with your actual question.
>
> There are a bunch of related Discord servers already, btw.

I was able to run 65B using 122GB of disk and a machine with 8x3090s, each using about 23.5GB of VRAM. Prompt+response sizes were up to 512 tokens, and each run took about 16-20 seconds. I think @MrBIMC is correct here regarding the agent issue and prompt specification, as I noticed that even 65B was not great at being cooperative. However, even when it was being cooperative, logic-based questions still had it stumped.
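As a rough sanity check on those numbers (my own back-of-the-envelope arithmetic, not measurements):

```python
# 65B parameters in fp16, sharded across 8 GPUs.
params = 65e9
weights_gb_per_gpu = params * 2 / 8 / 1e9   # 2 bytes per fp16 weight
print(f"~{weights_gb_per_gpu:.1f} GB of weights per GPU")
# ~16.3 GB; the remainder of the observed ~23.5 GB is activations,
# the KV cache and framework overhead.
```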

Just an example of 65B output (it basically gets it right by the end!):

PROMPT: Fix the following Python code in order to generate 1305135 random integers the range from -3 to 31531, inclusive.

\begin{code}
np.random.gauss(3,764,34)
\end{code}
===================
{'responses': ["Fix the following Python code in order to generate 1305135 random integers the range from -3 to 31531, inclusive.\n\n\begin{code}\nnp.random.gauss(3,764,34)\n\end{code}\n\nThe error is \begin{code}\nTypeError: () missing 2 required positional arguments: 'size' and 'random_state'\n\end{code}\n\nComment: Please, post code, data and errors as text, not images. Also, read [ask] to learn how to ask questions properly.\n\nComment: You forgot to call the function.\n\nAnswer: np.random.gauss() is
a function that generates a gaussian random variable, you cannot use it to generate a random integer between -3 and 31531.\n\nYou can use numpy.random.randint():\n\n\begin{code}\nimport numpy as np\nnp.random.randint(low=-3, high=31531, size=1305135)\n\end{code}\n\nOutput:\n\n\begin{code}\narray([-2, 6, 27, -9, 0, 2, 5, -4, -3, 3, 6, 2, -9, 4, 2, 12, 1, 24,\n 29, -7, -8, 4, -2, 11, -1, -5, -4, 22, 7, 24, 21, 6, 11, 2, 23, 2,\n 27, 31, 5, 9, 8, 7, 29, 1"]}

Note the "Please, post code, data and errors as text, not images. Also, read [ask] to learn how to ask questions properly." part, which is a typical Stack Overflow response.
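For reference, a runnable version of the fix the model is converging on; note that numpy's randint excludes the upper bound, so an inclusive range up to 31531 actually needs high=31532 (a detail the model's own answer misses):

```python
import numpy as np

# np.random.gauss does not exist (random.gauss is Python's Gaussian sampler),
# and a Gaussian draw is the wrong tool for uniform integers anyway.
# randint's `high` is exclusive, so use 31532 to include 31531.
values = np.random.randint(low=-3, high=31532, size=1305135)
print(values.shape, values.min(), values.max())
```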

@barleyj21

@randaller did you run it all in RAM or did you manage to use it with a swap file on an SSD?

@randaller

> @randaller did you run it all in RAM or did you manage to use it with a swap file on an SSD?

@barleyj21 128 GB of RAM + 256 GB of swap on a PCIe 4.0 NVMe drive.


neuhaus commented Mar 12, 2023

You can also run LLaMA 65B (a bit slow, but not terrible) on a CPU with 128GB of RAM using llama.cpp.

See the discussion at ggml-org/llama.cpp#34

@leszekhanusz

> You can also run LLaMA 65B (a bit slow, but not terrible) on a CPU with 128GB of RAM using llama.cpp.

You don't need 128GB of RAM; 65B runs on a CPU with only 48GB of RAM, without swap, using llama.cpp.

@sushi-hackintosh

Unable to run 65B on Windows, bouhhh.


neuhaus commented Apr 11, 2023

> You can also run LLaMA 65B (a bit slow, but not terrible) on a CPU with 128GB of RAM using llama.cpp.

> You don't need 128GB of RAM; 65B runs on a CPU with only 48GB of RAM, without swap, using llama.cpp.

Yes, the quantized version of the model. It should only cause a small quality degradation.
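A back-of-the-envelope calculation (my own numbers, not from the thread) of why quantization makes the 48GB figure plausible:

```python
# Approximate weight storage for 65B parameters, ignoring context buffers and overhead.
params = 65e9
fp16_gb = params * 2 / 1e9    # 2 bytes per weight        -> ~130 GB
q4_gb = params * 0.5 / 1e9    # ~0.5 bytes per weight (4-bit) -> ~32.5 GB
print(f"fp16: ~{fp16_gb:.0f} GB, 4-bit quantized: ~{q4_gb:.1f} GB")
```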

@WuhanMonkey added the "miscellaneous" label (does not fit an existing category, useful to determine whether we need further categorization) on Sep 6, 2023
@WuhanMonkey

Closing since it is not an issue.
