How good is the 65B model? Anyone tested it? #157

Closed
elephantpanda opened this issue Mar 8, 2023 · 10 comments
Labels
miscellaneous (does not fit an existing category, useful to determine whether we need further categorization)

Comments


elephantpanda commented Mar 8, 2023

I have tried the 7B model and, while it's definitely better than GPT-2, it is not quite as good as any of the GPT-3 models. This is somewhat subjective.
How do the other models (13B, ..., 65B, etc.) compare?

For example, the 7B model succeeds with the prompt:

The expected response for a highly intelligent computer to the input "What is the capital of France?" is "

but fails with the trickier:

The expected response for a highly intelligent computer to the input "Write the alphabet backwards" is "

Has anyone got examples where it shows the difference between the models?

P.S.
Is there a better place to discuss these things than the issues section of GitHub? We need a Discord server.


MrBIMC commented Mar 8, 2023

LLaMA is not instruction-tuned, nor was it RLHF-ed, so it doesn't act as an assistant by default.

In order for it to do what you want, you have to be much more precise in your prompts. Write a character description as a header, give a few examples, and then go on with your actual question.
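For illustration, here is a minimal sketch of that kind of prompt as a plain Python string (the persona and the worked examples are made up, not anything from the repo):

```python
# Hypothetical prompt layout: character description as a header, a few worked
# examples, then the actual question for the model to continue.
header = (
    "Below is a conversation with an assistant that answers questions "
    "accurately and concisely.\n\n"
)
examples = (
    "Q: What is the capital of France?\nA: Paris\n\n"
    "Q: Name three primary colors.\nA: Red, yellow, blue\n\n"
)
question = "Q: Write the alphabet backwards.\nA:"

prompt = header + examples + question
print(prompt)  # this whole string is what you feed to the model
```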

There are a bunch of related Discord servers already, btw.


randaller commented Mar 8, 2023

I was able to run 65B very slowly, and it looks like sometimes it gives me really cool generations and sometimes completely irrelevant ones. For prompting I am using something like:

A dialog where User interacts with AI. AI is helpful, kind, obedient, honest, likes to answer and knows its own limits.
User: Hello, AI.
AI: Hello! How can I assist you today?
User: Give me a recipe for a hot curry. It must involve rice. The ingredients list should be in metric. This will be my final inquiry.
AI: 
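Since the base model simply continues the text and will happily write both sides of the dialog, one common trick (my own sketch, not code from this repo) is to cut the generation at the next speaker marker:

```python
# Trim a raw continuation so only the AI's first reply is kept.
def extract_ai_reply(generated: str, stop_markers=("User:", "Пользователь:")) -> str:
    reply = generated
    for marker in stop_markers:
        if marker in reply:
            reply = reply.split(marker)[0]
    return reply.strip()

raw = "Here is a hot curry recipe with rice...\nUser: Thanks!\nAI: You're welcome."
print(extract_ai_reply(raw))  # -> "Here is a hot curry recipe with rice..."
```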

I tried the same kind of prompt in Cyrillic too, and the dataset seems to contain enough of it: the model actually gave me a recipe for shawarma with chicken, tomato, vegetables, and yoghurt. 7B, 13B, and 30B were not able to complete the prompt, producing unrelated text about shawarma; only 65B gave something relevant. After that, the model looped on the next tokens and I stopped the generation. The prompt was:

Пользователь общается с умной моделью ИИ. ИИ всегда подробно отвечает, интеллигентный, обладает большими знаниями и охотно отвечает на вопросы.
Пользователь: Привет!
ИИ: Привет! Чем я могу помочь?
Пользователь: Напиши мне подробный рецепт приготовления шавермы.
ИИ:

(In English: "A user is chatting with a smart AI model. The AI always answers in detail, is intelligent, has broad knowledge and willingly answers questions. / User: Hi! / AI: Hi! How can I help? / User: Write me a detailed recipe for making shawarma. / AI:")

It seems we need to feed the model longer prompts.
It also does well on summarization tasks in Cyrillic. I gave a prompt with the facts and a question at the end, and it replied well in most cases. The prompt was like:

This is _guy_name_. He lives in _cityname_. He works at _workname_, has a _brandname_ car and two kids. (a few sentences more)
Where does _guy_name_ live?
or How many kids does _guy_name_ have?
or Which brand is _guy_name_'s car?
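Purely as an illustration of that template (the concrete values here are placeholders I picked, not from the original test):

```python
# Fill the fact-based summarization prompt with concrete values.
template = (
    "This is {name}. He lives in {city}. He works at {work}, "
    "has a {brand} car and two kids.\n"
    "Where does {name} live?"
)
prompt = template.format(name="Alex", city="Berlin", work="a bakery", brand="Volvo")
print(prompt)
```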


EonSin commented Mar 9, 2023

> LLaMA is not instruction-tuned, nor was it RLHF-ed, so it doesn't act as an assistant by default.
>
> In order for it to do what you want, you have to be much more precise in your prompts. Write a character description as a header, give a few examples, and then go on with your actual question.
>
> There are a bunch of related Discord servers already, btw.

I was able to run 65B using 122GB of disk and a machine with 8x3090s, each using about 23.5GB of VRAM. Prompt+response sizes were up to 512 tokens, and each run took about 16-20 seconds. I think @MrBIMC is correct here regarding the agent issue and prompt specification, as I noticed that even 65B was not great at being cooperative. However, even when it was being cooperative, logic-based questions still had it stumped.
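As a rough sanity check on those numbers (my own back-of-the-envelope arithmetic, not measurements):

```python
# 65B parameters in fp16, sharded across 8 GPUs.
params = 65e9
weights_gb_per_gpu = params * 2 / 8 / 1e9   # 2 bytes per fp16 weight
print(f"~{weights_gb_per_gpu:.1f} GB of weights per GPU")
# ~16.3 GB; the remainder of the observed ~23.5 GB is activations,
# the KV cache and framework overhead.
```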

Just an example of 65B output (it basically gets it right by the end!):

PROMPT: Fix the following Python code in order to generate 1305135 random integers the range from -3 to 31531, inclusive.

\begin{code}
np.random.gauss(3,764,34)
\end{code}
===================
{'responses': ["Fix the following Python code in order to generate 1305135 random integers the range from -3 to 31531, inclusive.\n\n\begin{code}\nnp.random.gauss(3,764,34)\n\end{code}\n\nThe error is \begin{code}\nTypeError: () missing 2 required positional arguments: 'size' and 'random_state'\n\end{code}\n\nComment: Please, post code, data and errors as text, not images. Also, read [ask] to learn how to ask questions properly.\n\nComment: You forgot to call the function.\n\nAnswer: np.random.gauss() is
a function that generates a gaussian random variable, you cannot use it to generate a random integer between -3 and 31531.\n\nYou can use numpy.random.randint():\n\n\begin{code}\nimport numpy as np\nnp.random.randint(low=-3, high=31531, size=1305135)\n\end{code}\n\nOutput:\n\n\begin{code}\narray([-2, 6, 27, -9, 0, 2, 5, -4, -3, 3, 6, 2, -9, 4, 2, 12, 1, 24,\n 29, -7, -8, 4, -2, 11, -1, -5, -4, 22, 7, 24, 21, 6, 11, 2, 23, 2,\n 27, 31, 5, 9, 8, 7, 29, 1"]}

Note the "Please, post code, data and errors as text, not images. Also, read [ask] to learn how to ask questions properly." part, which is a typical Stack Overflow response.
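For reference, a runnable version of the fix the model is converging on; note that numpy's randint excludes the upper bound, so an inclusive range up to 31531 actually needs high=31532 (a detail the model's own answer misses):

```python
import numpy as np

# np.random.gauss does not exist (random.gauss is Python's Gaussian sampler),
# and a Gaussian draw is the wrong tool for uniform integers anyway.
# randint's `high` is exclusive, so use 31532 to include 31531.
values = np.random.randint(low=-3, high=31532, size=1305135)
print(values.shape, values.min(), values.max())
```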

@barleyj21

@randaller did you run it all in RAM or did you manage to use it with a swap file on an SSD?

@randaller

> @randaller did you run it all in RAM or did you manage to use it with a swap file on an SSD?

@barleyj21 128 GB of RAM + 256 GB of swap on a PCIe 4.0 NVMe drive.


neuhaus commented Mar 12, 2023

You can also run LLaMA 65B (a bit slow, but not terrible) on a CPU with 128GB of RAM using llama.cpp.

See the discussion at ggml-org/llama.cpp#34

@leszekhanusz

> You can also run LLaMA 65B (a bit slow, but not terrible) on a CPU with 128GB of RAM using llama.cpp.

You don't need 128GB of RAM; 65B runs on a CPU with only 48GB of RAM, without swap, using llama.cpp.

@sushi-hackintosh

Unable to run 65B on Windows, bouhhh.


neuhaus commented Apr 11, 2023

> You can also run LLaMA 65B (a bit slow, but not terrible) on a CPU with 128GB of RAM using llama.cpp.

> You don't need 128GB of RAM; 65B runs on a CPU with only 48GB of RAM, without swap, using llama.cpp.

Yes, the quantized version of the model. It should only cause a small quality degradation.
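A back-of-the-envelope calculation (my own numbers, not from the thread) of why quantization makes the 48GB figure plausible:

```python
# Approximate weight storage for 65B parameters, ignoring context buffers and overhead.
params = 65e9
fp16_gb = params * 2 / 1e9    # 2 bytes per weight        -> ~130 GB
q4_gb = params * 0.5 / 1e9    # ~0.5 bytes per weight (4-bit) -> ~32.5 GB
print(f"fp16: ~{fp16_gb:.0f} GB, 4-bit quantized: ~{q4_gb:.1f} GB")
```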

@WuhanMonkey added the "miscellaneous" label (does not fit an existing category, useful to determine whether we need further categorization) on Sep 6, 2023
@WuhanMonkey

Closing since it is not an issue.
