So far, 10 different models across 5 different architectures (including OpenAssistant and OpenChatKit models) are supported by nolanoorg/cformers.
You can now interface with the models in just 3 lines of Python:
```python
from interface import AutoInference as AI
ai = AI('OpenAssistant/oasst-sft-1-pythia-12b')
x = ai.generate("<|prompter|>What's the Earth's total population?<|endoftext|><|assistant|>", num_tokens_to_generate=100); print(x['token_str'])
```
Generation speed is the same as this repo's (75 ms/token for a 12B model on a MacBook Pro).
I have no clue about this, but I saw that chatglm-6b was published, which should run on a CPU with 16 GB of RAM, albeit very slowly.
https://huggingface.co/THUDM/chatglm-6b/tree/main
Would it be possible to substitute the LLaMA model with it?
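I'm not sure whether chatglm-6b maps onto any of cformers' supported architectures, but for reference, here is a minimal sketch of running it CPU-only through the upstream Hugging Face transformers API (this is THUDM's documented interface, not cformers'; fp32 on CPU is memory-hungry and slow):

```python
# Minimal CPU-only sketch using the upstream transformers API (not cformers).
# chatglm-6b ships custom modeling code, so trust_remote_code=True is required.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
# .float() keeps the weights in fp32 for CPU inference; expect heavy RAM use
# and slow generation compared to the quantized paths discussed above.
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float().eval()

response, history = model.chat(tokenizer, "What's the Earth's total population?", history=[])
print(response)
```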