Response returns too few tokens? #529
Hi, I deployed llama-2-7b-chat.ggmlv3.q6_K.bin with llama-cpp-python[server] and am trying to access it through the OpenAI API.
Its response body:
llama-cpp-python[server] is running in a Docker container with the following params:
It always returns only a few tokens. How can I get the full poem in this case? Do I need to set max_tokens, and how? Thanks a lot!
Answered by jeffreydevreede, Jul 28, 2023
Replies: 1 comment
In your curl request you need to set the max_tokens attribute:

curl -X 'POST' \
  'http://llama07.server.com/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "max_tokens": 200,
    "messages": [
      {
        "content": "You are a helpful assistant.",
        "role": "system"
      },
      {
        "content": "Write a poem for France?",
        "role": "user"
      }
    ]
  }'
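
The same parameter applies when calling the server from Python. Below is a minimal sketch using the pre-1.0 openai package, assuming the server URL from the curl example above, a placeholder model name, and that the local server does not validate the API key (the client just needs a non-empty value):

import openai

# Point the OpenAI client at the llama-cpp-python server
# (URL taken from the curl example above; adjust to your deployment).
openai.api_base = "http://llama07.server.com/v1"
openai.api_key = "not-needed"  # placeholder; assumes the local server ignores the key

response = openai.ChatCompletion.create(
    model="llama-2-7b-chat",  # hypothetical name; the server uses whichever model it was started with
    max_tokens=200,           # raise this limit to get the full poem instead of a truncated reply
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a poem for France?"},
    ],
)
print(response["choices"][0]["message"]["content"])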
0 replies
Answer selected by st01cs