Response returns too few tokens? #529
Hi, I deployed llama-2-7b-chat.ggmlv3.q6_K.bin with llama-cpp-python[server] and am trying to access it through the OpenAI API.
Its response body:
llama-cpp-python[server] is running in a Docker container with the following params:
It always returns only a few tokens. How can I get the full poem in this case? Do I need to set max_tokens, and how? Thanks a lot!
Answered by jeffreydevreede, Jul 28, 2023
Replies: 1 comment
In your curl request you need to set the max_tokens attribute:

curl -X 'POST' \
  'http://llama07.server.com/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "max_tokens": 200,
    "messages": [
      {
        "content": "You are a helpful assistant.",
        "role": "system"
      },
      {
        "content": "Write a poem for France?",
        "role": "user"
      }
    ]
  }'
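
The same parameter applies when calling the server from Python. Below is a minimal sketch using the pre-1.0 openai package, assuming the server URL from the curl example above, a placeholder model name, and that the local server does not validate the API key (the client just needs a non-empty value):

import openai

# Point the OpenAI client at the llama-cpp-python server
# (URL taken from the curl example above; adjust to your deployment).
openai.api_base = "http://llama07.server.com/v1"
openai.api_key = "not-needed"  # placeholder; assumes the local server ignores the key

response = openai.ChatCompletion.create(
    model="llama-2-7b-chat",  # hypothetical name; the server uses whichever model it was started with
    max_tokens=200,           # raise this limit to get the full poem instead of a truncated reply
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a poem for France?"},
    ],
)
print(response["choices"][0]["message"]["content"])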
0 replies
Answer selected by st01cs