server : re-enable completion and embedded at the same time, fixes #3815 #3876

Merged
merged 1 commit into from
Nov 1, 2023

@a-h (Contributor) commented Oct 31, 2023

Fixes #3815

@christianwengert

I tested this branch with

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "implement  fizzbuzz in C++","n_predict": 1024}'

which returns

{"content":":\n\nfizzbuzz is a fun programming exercise that combines multiplication, division, and string manipulation. In this challenge, you'll create a program that prints numbers from 1 to 100, replacing certain multiples of three (fizz) or five (buzz) with their corresponding names (\"fizz\" or \"buzz\").\n\nhere's the code for implementing fizzbuzz in C++:\n\n```cpp\n#include <iostream>\n\nusing namespace std;\n\nint main() {\n    // loop through numbers from 1 to 100\n    for (int I = 1; I <= 100; i++) {\n        // if number is divisible by 3 and 5, print fizzbuzz\n        if (i % 3 == 0 && I % 5 == 0) {\n            cout << \"fizzbuzz \";\n        }\n        // if number is divisible by 3 but not 5, print fizz\n        else if (i % 3 == 0) {\n            cout << \"fizz \";\n        }\n        // if number is divisible by 5 but not 3, print buzz\n        else if (i % 5 == 0) {\n            cout << \"buzz \";\n        }\n        // otherwise, print the number\n        else {\n            cout << I << \" \";\n        }\n    }\n    return 0;\n}\n```\n\nhere's a breakdown of how the code works:\n\n- we first include the necessary libraries (iostream in this case) and declare that we will be using the standard namespace.\n\n- then, we define the main function which begins the execution of our program. Inside the main function, we create a `for` loop to iterate through numbers from 1 to 100.\n\n- inside the `for` loop, we use an if statement to check if the current number is divisible by both 3 and 5 using the modulo operator (`%`). 
If it is, we print \"fizzbuzz\".\n\n- if the number is divisible by 3 but not 5, then we print \"fizz\".\n\n- if the number is divisible by 5 but not 3, then we print \"buzz\".\n\n- if the number is not divisible by either 3 or 5, then we simply print the number.\n\n- finally, we return a value of `0` to indicate that our program has completed without error.\n\nthis code should handle all possible cases for numbers from 1 to 100 in terms of divisibility by both 3 and 5.","generation_settings":{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"/Users/christianwengert/Downloads/models/zephyr-7b-beta.Q4_K_S.gguf","n_ctx":8192,"n_keep":0,"n_predict":1024,"n_probs":0,"penalize_nl":true,"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"seed":4294967295,"stop":[],"stream":false,"temp":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0},"model":"/Users/christianwengert/Downloads/models/zephyr-7b-beta.Q4_K_S.gguf","prompt":"implement  fizzbuzz in C++","slot_id":0,"stop":true,"stopped_eos":true,"stopped_limit":false,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":13608.777,"predicted_n":594,"predicted_per_second":43.648301386671264,"predicted_per_token_ms":22.91039898989899,"prompt_ms":340.967,"prompt_n":10,"prompt_per_second":29.328351424038104,"prompt_per_token_ms":34.0967},"tokens_cached":604,"tokens_evaluated":10,"tokens_predicted":594,"truncated":false}

and

curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{"content": "Building a website can be done in 10 simple steps:"}'

which returns an embedding vector

{"embedding":[-2.1186861991882324,3.535383939743042,...}

LGTM
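The two curl checks above can also be scripted. The following is a minimal sketch, not part of this PR: it assumes the server is listening on http://localhost:8080 as in the curl commands, and the helper names (`build_request`, `post_json`, `check_both_endpoints`) are hypothetical. It only relies on the `content` and `embedding` response keys that appear in the output above.

```python
import json
import urllib.request

# Assumed address, taken from the curl commands in the comment above.
BASE_URL = "http://localhost:8080"


def build_request(endpoint, payload, base_url=BASE_URL):
    """Build a JSON POST request for one server endpoint (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{base_url}/{endpoint.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint, payload, base_url=BASE_URL, timeout=120):
    """Send the request and decode the JSON response body."""
    request = build_request(endpoint, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_both_endpoints(base_url=BASE_URL):
    """Call /completion and then /embedding on the same running server."""
    completion = post_json(
        "completion",
        {"prompt": "implement  fizzbuzz in C++", "n_predict": 1024},
        base_url,
    )
    embedding = post_json(
        "embedding",
        {"content": "Building a website can be done in 10 simple steps:"},
        base_url,
    )
    # Both endpoints should answer when the server runs with --embedding.
    assert isinstance(completion["content"], str)
    assert len(embedding["embedding"]) > 0
    return completion, embedding
```

Calling `check_both_endpoints()` against a running server should return both responses; if either endpoint is disabled, the call is expected to raise instead.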

@TruongGiangBT

I apologize for the inconvenience. I am running a server with the following parameters: -cb -v --embedding -np 3 -c 8192 --host "0.0.0.0" -ngl 64. When I send multiple embedding requests, a segmentation fault occurs. The crash appears to happen when 2 slots perform the embedding task simultaneously. I hope to receive a solution soon. Thank you very much.
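A possible way to exercise the scenario described above is to send several /embedding requests at the same time. This is only a sketch under assumptions: the commenter's actual client is not shown, the URL is taken from the earlier curl example, and `post_embedding` and `run_concurrently` are hypothetical helper names. The default of 3 workers mirrors the -np 3 setting.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed address; the comment above only gives the server flags.
EMBEDDING_URL = "http://localhost:8080/embedding"


def post_embedding(content, url=EMBEDDING_URL, timeout=60):
    """POST one text to /embedding and return the embedding vector."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"content": content}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["embedding"]


def run_concurrently(texts, send=post_embedding, workers=3):
    """Submit all texts in parallel; return (results, errors) in input order."""
    results = [None] * len(texts)
    errors = [None] * len(texts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(send, text) for text in texts]
        for index, future in enumerate(futures):
            try:
                results[index] = future.result()
            except Exception as exc:  # e.g. connection reset if the server dies
                errors[index] = exc
    return results, errors
```

Against a live server, `run_concurrently(["first text", "second text", "third text"])` keeps up to three requests in flight at once; any non-`None` entry in `errors` marks a request that failed, which would be consistent with the server process crashing.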

@a-h a-h deleted the issue_3815_allow_completion_and_embedding branch February 21, 2024 10:28
Development

Successfully merging this pull request may close these issues.

Server allow /completion and /embedding