(time ./bin/llama-embedding --model ${model_f16} -p "what is panda?</s><s>hi\nwhat is panda?</s><s>it's a bear\nwhat is panda?</s><s>The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China." --pooling rank --embd-normalize -1 --verbose-prompt) 2>&1| tee -a $OUT/${ci}-rk-f16.log
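The prompt in the command above appears to pack each query/document pair as `query</s><s>document`, with pairs joined by the two-character sequence `\n`. A minimal Python sketch of assembling such a prompt follows. The helper name `build_rank_prompt` is hypothetical, and the separators are inferred from this single command rather than from a documented format.

```python
def build_rank_prompt(query, documents, pair_sep="</s><s>", line_sep="\\n"):
    """Join one query with each document, mirroring the prompt in the command above.

    Assumption: `pair_sep` and `line_sep` are copied from that command line;
    `line_sep` is the literal backslash-n text, not a real newline character.
    """
    return line_sep.join(f"{query}{pair_sep}{doc}" for doc in documents)


if __name__ == "__main__":
    # Prints the prompt text that could be passed to the -p option.
    print(build_rank_prompt("what is panda?", ["hi", "it's a bear"]))
```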
|`-a, --alias STRING`| set alias for model name (to be used by REST API)<br/>(env: LLAMA_ARG_ALIAS) |
|`--host HOST`| ip address to listen (default: 127.0.0.1)<br/>(env: LLAMA_ARG_HOST) |
|`--port PORT`| port to listen (default: 8080)<br/>(env: LLAMA_ARG_PORT) |
|`--path PATH`| path to serve static files from (default: )<br/>(env: LLAMA_ARG_STATIC_PATH) |
|`--embedding, --embeddings`| restrict to only support embedding use case; use only with dedicated embedding models (default: disabled)<br/>(env: LLAMA_ARG_EMBEDDINGS) |
|`--reranking, --rerank`| enable reranking endpoint on server (default: disabled)<br/>(env: LLAMA_ARG_RERANKING) |
|`--api-key KEY`| API key to use for authentication (default: none)<br/>(env: LLAMA_API_KEY) |
|`--api-key-file FNAME`| path to file containing API keys (default: none) |
|`--ssl-key-file FNAME`| path to file a PEM-encoded SSL private key<br/>(env: LLAMA_ARG_SSL_KEY_FILE) |
@@ -152,6 +155,7 @@ The project is under active development, and we are [looking for feedback and co
|`-sps, --slot-prompt-similarity SIMILARITY`| how much the prompt of a request must match the prompt of a slot in order to use that slot (default: 0.50, 0.0 = disabled)<br/> |
|`--lora-init-without-apply`| load LoRA adapters without applying them (apply later via POST /lora-adapters) (default: disabled) |

Note: If both a command-line argument and an environment variable are set for the same parameter, the argument takes precedence over the environment variable.

Example usage of docker compose with environment variables:
@@ -478,6 +482,39 @@ The same as [the embedding example](../embedding) does.
`image_data`: An array of objects holding base64-encoded image `data` and the `id`s used to reference them in `content`. You can set the position of an image within the content like this: `Image: [img-21].\nCaption: This is a picture of a house`. In this case, `[img-21]` will be replaced by the embeddings of the image with id `21` in the following `image_data` array: `{..., "image_data": [{"data": "<BASE64_STRING>", "id": 21}]}`. Use `image_data` only with multimodal models, e.g., LLaVA.

### POST `/reranking`: Rerank documents according to a given query

The API is similar to https://jina.ai/reranker/ but might change in the future.

Requires a reranker model (such as [bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3)) and the `--embedding --pooling rank` options.

*Options:*

`query`: The query against which the documents will be ranked.

`documents`: An array of strings representing the documents to be ranked.

*Aliases:*
- `/rerank`
- `/v1/rerank`
- `/v1/reranking`

*Examples:*

```shell
curl http://127.0.0.1:8012/v1/rerank \
    -H "Content-Type: application/json" \
    -d '{
        "model": "some-model",
        "query": "What is panda?",
        "top_n": 3,
        "documents": [
            "hi",
            "it is a bear",
            "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China."
        ]
    }' | jq
```

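For callers that prefer a script to `curl`, the same request can be sketched in Python with only the standard library. The helper names `build_rerank_request` and `rerank` are hypothetical. The `top_n` and `model` fields are copied from the curl example rather than from the documented options, and the response is returned as parsed JSON without assuming its shape, since that shape is not specified here.

```python
import json
import urllib.request


def build_rerank_request(base_url, query, documents, top_n=None, model=None):
    """Build a POST request for the /v1/rerank alias (hypothetical helper)."""
    payload = {"query": query, "documents": list(documents)}
    # Optional fields seen in the curl example; sent only when provided.
    if top_n is not None:
        payload["top_n"] = top_n
    if model is not None:
        payload["model"] = model
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(base_url, query, documents, **kwargs):
    """Send the request and return the decoded JSON response as-is."""
    req = build_rerank_request(base_url, query, documents, **kwargs)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a server listening on the address used in the curl example, a call such as `rerank("http://127.0.0.1:8012", "What is panda?", ["hi", "it is a bear"], top_n=3)` would send the equivalent payload.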
### POST `/infill`: For code infilling.

Takes a prefix and a suffix and returns the predicted completion as a stream.