`README.md` (+1, −1)

````diff
@@ -640,7 +640,7 @@ Please verify the [sha256 checksums](SHA256SUMS) of all downloaded model files t
 
 ```bash
 # run the verification script
-python3 .\scripts\verify-checksum-models.py
+./scripts/verify-checksum-models.py
 ```
 
 - On linux or macOS it is also possible to run the following commands to verify if you have all possible latest files in your self-installed `./models` subdirectory:
````
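The changed line drops the Windows-style `python3 .\scripts\...` invocation in favor of running the script directly. A minimal sketch of both styles, assuming the script carries a `python3` shebang and has its executable bit set (if not, the interpreter form still works everywhere):

```bash
# Linux/macOS: run the verification script directly
# (assumes a python3 shebang and executable permission)
./scripts/verify-checksum-models.py

# portable fallback, including Windows: invoke the interpreter explicitly
python3 scripts/verify-checksum-models.py
```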
`examples/main/README.md` (+1, −1)

```diff
@@ -293,5 +293,5 @@ These options provide extra functionality and customization when running the LLa
 -`-mg i, --main-gpu i`: When using multiple GPUs this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results. By default GPU 0 is used. Requires cuBLAS.
 -`-ts SPLIT, --tensor-split SPLIT`: When using multiple GPUs this option controls how large tensors should be split across all GPUs. `SPLIT` is a comma-separated list of non-negative values that assigns the proportion of data that each GPU should get in order. For example, "3,2" will assign 60% of the data to GPU 0 and 40% to GPU 1. By default the data is split in proportion to VRAM but this may not be optimal for performance. Requires cuBLAS.
 -`-lv, --low-vram`: Do not allocate a VRAM scratch buffer for holding temporary results. Reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires cuBLAS.
--`--lora FNAME`: Apply a LoRA (Low-Rank Adaptation) adapter to the model. This allows you to adapt the pretrained model to specific tasks or domains.
+-`--lora FNAME`: Apply a LoRA (Low-Rank Adaptation) adapter to the model (implies --no-mmap). This allows you to adapt the pretrained model to specific tasks or domains.
 -`--lora-base FNAME`: Optional model to use as a base for the layers modified by the LoRA adapter. This flag is used in conjunction with the `--lora` flag, and specifies the base model for the adaptation.
```
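To make the flag interaction concrete, here is a hypothetical `main` invocation combining the options documented above; the model and adapter paths are placeholders, not files shipped with the repository:

```bash
# apply a LoRA adapter on top of an f16 base model;
# --lora implies --no-mmap, so the weights are loaded into RAM
./main -m models/7B/ggml-model-f16.bin \
    --lora lora/ggml-adapter-model.bin \
    --lora-base models/7B/ggml-model-f16.bin \
    -p "Building a website can be done in 10 simple steps:"
```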
`examples/server/README.md` (+2, −1)

```diff
@@ -16,7 +16,7 @@ Command line options:
 -`--memory-f32`: Use 32-bit floats instead of 16-bit floats for memory key+value. Not recommended.
 -`--mlock`: Lock the model in memory, preventing it from being swapped out when memory-mapped.
 -`--no-mmap`: Do not memory-map the model. By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed.
--`--lora FNAME`: Apply a LoRA (Low-Rank Adaptation) adapter to the model. This allows you to adapt the pretrained model to specific tasks or domains.
+-`--lora FNAME`: Apply a LoRA (Low-Rank Adaptation) adapter to the model (implies --no-mmap). This allows you to adapt the pretrained model to specific tasks or domains.
 -`--lora-base FNAME`: Optional model to use as a base for the layers modified by the LoRA adapter. This flag is used in conjunction with the `--lora` flag, and specifies the base model for the adaptation.
 -`-to N`, `--timeout N`: Server read/write timeout in seconds. Default `600`.
 -`--host`: Set the hostname or ip address to listen. Default `127.0.0.1`.
```
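The same adapter flags on the server side, as a sketch using only options listed above (paths are again placeholders):

```bash
# serve a model with a LoRA adapter; since --lora implies --no-mmap,
# expect the full model to be read into memory at startup
./server -m models/7B/ggml-model-f16.bin \
    --lora lora/ggml-adapter-model.bin \
    --lora-base models/7B/ggml-model-f16.bin \
    --host 127.0.0.1
```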
````diff
@@ -66,6 +66,7 @@ Using [curl](https://curl.se/). On Windows `curl.exe` should be available in the
 ```sh
 curl --request POST \
     --url http://localhost:8080/completion \
+    --header "Content-Type: application/json" \
     --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
````
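Since the hunk context notes that `curl.exe` is available in the Windows base OS, a plausible `cmd.exe` equivalent follows; unlike the `sh` version it cannot use single quotes, so the double quotes inside the JSON body are backslash-escaped for curl's argument parser:

```bat
:: same request from cmd.exe (illustrative; escaping rules differ in PowerShell)
curl.exe --request POST --url http://localhost:8080/completion ^
    --header "Content-Type: application/json" ^
    --data "{\"prompt\": \"Building a website can be done in 10 simple steps:\",\"n_predict\": 128}"
```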