The main goal of `llama.cpp` is to run the LLaMA model using 4-bit integer quantization on a MacBook.

- Plain C/C++ implementation without dependencies
- Apple silicon first-class citizen - optimized via ARM NEON and Accelerate framework
- AVX2 support for x86 architectures
- Mixed F16 / F32 precision
- 4-bit integer quantization support
- Runs on the CPU
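
For illustration, here is a minimal sketch of the idea behind block-wise 4-bit integer quantization: weights are grouped into fixed-size blocks, and each block stores one F32 scale plus packed 4-bit integers. This is not the exact format used by `llama.cpp` or `ggml`; the block size, rounding, and layout below are assumptions chosen for clarity.

```c
#include <math.h>
#include <stdint.h>

// Illustrative only: quantize 32 floats into one shared scale + 4-bit values.
#define QK 32

typedef struct {
    float   scale;       // shared F32 scale for the block
    uint8_t qs[QK / 2];  // 32 x 4-bit values, two packed per byte
} block_q4;

static void quantize_block(const float *x, block_q4 *out) {
    // find the largest magnitude in the block
    float amax = 0.0f;
    for (int i = 0; i < QK; i++) {
        const float a = fabsf(x[i]);
        if (a > amax) amax = a;
    }

    // map [-amax, amax] onto the signed 4-bit range [-7, 7], biased into [0, 15]
    const float scale = amax / 7.0f;
    const float inv   = scale != 0.0f ? 1.0f / scale : 0.0f;
    out->scale = scale;

    for (int i = 0; i < QK; i += 2) {
        int q0 = (int)roundf(x[i]     * inv) + 8;
        int q1 = (int)roundf(x[i + 1] * inv) + 8;
        if (q0 < 0) q0 = 0; if (q0 > 15) q0 = 15;
        if (q1 < 0) q1 = 0; if (q1 > 15) q1 = 15;
        out->qs[i / 2] = (uint8_t)(q0 | (q1 << 4));
    }
}

// dequantize back to F32: x is approximately scale * (q - 8)
static void dequantize_block(const block_q4 *in, float *x) {
    for (int i = 0; i < QK; i += 2) {
        const uint8_t b = in->qs[i / 2];
        x[i]     = in->scale * ((b & 0x0F) - 8);
        x[i + 1] = in->scale * ((b >> 4)   - 8);
    }
}
```

With 32 weights per block this stores 32 x 4 bits plus one 32-bit scale, i.e. roughly 5 bits per weight instead of 32, which is where the memory savings on a MacBook come from.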
The original implementation of `llama.cpp` was [hacked in an evening](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022). Since then, the project has improved significantly thanks to many contributions. This project is for educational purposes and serves as the main playground for developing new features for the [ggml](https://github.com/ggerganov/ggml) library.

**Supported platforms:**
### Build
In order to build `llama.cpp`, you have three different options.

- Using `make`:
  - On Linux or MacOS:

    ```bash
    make
    ```

  - On Windows:

    1. Download the latest fortran version of [w64devkit](https://github.com/skeeto/w64devkit/releases).
    2. Extract `w64devkit` on your PC.
    3. Run `w64devkit.exe`.
    4. Use the `cd` command to reach the `llama.cpp` folder.
    5. From here you can run:
        ```bash
        make
        ```

- Using `CMake`:

  ```bash
  mkdir build
  cd build
  cmake ..
  cmake --build . --config Release
  ```

- Using `Zig`:

  ```bash
  zig build -Drelease-fast
  ```

### BLAS Build
Building the program with BLAS support may lead to some performance improvements in prompt processing using batch sizes higher than 32 (the default is 512). BLAS doesn't affect the normal generation performance. There are currently three different implementations of it:
- Accelerate Framework:

  This is only available on Mac PCs and it's enabled by default. You can just build using the normal instructions.

- OpenBLAS:

  This provides BLAS acceleration using only the CPU. Make sure to have OpenBLAS installed on your machine.

  - Using `make`:
    - On Linux:
      ```bash
      make LLAMA_OPENBLAS=1
      ```
      Note: In order to build on Arch Linux with OpenBLAS support enabled, you must edit the Makefile, adding `-lcblas` at the end of line 105.
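
      For illustration only, here is a hypothetical sketch of the edited line; the assumption that the stock line reads `LDFLAGS += -lopenblas` should be checked against your copy of the Makefile:

      ```makefile
      # before (assumed): LDFLAGS += -lopenblas
      # after, linking cblas as the note above describes:
      LDFLAGS += -lopenblas -lcblas
      ```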

    - On Windows:

      1. Download the latest fortran version of [w64devkit](https://github.com/skeeto/w64devkit/releases).
      2. Download the latest version of [OpenBLAS for Windows](https://github.com/xianyi/OpenBLAS/releases).
      3. Extract `w64devkit` on your PC.
      4. From the OpenBLAS zip that you just downloaded, copy `libopenblas.a`, located inside the `lib` folder, inside `w64devkit\x86_64-w64-mingw32\lib`.
      5. From the same OpenBLAS zip, copy the content of the `include` folder inside `w64devkit\x86_64-w64-mingw32\include`.
      6. Run `w64devkit.exe`.
      7. Use the `cd` command to reach the `llama.cpp` folder.
      8. From here you can run:
          ```bash
          make LLAMA_OPENBLAS=1
          ```

  - Using `CMake` on Linux:

      ```bash
      mkdir build
      cd build
      cmake .. -DLLAMA_OPENBLAS=ON
      cmake --build . --config Release
      ```

- cuBLAS:

  This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
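
  As a sketch only, the build flags for cuBLAS are assumed here to follow the same pattern as the OpenBLAS flags above (`LLAMA_CUBLAS` and `-DLLAMA_CUBLAS=ON` are assumptions; check the Makefile and CMake options in your checkout):

  ```bash
  # assumed make flag, by analogy with LLAMA_OPENBLAS=1 above
  make LLAMA_CUBLAS=1

  # or, assumed CMake option, by analogy with -DLLAMA_OPENBLAS=ON above
  mkdir build
  cd build
  cmake .. -DLLAMA_CUBLAS=ON
  cmake --build . --config Release
  ```

  Since BLAS only helps prompt processing at larger batch sizes, a BLAS build is typically exercised with a batch size well above 32, for example via the `main` example's `-b` flag (the flag and model path below are placeholders, not verified against your checkout):

  ```bash
  # hypothetical invocation; adjust model path and prompt to your setup
  ./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 steps:" -n 128 -b 512
  ```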