Vulkan k-quant mmq and ggml-backend offload functionality (ggml-org#6155)
* Fix Vulkan no kv offload incoherence
* Add k-quant mul mat mat shaders
* Rework working buffer allocation, reduces vram use noticeably
Clean up cpu assist code, replaced with ggml-backend offload function
* Default to all dedicated GPUs
* Add fallback for integrated GPUs if no dedicated GPUs are found
* Add debug info showing which device is allocating memory
* Fix Intel dequant issue
Fix validation issue
* Fix Vulkan GGML_OP_GET_ROWS implementation
* Clean up merge artifacts
* Remove Vulkan warning
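
The device-default behaviour described in the list above (use all dedicated GPUs, fall back to integrated GPUs if none are found) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `DeviceType`, `DeviceInfo`, and `select_default_devices` are hypothetical names, and the enum merely stands in for Vulkan's `VkPhysicalDeviceType`. The commit message does not say which integrated GPU(s) the fallback picks; taking all of them here is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for VkPhysicalDeviceType; not the backend's real type.
enum class DeviceType { Discrete, Integrated, Cpu, Other };

// Hypothetical per-device record; a real backend would fill this from
// the physical device properties reported by the driver.
struct DeviceInfo {
    DeviceType type;
};

// Returns the indices of the devices to use by default:
// every dedicated (discrete) GPU if at least one exists,
// otherwise every integrated GPU (assumed fallback policy).
// The result is empty if neither kind is present.
std::vector<std::size_t> select_default_devices(const std::vector<DeviceInfo>& devices) {
    std::vector<std::size_t> dedicated;
    std::vector<std::size_t> integrated;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (devices[i].type == DeviceType::Discrete) {
            dedicated.push_back(i);
        } else if (devices[i].type == DeviceType::Integrated) {
            integrated.push_back(i);
        }
    }
    return dedicated.empty() ? integrated : dedicated;
}
```

Collecting both lists in one pass keeps the fallback a single expression at the end, so the policy is easy to change if the real selection differs (for example, picking only the first integrated GPU).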
> Meanwhile, if you want to use the Vulkan backend, you should use the commit right before the breaking change, https://github.com/ggerganov/llama.cpp/commit/55c1b2a3bbd470e9e2a3a0618b92cf64a885f806
**With docker**:
You don't need to install Vulkan SDK. It will be installed inside the container.