starcoder : add GPU offloading #3827

ggerganov · 2023-10-28T08:43:20Z

No description provided.

* starcoder : do not GPU split 1D bias tensors
* starcoder : offload layers to GPU

ggml-ci
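The first commit keeps 1D bias tensors whole instead of splitting them across GPUs. The Python sketch below illustrates that placement rule under stated assumptions: the backend labels, `TensorSpec`, `choose_backend` and `split_rows` are hypothetical names, not llama.cpp APIs, and the proportional row split is a simplification of how a multi-GPU tensor split might assign rows. The idea it shows is that a 2D weight matrix has rows that can be distributed across devices, while a 1D bias has no row dimension to distribute, so it is placed whole.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical placement labels (not llama.cpp identifiers).
CPU = "cpu"              # tensor stays in host memory
GPU = "gpu"              # whole tensor on the main GPU
GPU_SPLIT = "gpu_split"  # rows distributed across several GPUs


@dataclass
class TensorSpec:
    name: str    # illustrative tensor name
    shape: Tuple[int, ...]  # (n_cols,) for 1D, (n_cols, n_rows) for 2D


def choose_backend(t: TensorSpec, offload: bool) -> str:
    """Pick a placement for one tensor of a layer.

    Only 2D weight matrices are row-split; 1D tensors such as biases
    are kept whole on the main GPU when the layer is offloaded.
    """
    if not offload:
        return CPU
    if len(t.shape) == 1:
        return GPU
    return GPU_SPLIT


def split_rows(n_rows: int, tensor_split: List[float]) -> List[Tuple[int, int]]:
    """Assign contiguous [start, end) row ranges to devices by proportion.

    Simplified assumption: rows are divided in proportion to the given
    weights, and the last device takes any remainder.
    """
    total = sum(tensor_split)
    ranges = []
    start = 0
    acc = 0.0
    for i, frac in enumerate(tensor_split):
        acc += frac
        if i == len(tensor_split) - 1:
            end = n_rows
        else:
            end = round(n_rows * acc / total)
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    bias = TensorSpec("attn_qkv.bias", (6144,))
    weight = TensorSpec("attn_qkv.weight", (2048, 6144))
    print(choose_backend(bias, offload=True))    # gpu
    print(choose_backend(weight, offload=True))  # gpu_split
    print(split_rows(4096, [1.0, 1.0]))          # [(0, 2048), (2048, 4096)]
```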


ggerganov added 2 commits October 28, 2023 11:04

starcoder : do not GPU split 1D bias tensors

53ab053

starcoder : offload layers to GPU

731dd98

ggml-ci
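The second commit offloads StarCoder's repeating layers to the GPU. A minimal sketch, assuming the convention that the last `n_gpu_layers` layers (as requested with `-ngl` / `--n-gpu-layers`) are offloaded, i.e. a layer goes to the GPU when its index is at least `n_layer - n_gpu_layers`. `layer_placement` is a hypothetical helper, and placement of the output and norm tensors is not modeled.

```python
from typing import List


def layer_placement(n_layer: int, n_gpu_layers: int) -> List[str]:
    """Return "gpu" or "cpu" for each repeating layer, in order.

    Assumed convention: layers with index >= n_layer - n_gpu_layers
    are offloaded; requesting more layers than exist offloads all of them.
    """
    i_gpu_start = max(n_layer - n_gpu_layers, 0)
    return ["gpu" if il >= i_gpu_start else "cpu" for il in range(n_layer)]
```

For example, `layer_placement(4, 2)` places layers 0 and 1 on the CPU and layers 2 and 3 on the GPU. Offloading from the end keeps the GPU-resident layers contiguous, so activations cross the host/device boundary only once per forward pass in this simplified model.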

ggerganov merged commit fdee152 into master Oct 28, 2023

ggerganov deleted the starcoder-cuda branch October 28, 2023 09:06

Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Oct 28, 2023

starcoder : add GPU offloading (ggml-org#3827)

312aad6

* starcoder : do not GPU split 1D bias tensors
* starcoder : offload layers to GPU

ggml-ci

Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Oct 28, 2023

starcoder : add GPU offloading (ggml-org#3827)

945b4fb

* starcoder : do not GPU split 1D bias tensors
* starcoder : offload layers to GPU

ggml-ci

wsxiaoys added a commit to wsxiaoys/llama.cpp that referenced this pull request Nov 4, 2023

feat: mark LLM_ARCH_STARCODER as full offload supported

f14faac

as done in ggml-org#3827

wsxiaoys mentioned this pull request Nov 4, 2023

feat: mark LLM_ARCH_STARCODER as full offload supported #3945

Merged

ggerganov pushed a commit that referenced this pull request Nov 5, 2023

llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)

3d48f42

as done in #3827

brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request Nov 17, 2023

enable starcoder gpu offloading

4cd5f2d

* ggml-org/llama.cpp#3827

olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023

starcoder : add GPU offloading (ggml-org#3827)

00e3dfc

* starcoder : do not GPU split 1D bias tensors
* starcoder : offload layers to GPU

ggml-ci

olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023

llama : mark LLM_ARCH_STARCODER as full offload supported (ggml-org#3945)

fa974c8

as done in ggml-org#3827

brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request Nov 30, 2023

enable starcoder gpu offloading

3323149

* ggml-org/llama.cpp#3827

YuMJie pushed a commit to YuMJie/powerinfer that referenced this pull request Oct 25, 2024

llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)

ac02208

as done in ggml-org/llama.cpp#3827
