starcoder : add GPU offloading #3827
Merged

Conversation

No description provided.
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request on Oct 28, 2023:

* starcoder : do not GPU split 1D bias tensors
* starcoder : offload layers to GPU

ggml-ci
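For reference, the change amounts to two decisions made while loading the StarCoder weights: 1D bias tensors are never row-split across GPUs, and each repeating layer is offloaded once `--n-gpu-layers` covers it. The sketch below illustrates that assignment logic under stated assumptions; the enum, struct, and tensor names are hypothetical stand-ins, not the actual llama.cpp types.

```cpp
// Simplified sketch of per-tensor backend assignment during model load.
// The enum/struct names are hypothetical; llama.cpp uses its own types.
#include <cstdio>
#include <string>
#include <vector>

enum class Backend { CPU, GPU, GPU_SPLIT };

struct TensorMeta {
    std::string name;
    int n_dims;      // 1 for bias vectors, 2 for weight matrices
    int layer;       // which transformer layer the tensor belongs to
    Backend backend = Backend::CPU;
};

// Assign backends: offload the last `n_gpu_layers` layers; 2D weight matrices may be
// row-split across devices, but 1D tensors (biases, norms) must stay unsplit.
static void assign_backends(std::vector<TensorMeta> & tensors, int n_layer, int n_gpu_layers) {
    const int first_gpu_layer = n_layer - n_gpu_layers;
    for (auto & t : tensors) {
        if (t.layer < first_gpu_layer) {
            t.backend = Backend::CPU;
        } else if (t.n_dims == 1) {
            t.backend = Backend::GPU;        // never GPU_SPLIT for 1D tensors
        } else {
            t.backend = Backend::GPU_SPLIT;  // 2D weights can be split across GPUs
        }
    }
}

int main() {
    std::vector<TensorMeta> tensors = {
        {"blk.0.attn_qkv.weight", 2, 0},
        {"blk.0.attn_qkv.bias",   1, 0},
        {"blk.1.ffn_up.weight",   2, 1},
        {"blk.1.ffn_up.bias",     1, 1},
    };
    assign_backends(tensors, /*n_layer=*/2, /*n_gpu_layers=*/1);
    for (const auto & t : tensors) {
        std::printf("%-24s -> backend %d\n", t.name.c_str(), (int) t.backend);
    }
}
```

With enough layers offloaded this way, the whole StarCoder graph can run on the GPU, which is what the later "llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)" commit in the log further down builds on.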
wsxiaoys added a commit to wsxiaoys/llama.cpp that referenced this pull request on Nov 4, 2023
ggerganov pushed a commit that referenced this pull request on Nov 5, 2023
github-actions bot pushed a commit to KerfuffleV2/ggml-sys-bleedingedge that referenced this pull request on Nov 9, 2023:
== Relevant log messages from source repo:

commit 875fb42871a0f5a88fbe31a0b5edd697b84038e4 (slaren, Wed Nov 8 13:15:14 2023 +0100)
    ggml-alloc : fix backend assignments of views (#3982)

commit e9c1cecb9d7d743d30b4a29ecd56a411437def0a (xaedes, Tue Nov 7 09:04:51 2023 +0100)
    ggml : fix backward rope after YaRN (#3974)
    * fix backward process of rope
      The rope backward process was broken after the YaRN RoPE (#2268) implementation, due to missing
      changes in the backward functions. The code for the backward process is nearly identical to the
      forward process: the only difference is the sign of the sin-values. To avoid future regressions,
      the near-duplicate backward functions were removed and the forward code is reused: a new function
      argument `bool forward` was added to `ggml_compute_forward_rope_f32` and
      `ggml_compute_forward_rope_f16`, and the sin-values are negated when forward is false.
    * fix finetune rope call to use correct default attn_factor of 1.0f
    * remove unused `ggml_rope_xpos_back`
      It is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that
      `ggml_compute_backward` can propagate all parameters without having to switch between different
      rope_back variants.
    * fix comments explaining the sinus sign in ggml_forward_rope
    * add missing function arguments in declaration
    * fix function argument type in declaration

commit 46876d2a2c92e60579dc732cdb8cbd243b06f317 (Meng Zhang, Mon Nov 6 22:49:08 2023 -0800)
    cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)
    * prototyping the idea that supports running on CPU for a GGML_USE_CUBLAS=on build
    * doc: add comments to ggml_cublas_loaded()
    * fix defined(...)

commit 2833a6f63c1b87c7f4ac574bcf7a15a2f3bf3ede (slaren, Sun Nov 5 18:45:16 2023 +0100)
    ggml-cuda : fix f16 mul mat (#3961)
    * ggml-cuda : fix f16 mul mat
    * silence common.cpp warning (bonus)

commit 132d25b8a62ea084447e0014a0112c1b371fb3f8 (Jared Van Bortel, Sun Nov 5 10:08:57 2023 -0500)
    cuda : fix disabling device with --tensor-split 1,0 (#3951)
    Co-authored-by: slaren

commit 3d48f42efcd05381221654376e9f6f69d76af739 (Meng Zhang, Sun Nov 5 04:40:08 2023 -0800)
    llama : mark LLM_ARCH_STARCODER as full offload supported (#3945), as done in ggml-org/llama.cpp#3827

commit c41ea36eaa3548776de4cb3d5d49b925cd3fc0f2 (Eve, Sun Nov 5 08:03:09 2023 +0000)
    cmake : MSVC instruction detection (fixed up #809) (#3923)
    * Add detection code for AVX
    * Only check hardware when the option is ON
    * Modify per code review suggestions
    * Build locally will detect CPU
    * Fix CMake style to use lowercase like everywhere else
    * cleanup / fix merge
    * linux/gcc version for testing
    * MSVC combines AVX2 and FMA into /arch:AVX2, so check for both
    * MSVC-only version, style fixes
    * Update FindSIMD.cmake
    Co-authored-by: Howard Su, Jeremy Dunn

commit 48ade94538fa509465d71023e49d07aab0ec8cd5 (slaren, Sun Nov 5 08:12:13 2023 +0100)
    cuda : revert CUDA pool stuff (#3944)
    * Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)" (reverts commit 629f917c)
    * Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available
      (#3903)" (reverts commit d6069051)

commit d9b33fe95bd257b36c84ee5769cc048230067d6f (Peter Sugihara, Fri Nov 3 12:18:18 2023 -0700)
    metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938)

commit 5ba37461711095c0284233dbd14f0d9010cdbf56 (Xiao-Yong Jin, Fri Nov 3 13:00:31 2023 -0500)
    ggml-metal : fix yarn rope (#3937)

commit abb77e7319aabc0b5cfb7c22da690a692489b6b7 (slaren, Fri Nov 3 12:13:09 2023 +0100)
    ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)

commit 05816027d649f977468fc804cdb54e99eac246d1 (Georgi Gerganov, Fri Nov 3 09:24:00 2023 +0200)
    common : YAYF (yet another YARN fix) (#3925)

commit 3fdbe6b66b7b5c6ad3b2f245cbad1517c27ff776 (cebtenzzre, Fri Nov 3 02:31:58 2023 -0400)
    llama : change yarn_ext_factor placeholder to -1 (#3922)

commit 629f917cd6b96ba1274c49a8aab163b1b189229d (Kerfuffle, Thu Nov 2 13:58:22 2023 -0600)
    cuda : add ROCM aliases for CUDA pool stuff (#3918)

commit c7743fe1c1cbda5a886362aa371480360580fdf0 (Georgi Gerganov, Thu Nov 2 20:32:11 2023 +0200)
    cuda : fix const ptrs warning causing ROCm build issues (#3913)

commit d6069051de7165a4e06662c89257f5d2905bb156 (Oleksii Maryshchenko, Thu Nov 2 18:10:39 2023 +0100)
    cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)
    * Use CUDA memory pools for async alloc/dealloc
    * If the CUDA device doesn't support memory pools, use the old implementation
    * Removed redundant cublasSetStream

commit 4ff1046d75e64f0e556d8dcd930ea25c23eb8b18 (Georgi Gerganov, Thu Nov 2 16:22:30 2023 +0200)
    gguf : print error for GGUFv1 files (#3908)

commit 21958bb393a654591ed26f339791b752d58f5c8b (slaren, Thu Nov 2 13:10:33 2023 +0100)
    cmake : disable LLAMA_NATIVE by default (#3906)

commit 2756c4fbffab097736d5116007872d86456a544a (Georgi Gerganov, Thu Nov 2 11:20:21 2023 +0200)
    gguf : remove special-case code for GGUFv1 (#3901)

commit 1efae9b7dca2a5cc5aa21c1997b538022964ea19 (Georgi Gerganov, Thu Nov 2 09:54:18 2023 +0200)
    llm : prevent 1-D tensors from being GPU split (#3697)

commit b12fa0d1c13596869c512f49a526b979c94787cc (cebtenzzre, Thu Nov 2 02:50:16 2023 -0400)
    build : link against build info instead of compiling against it (#3879)
    * cmake : fix build when .git does not exist
    * cmake : simplify BUILD_INFO target
    * cmake : add missing dependencies on BUILD_INFO
    * build : link against build info instead of compiling against it
    * zig : make build info a .cpp source instead of a header
    * cmake : revert change to CMP0115
    Co-authored-by: Matheus C. França

commit 4d719a6d4e74b9a98e75f826f865f3153717d54b (Georgi Gerganov, Thu Nov 2 08:35:10 2023 +0200)
    cuda : check if this fixes Pascal card regression (#3882)

commit 183b3fac6c28e65d23ac0230c1dd6fb84bf0154d (Georgi Gerganov, Thu Nov 2 08:33:37 2023 +0200)
    metal : fix build errors and kernel sig after #2268 (#3898)
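Of the entries in the log above, the rope fix (#3974) rests on a non-obvious invariant: the backward pass of RoPE is the forward rotation with the sine negated, so forward and backward can share one code path gated by a `bool forward` flag. The snippet below is a minimal, self-contained illustration of that idea in plain C++; it is not the actual `ggml_compute_forward_rope_f32` code, and the pairing and frequency scheme are simplified.

```cpp
// Minimal illustration: RoPE rotates each (x0, x1) pair by an angle theta that
// depends on position and dimension index; the backward pass reuses the same
// code with the sine negated (i.e. rotation by -theta), as described in #3974.
#include <cmath>
#include <cstdio>
#include <vector>

static void rope_pairs(std::vector<float> & x, int pos, float freq_base, bool forward) {
    const int n = (int) x.size();
    for (int i = 0; i < n; i += 2) {
        const float theta = pos * std::pow(freq_base, -(float) i / n);
        const float cos_t = std::cos(theta);
        float       sin_t = std::sin(theta);
        if (!forward) {
            sin_t = -sin_t;  // the only difference between forward and backward
        }
        const float x0 = x[i];
        const float x1 = x[i + 1];
        x[i]     = x0 * cos_t - x1 * sin_t;
        x[i + 1] = x0 * sin_t + x1 * cos_t;
    }
}

int main() {
    std::vector<float> x = {1.0f, 2.0f, 3.0f, 4.0f};
    rope_pairs(x, /*pos=*/5, /*freq_base=*/10000.0f, /*forward=*/true);
    rope_pairs(x, /*pos=*/5, /*freq_base=*/10000.0f, /*forward=*/false);  // undoes the rotation
    for (float v : x) std::printf("%.4f ", v);  // prints values close to 1 2 3 4 again
    std::printf("\n");
}
```

Applying the same rotation with `forward=false` undoes it exactly, which is why sharing one code path removes the risk of the forward and backward implementations drifting apart, the regression that #3974 fixed.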
brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request on Nov 17, 2023
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request on Nov 23, 2023:

* starcoder : do not GPU split 1D bias tensors
* starcoder : offload layers to GPU

ggml-ci
brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request on Nov 30, 2023
YuMJie pushed a commit to YuMJie/powerinfer that referenced this pull request on Oct 25, 2024