starcoder : add GPU offloading #3827

Merged
ggerganov merged 2 commits into master from starcoder-cuda
Oct 28, 2023
Conversation

ggerganov
Member

No description provided.

@ggerganov ggerganov merged commit fdee152 into master Oct 28, 2023
@ggerganov ggerganov deleted the starcoder-cuda branch October 28, 2023 09:06
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Oct 28, 2023
* starcoder : do not GPU split 1D bias tensors

* starcoder : offload layers to GPU

ggml-ci
wsxiaoys added a commit to wsxiaoys/llama.cpp that referenced this pull request Nov 4, 2023
ggerganov pushed a commit that referenced this pull request Nov 5, 2023
github-actions bot pushed a commit to KerfuffleV2/ggml-sys-bleedingedge that referenced this pull request Nov 9, 2023
== Relevant log messages from source repo:

commit 875fb42871a0f5a88fbe31a0b5edd697b84038e4
Author: slaren <[email protected]>
Date:   Wed Nov 8 13:15:14 2023 +0100

    ggml-alloc : fix backend assignments of views (#3982)

commit e9c1cecb9d7d743d30b4a29ecd56a411437def0a
Author: xaedes <[email protected]>
Date:   Tue Nov 7 09:04:51 2023 +0100

    ggml : fix backward rope after YaRN (#3974)

    * fix backward process of rope

    rope backward process was broken after YaRN RoPE (#2268) implementation, due to missing changes in backward functions.

    the code for the backward process is nearly identical to the forward process:
    the only difference is the sign of the sin-values.

    to avoid future regressions remove the near-duplicate backward functions and reuse the forward code:

    for this a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`.
    the sin-values will be negated when forward is false.

    * fix finetune rope call to use correct default attn_factor of 1.0f

    * remove unused `ggml_rope_xpos_back`

    it is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants.

    * fix comments explaining the sine sign in ggml_forward_rope

    * add missing function arguments in declaration

    * fix function argument type in declaration

commit 46876d2a2c92e60579dc732cdb8cbd243b06f317
Author: Meng Zhang <[email protected]>
Date:   Mon Nov 6 22:49:08 2023 -0800

    cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)

    * prototyping the idea that supports running on CPU for a GGML_USE_CUBLAS=on build

    * doc: add comments to ggml_cublas_loaded()

    * fix defined(...)

commit 2833a6f63c1b87c7f4ac574bcf7a15a2f3bf3ede
Author: slaren <[email protected]>
Date:   Sun Nov 5 18:45:16 2023 +0100

    ggml-cuda : fix f16 mul mat (#3961)

    * ggml-cuda : fix f16 mul mat

    ggml-ci

    * silence common.cpp warning (bonus)

commit 132d25b8a62ea084447e0014a0112c1b371fb3f8
Author: Jared Van Bortel <[email protected]>
Date:   Sun Nov 5 10:08:57 2023 -0500

    cuda : fix disabling device with --tensor-split 1,0 (#3951)

    Co-authored-by: slaren <[email protected]>

commit 3d48f42efcd05381221654376e9f6f69d76af739
Author: Meng Zhang <[email protected]>
Date:   Sun Nov 5 04:40:08 2023 -0800

    llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)

    as done in ggml-org/llama.cpp#3827

commit c41ea36eaa3548776de4cb3d5d49b925cd3fc0f2
Author: Eve <[email protected]>
Date:   Sun Nov 5 08:03:09 2023 +0000

    cmake : MSVC instruction detection (fixed up #809) (#3923)

    * Add detection code for avx

    * Only check hardware when option is ON

    * Modify per code review suggestions

    * Build locally will detect CPU

    * Fixes CMake style to use lowercase like everywhere else

    * cleanup

    * fix merge

    * linux/gcc version for testing

    * msvc combines avx2 and fma into /arch:AVX2 so check for both

    * cleanup

    * msvc only version

    * style

    * Update FindSIMD.cmake

    ---------

    Co-authored-by: Howard Su <[email protected]>
    Co-authored-by: Jeremy Dunn <[email protected]>

commit 48ade94538fa509465d71023e49d07aab0ec8cd5
Author: slaren <[email protected]>
Date:   Sun Nov 5 08:12:13 2023 +0100

    cuda : revert CUDA pool stuff (#3944)

    * Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)"

    This reverts commit 629f917cd6b96ba1274c49a8aab163b1b189229d.

    * Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)"

    This reverts commit d6069051de7165a4e06662c89257f5d2905bb156.

    ggml-ci

commit d9b33fe95bd257b36c84ee5769cc048230067d6f
Author: Peter Sugihara <[email protected]>
Date:   Fri Nov 3 12:18:18 2023 -0700

    metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938)

commit 5ba37461711095c0284233dbd14f0d9010cdbf56
Author: Xiao-Yong Jin <[email protected]>
Date:   Fri Nov 3 13:00:31 2023 -0500

    ggml-metal: fix yarn rope (#3937)

commit abb77e7319aabc0b5cfb7c22da690a692489b6b7
Author: slaren <[email protected]>
Date:   Fri Nov 3 12:13:09 2023 +0100

    ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)

commit 05816027d649f977468fc804cdb54e99eac246d1
Author: Georgi Gerganov <[email protected]>
Date:   Fri Nov 3 09:24:00 2023 +0200

    common : YAYF (yet another YARN fix) (#3925)

    ggml-ci

commit 3fdbe6b66b7b5c6ad3b2f245cbad1517c27ff776
Author: cebtenzzre <[email protected]>
Date:   Fri Nov 3 02:31:58 2023 -0400

    llama : change yarn_ext_factor placeholder to -1 (#3922)

commit 629f917cd6b96ba1274c49a8aab163b1b189229d
Author: Kerfuffle <[email protected]>
Date:   Thu Nov 2 13:58:22 2023 -0600

    cuda : add ROCM aliases for CUDA pool stuff (#3918)

commit c7743fe1c1cbda5a886362aa371480360580fdf0
Author: Georgi Gerganov <[email protected]>
Date:   Thu Nov 2 20:32:11 2023 +0200

    cuda : fix const ptrs warning causing ROCm build issues (#3913)

commit d6069051de7165a4e06662c89257f5d2905bb156
Author: Oleksii Maryshchenko <[email protected]>
Date:   Thu Nov 2 18:10:39 2023 +0100

    cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)

    * Using cuda memory pools for async alloc/dealloc.

    * If the CUDA device doesn't support memory pools, then use the old implementation.

    * Removed redundant cublasSetStream

    ---------

    Co-authored-by: Oleksii Maryshchenko <[email protected]>

commit 4ff1046d75e64f0e556d8dcd930ea25c23eb8b18
Author: Georgi Gerganov <[email protected]>
Date:   Thu Nov 2 16:22:30 2023 +0200

    gguf : print error for GGUFv1 files (#3908)

commit 21958bb393a654591ed26f339791b752d58f5c8b
Author: slaren <[email protected]>
Date:   Thu Nov 2 13:10:33 2023 +0100

    cmake : disable LLAMA_NATIVE by default (#3906)

commit 2756c4fbffab097736d5116007872d86456a544a
Author: Georgi Gerganov <[email protected]>
Date:   Thu Nov 2 11:20:21 2023 +0200

    gguf : remove special-case code for GGUFv1 (#3901)

    ggml-ci

commit 1efae9b7dca2a5cc5aa21c1997b538022964ea19
Author: Georgi Gerganov <[email protected]>
Date:   Thu Nov 2 09:54:18 2023 +0200

    llm : prevent from 1-D tensors being GPU split (#3697)

commit b12fa0d1c13596869c512f49a526b979c94787cc
Author: cebtenzzre <[email protected]>
Date:   Thu Nov 2 02:50:16 2023 -0400

    build : link against build info instead of compiling against it (#3879)

    * cmake : fix build when .git does not exist

    * cmake : simplify BUILD_INFO target

    * cmake : add missing dependencies on BUILD_INFO

    * build : link against build info instead of compiling against it

    * zig : make build info a .cpp source instead of a header

    Co-authored-by: Matheus C. França <[email protected]>

    * cmake : revert change to CMP0115

    ---------

    Co-authored-by: Matheus C. França <[email protected]>

commit 4d719a6d4e74b9a98e75f826f865f3153717d54b
Author: Georgi Gerganov <[email protected]>
Date:   Thu Nov 2 08:35:10 2023 +0200

    cuda : check if this fixes Pascal card regression (#3882)

commit 183b3fac6c28e65d23ac0230c1dd6fb84bf0154d
Author: Georgi Gerganov <[email protected]>
Date:   Thu Nov 2 08:33:37 2023 +0200

    metal : fix build errors and kernel sig after #2268 (#3898)
brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request Nov 17, 2023
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023
* starcoder : do not GPU split 1D bias tensors

* starcoder : offload layers to GPU

ggml-ci
brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request Nov 30, 2023
YuMJie pushed a commit to YuMJie/powerinfer that referenced this pull request Oct 25, 2024