commit 53b5ae02cb1b533b78302422951bcfdeca6e2738
Author: YellowRoseCx <[email protected]>
Date: Tue Dec 12 12:08:29 2023 -0600
mixtral fan service
commit 168b1d74e26d0321e2e89358303b6c33e8d7d33e
Merge: f13295b de15d4a6
Author: YellowRoseCx <[email protected]>
Date: Tue Dec 12 12:00:52 2023 -0600
Merge branch 'kcpp-rocm-mixtral2' into main2
commit de15d4a632939a685ec12fa17355298542facf15
Merge: 74acc54 ea4402b
Author: YellowRoseCx <[email protected]>
Date: Tue Dec 12 11:45:19 2023 -0600
Merge branch 'mixtral' into kcpp-rocm-mixtral
commit ea4402b
Author: Georgi Gerganov <[email protected]>
Date: Tue Dec 12 17:03:38 2023 +0200
test-backend-ops : add one more sum_rows test
commit a51bc0c
Author: Georgi Gerganov <[email protected]>
Date: Tue Dec 12 15:55:42 2023 +0200
metal : fix binary ops for ne10 % 4 != 0
commit 08eb991
Author: Georgi Gerganov <[email protected]>
Date: Tue Dec 12 14:14:15 2023 +0200
metal : add cpy f16 -> f32 kernel
commit a742d9f
Author: slaren <[email protected]>
Date: Tue Dec 12 12:46:33 2023 +0100
gguf-py : bump version
commit 6a419f4
Author: Georgi Gerganov <[email protected]>
Date: Tue Dec 12 13:04:33 2023 +0200
convert : support safetensors format
commit 74acc54
Author: Concedo <[email protected]>
Date: Tue Dec 12 10:53:34 2023 +0800
Revert "Hide hipBLAS (ROCm) if CuBLAS exists - vice versa"
This reverts commit 4b854d4.
commit f1cbfab
Author: slaren <[email protected]>
Date: Mon Dec 11 20:02:55 2023 +0100
convert : fix style
commit 7dc75e3
Author: slaren <[email protected]>
Date: Mon Dec 11 20:00:28 2023 +0100
convert : use 1e6 rope_freq_base for mixtral
commit 296c945
Author: slaren <[email protected]>
Date: Mon Dec 11 16:53:25 2023 +0100
cuda : fix mul_mat_id with multi gpu
commit 33e50f1
Author: slaren <[email protected]>
Date: Mon Dec 11 12:27:48 2023 +0100
test-backend-ops : disable MOE test with thread sanitizer
commit ffda94c
Author: slaren <[email protected]>
Date: Mon Dec 11 12:15:31 2023 +0100
test-backend-ops : simplify and disable slow tests to avoid CI timeout
commit 06581f2
Author: Concedo <[email protected]>
Date: Mon Dec 11 16:54:42 2023 +0800
perf endpoint lets you monitor if the embedded horde worker has issues
commit fce971d
Author: Concedo <[email protected]>
Date: Mon Dec 11 16:17:10 2023 +0800
do not build the clblast noavx2 binary if not on windows
commit 8cbaed1
Author: Georgi Gerganov <[email protected]>
Date: Mon Dec 11 08:55:16 2023 +0200
llama : fix hard-coded number of experts
commit 4b854d4
Author: YellowRoseCx <[email protected]>
Date: Sun Dec 10 22:49:35 2023 -0600
Hide hipBLAS (ROCm) if CuBLAS exists - vice versa
commit b002981
Author: slaren <[email protected]>
Date: Mon Dec 11 02:43:52 2023 +0100
test-backend-ops : fix dequantize block offset
commit f1380d7
Author: slaren <[email protected]>
Date: Sun Dec 10 22:58:31 2023 +0100
test-backend-ops : add cpy from f32 -> all types test
commit 54d254b
Author: slaren <[email protected]>
Date: Sun Dec 10 21:52:11 2023 +0100
test-backend-ops : cleanup, add moe test for batches
commit e2cf3b7
Author: henk717 <[email protected]>
Date: Sun Dec 10 14:30:17 2023 +0100
koboldcpp.sh - The Mamba Multitool (LostRuins#554)
* .sh script V1
* koboldcpp.sh polish
* koboldcpp.sh dist generator
* Include html's in dist
* RWKV in Linux Dist
* Lower dependency requirements
* Eliminate wget dependency
* More distinct binary name
I know it's technically amd64, but I don't want to cause confusion among nvidia users.
* Use System OpenCL
Unsure how this will behave in the pyinstaller build, but pocl ended up CPU only. With a bit of luck pyinstaller uses the one from the actual system when compiled on a system without OpenCL, while conda now includes it for that specific system.
* Add cblas dependency
Missing this causes compile failures on some systems
* ICD workaround
Ideally we find a better solution, but conda forces ICD and needs this for the successful compile. However, pyinstaller then embeds the ICD causing it to be limited to the system it was compiled for. By temporarily removing the ICD pyinstaller can't find it and everything remains functional. Ideally we do this on a pyinstaller level, but I could not find any good options to do so yet.
---------
Co-authored-by: root <root@DESKTOP-DQ1QRAG>
commit 54ba263
Author: Georgi Gerganov <[email protected]>
Date: Sun Dec 10 15:27:41 2023 +0200
test-backend-ops : make experts more evenly probable (test_moe)
commit b0b83dd
Author: Georgi Gerganov <[email protected]>
Date: Sun Dec 10 14:30:38 2023 +0200
metal : fix ggml_mul_mat_id for F32
commit 65923a8
Author: Georgi Gerganov <[email protected]>
Date: Sun Dec 10 14:17:46 2023 +0200
convert : determine n_ctx correctly
commit 8614aa7
Author: slaren <[email protected]>
Date: Sun Dec 10 13:12:11 2023 +0100
cuda : fix get_rows when ncols is odd
commit cefebb3
Author: slaren <[email protected]>
Date: Sun Dec 10 13:11:39 2023 +0100
test-backend-ops : add moe test
commit e640cbe
Author: Georgi Gerganov <[email protected]>
Date: Sun Dec 10 13:57:54 2023 +0200
llama : add n_expert and n_expert_used to hparams + change quants
commit d1259b7
Author: Georgi Gerganov <[email protected]>
Date: Sun Dec 10 13:00:13 2023 +0200
llama : do not quantize expert gating tensors
commit 6cfb31f
Author: Georgi Gerganov <[email protected]>
Date: Sun Dec 10 10:59:13 2023 +0200
metal : add indirect mat-vec kernels for all quantization types
commit 016f9bb
Author: Georgi Gerganov <[email protected]>
Date: Sun Dec 10 09:38:21 2023 +0200
metal : fix ggml_get_rows to work with non-cont src1
commit 0710b0f
Author: slaren <[email protected]>
Date: Sat Dec 9 23:29:47 2023 +0100
llama : offload missing ffn_moe_silu
commit 62b95f9
Author: slaren <[email protected]>
Date: Sat Dec 9 22:39:34 2023 +0100
cuda : support non-contiguous src1 in get_rows
commit 2e4db48
Author: slaren <[email protected]>
Date: Sat Dec 9 22:38:22 2023 +0100
ggml : update get_rows f16 and q
commit ac3f7d8
Author: slaren <[email protected]>
Date: Sat Dec 9 19:19:03 2023 +0100
ggml : get_rows : support non-contiguous tensors with gaps, generalize up to 3D
commit 8c5b66e
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 15:30:34 2023 +0200
metal : reduce the kernel launches for ggml_mul_mat_id
commit 7e2006b
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 14:24:58 2023 +0200
metal : add/mul/div use general kernel when src1 not cont
commit 06dfde3
Author: slaren <[email protected]>
Date: Sat Dec 9 13:21:09 2023 +0100
llama : add basic support for offloading moe with CUDA
commit 2cbcba8
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 14:18:42 2023 +0200
metal : add more general support for ggml_get_rows + tests
commit 9064b1c
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 14:04:54 2023 +0200
ggml : fix ggml_get_rows to take into account ne02 / ne11
commit ee8fb39
Author: slaren <[email protected]>
Date: Sat Dec 9 12:42:25 2023 +0100
ggml : add n_as argument to ggml_mul_mat_id
commit 7372b62
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 13:18:58 2023 +0200
ggml : ggml_get_rows support 2D indexing [n_tokens, n_experts] (cpu only)
commit 8b185b7
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 13:01:42 2023 +0200
llama : fix expert weighting in the FFN
commit 7ea3695
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 12:45:15 2023 +0200
llama : first working version
commit af1a096
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 12:07:39 2023 +0200
llama : fix cur -> cur_expert
commit aedfad1
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 11:47:40 2023 +0200
llama : update graph to support MoE
commit 861cd67
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 11:19:46 2023 +0200
ggml : sync latest ggml_mul_mat_id
commit a3eefe9
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 11:14:03 2023 +0200
llama : model loading
commit d38e41e
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 10:59:37 2023 +0200
convert : fix n_ff typo
commit dff8cbe
Author: Georgi Gerganov <[email protected]>
Date: Sat Dec 9 10:51:58 2023 +0200
convert : support Mixtral as LLAMA arch
commit 7a69152
Author: Concedo <[email protected]>
Date: Fri Dec 8 21:06:32 2023 +0800
lowvram var defaults
commit 7418bca
Author: Concedo <[email protected]>
Date: Fri Dec 8 19:20:30 2023 +0800
up ver
commit c47bc28
Author: Concedo <[email protected]>
Date: Fri Dec 8 18:35:45 2023 +0800
slight refactor for noscript ui
commit 7469f20
Author: Concedo <[email protected]>
Date: Fri Dec 8 18:16:14 2023 +0800
use lowvram flag for offload qkv
commit ec21fa7
Merge: 930cdfb fe680e3
Author: Concedo <[email protected]>
Date: Fri Dec 8 17:42:26 2023 +0800
Merge branch 'master' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .gitignore
# CMakeLists.txt
# Makefile
# Package.swift
# README.md
# ggml-cuda.cu
# llama.cpp
# llama.h
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
commit 930cdfb
Author: Concedo <[email protected]>
Date: Fri Dec 8 16:53:30 2023 +0800
updated lite, added patch that links to noscript mode
commit fe680e3
Author: Georgi Gerganov <[email protected]>
Date: Thu Dec 7 22:26:54 2023 +0200
sync : ggml (new ops, tests, backend, etc.) (ggml-org#4359)
* sync : ggml (part 1)
* sync : ggml (part 2, CUDA)
* sync : ggml (part 3, Metal)
* ggml : build fixes
ggml-ci
* cuda : restore lost changes
* cuda : restore lost changes (StableLM rope)
* cmake : enable separable compilation for CUDA
ggml-ci
* ggml-cuda : remove device side dequantize
* Revert "cmake : enable separable compilation for CUDA"
This reverts commit 09e35d0.
* cuda : remove assert for rope
* tests : add test-backend-ops
* ggml : fix bug in ggml_concat
* ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()`
* ci : try to fix macOS
* ggml-backend : remove backend self-registration
* ci : disable Metal for macOS cmake build
ggml-ci
* metal : fix "supports family" call
* metal : fix assert
* metal : print resource path
ggml-ci
---------
Co-authored-by: slaren <[email protected]>
commit bcc0eb4
Author: Georgi Gerganov <[email protected]>
Date: Thu Dec 7 13:03:17 2023 +0200
llama : per-layer KV cache + quantum K cache (ggml-org#4309)
* per-layer KV
* remove unnecessary copies
* less code duplication, offload k and v separately
* llama : offload KV cache per-layer
* llama : offload K shift tensors
* llama : offload for rest of the model arches
* llama : enable offload debug temporarily
* llama : keep the KV related layers on the device
* llama : remove mirrors, perform Device -> Host when partial offload
* common : add command-line arg to disable KV cache offloading
* llama : update session save/load
* llama : support quantum K cache (ggml-org#4312)
* llama : support quantum K cache (wip)
* metal : add F32 -> Q8_0 copy kernel
* cuda : add F32 -> Q8_0 copy kernel
ggml-ci
* cuda : use mmv kernel for quantum cache ops
* llama : pass KV cache type through API
* llama : fix build
ggml-ci
* metal : add F32 -> Q4_0 copy kernel
* metal : add F32 -> Q4_1 copy kernel
* cuda : wip
* cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels
* llama-bench : support type_k/type_v
* metal : use mm kernel only for quantum KV cache
* cuda : add comment
* llama : remove memory_f16 and kv_f16 flags
---------
Co-authored-by: slaren <[email protected]>
* readme : add API change notice
---------
Co-authored-by: slaren <[email protected]>
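A minimal sketch of how the quantized K cache introduced in this change might be selected through the C API; the field names type_k, type_v and offload_kqv in llama_context_params are inferred from the commit titles above and are assumptions, not verified against llama.h at this exact revision:
    #include "llama.h"

    /* Build context params with a quantized K cache (assumed field names). */
    struct llama_context_params quantum_k_params(void) {
        struct llama_context_params cparams = llama_context_default_params();
        cparams.type_k      = GGML_TYPE_Q8_0; /* quantize the K cache */
        cparams.type_v      = GGML_TYPE_F16;  /* keep the V cache in f16 */
        cparams.offload_kqv = true;           /* keep KV-related layers on the device */
        return cparams;
    }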
commit 81bc921
Author: Hongyu Ouyang <[email protected]>
Date: Thu Dec 7 02:25:22 2023 -0800
train : fix ggml-org#4227 (double free in examples/train-text-from-scratch/train-text-from-scratch.cpp) (ggml-org#4351)
On commit b1108 (44c117f) xaedes added
    ggml_allocr * alloc = NULL;
    ... (many lines in between)
    if (alloc) {
        ggml_allocr_free(alloc);
    }
Which is correct, but it's easy to lose context after many lines in between.
On commit b1287 (0e76a899) xaedes made a big change. From here on, alloc is freed eagerly.
    alloc = ggml_allocr_new(...)
    ... (short lines of code)
    ggml_allocr_free(alloc)
This happens a few times, but alloc is never set to NULL, and many lines below,
we still have
    if (alloc) {
        ggml_allocr_free(alloc);
    }
which causes a double-free.
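A self-contained illustration of the bug shape described above, using plain malloc/free instead of ggml_allocr (simplified sketch, not the actual train-text-from-scratch code):
    #include <stdlib.h>

    int main(void) {
        void * alloc = NULL;

        alloc = malloc(64);
        /* ... work with the buffer ... */
        free(alloc);
        alloc = NULL;   /* the fix: without this reset, the guarded cleanup
                           below calls free() on a dangling pointer -> double free */

        /* ... many lines later, the original cleanup path is still present ... */
        if (alloc) {
            free(alloc);
        }
        return 0;
    }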
commit 05cd6e5
Author: Georgi Gerganov <[email protected]>
Date: Wed Dec 6 20:21:59 2023 +0200
server : recognize cache_prompt parameter in OAI API (ggml-org#4347)
commit c751152
Author: Concedo <[email protected]>
Date: Thu Dec 7 00:52:25 2023 +0800
noscript mode is done
commit 12002d8
Author: Concedo <[email protected]>
Date: Wed Dec 6 17:51:08 2023 +0800
very basic noscript mode
commit caa9249
Author: Georgi Gerganov <[email protected]>
Date: Wed Dec 6 10:41:03 2023 +0200
common : fix compile warning
commit da5eaef
Author: stduhpf <[email protected]>
Date: Wed Dec 6 09:08:17 2023 +0100
speculative : support `--color` (ggml-org#4343)
* speculative: add some colors
* minor : add braces
---------
Co-authored-by: Georgi Gerganov <[email protected]>
commit 5f6e0c0
Author: Marcus Dunn <[email protected]>
Date: Tue Dec 5 10:55:12 2023 -1000
grammar : pre-computed pieces + reserve mem + less string copies (ggml-org#4330)
* reserve space for codepoints
* improvement for the appended 0
* used precomputed token text for grammar sample
* reserve candidates_decoded
* reserve candidates_grammar
* remove candidates_decoded
* Revert "remove candidates_decoded"
This reverts commit 3773328.
* changed decode_utf8 to take src by ref
commit 5aa365d
Author: Kerfuffle <[email protected]>
Date: Tue Dec 5 10:19:18 2023 -0700
llama : allow overriding GGUF metadata when loading model (ggml-org#4092)
* feat: Allow overriding GGUF metadata when loading model
* Fix the one time GCC is stricter than clang about something
* Step1
* Refactor... basically everything!
* Nuke obsolete GetArrayLen struct
* simplify std::string specialization
* Various cleanups
Add informational output when overrides are applied
Warn user when an override with the wrong type is specified
* Fix broken logic for parsing bool KV overrides
Fix issue where overrides didn't apply when key missing in GGUF metadata
Resolve merge changes
* llama : rearrange model params
* Update new GET_KEY call
Add note that metadata KV overrides aren't reflected in initial metadata KV info dump
---------
Co-authored-by: cebtenzzre <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
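A hedged sketch of what supplying one of these overrides through the C API might look like; the struct name llama_model_kv_override, its field and enum names, the kv_overrides pointer in llama_model_params, and the GGUF key used here are assumptions inferred from this commit's description rather than copied from llama.h:
    #include <string.h>
    #include "llama.h"

    /* Force the expert-count key to a specific value when loading.
       ovr must point to an array of at least two elements; the second,
       zeroed element (empty key) terminates the override list. */
    struct llama_model_params params_with_override(struct llama_model_kv_override * ovr) {
        memset(ovr, 0, 2 * sizeof(*ovr));
        strncpy(ovr[0].key, "llama.expert_count", sizeof(ovr[0].key) - 1);
        ovr[0].tag       = LLAMA_KV_OVERRIDE_INT;   /* assumed enum name */
        ovr[0].int_value = 8;                       /* assumed field name */

        struct llama_model_params mparams = llama_model_default_params();
        mparams.kv_overrides = ovr;                 /* assumed field name */
        return mparams;
    }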
commit b6f952f
Author: Concedo <[email protected]>
Date: Tue Dec 5 21:08:10 2023 +0800
improved exit logic
commit 52c8bc3
Author: MaggotHATE <[email protected]>
Date: Tue Dec 5 15:05:51 2023 +0500
sampling : custom samplers order (ggml-org#4285)
* Samplers sequence order w parameter
* Cleaned commented code
* Fixed formatting
* Rewrote with unordered_map
* Revert and rewrite, too many problems and safeguards would be needed
* Fixed code style
* Code style fixes according to review
* More readable samplers input string, fixed help
* Style fix in sampler_queue
* Formatting fixes
* Fixing whitespaces
commit e4b76bb
Author: kchro3 <[email protected]>
Date: Mon Dec 4 23:29:46 2023 -0800
swift : revert compiler checks for swift package (ggml-org#4332)
commit 23b5e12
Author: Daniel Bevenius <[email protected]>
Date: Mon Dec 4 17:04:21 2023 +0100
simple : update error message for KV cache check (ggml-org#4324)
This commit updates the error message that is printed when the
KV cache is not big enough to hold all the prompt and generated
tokens. Specifically it removes the reference to n_parallel and
replaces it with n_len.
Signed-off-by: Daniel Bevenius <[email protected]>
commit d208995
Author: Miwa / Ensan <[email protected]>
Date: Tue Dec 5 01:03:49 2023 +0900
swift : fix concatenation method to avoid invalid UTF8 stringification (ggml-org#4325)
commit 5c9f90c
Author: Miwa / Ensan <[email protected]>
Date: Mon Dec 4 22:43:45 2023 +0900
swift : fix prompt tokenization logic (ggml-org#4321)
commit a5a5839
Author: Concedo <[email protected]>
Date: Mon Dec 4 21:10:42 2023 +0800
handle accidentally selecting a kcpps file as model instead
commit 4fa44e8
Author: Ikko Eltociear Ashimine <[email protected]>
Date: Mon Dec 4 16:57:35 2023 +0900
grammar-parser : fix typo (ggml-org#4318)
preceeding -> preceding
commit 8602f5a
Merge: ac36aee fbbc428
Author: Concedo <[email protected]>
Date: Sun Dec 3 22:00:14 2023 +0800
Merge branch 'master' into concedo_experimental
commit fbbc428
Author: Georgi Gerganov <[email protected]>
Date: Sun Dec 3 15:56:35 2023 +0200
ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() (ggml-org#4308)
* ggml : fix soft max out-of-bounds access
ggml-ci
* ggml : reuse ggml_get_n_tasks() in ggml_graph_plan()
ggml-ci
commit ac36aee
Merge: 48544cd 33e171d
Author: Concedo <[email protected]>
Date: Sun Dec 3 21:56:29 2023 +0800
Merge branch 'master' into concedo_experimental
# Conflicts:
# CMakeLists.txt
# Makefile
commit adf3de4
Author: Georgi Gerganov <[email protected]>
Date: Sun Dec 3 15:56:22 2023 +0200
ggml : fix soft max out-of-bounds access (ggml-org#4307)
ggml-ci
commit 48544cd
Author: Concedo <[email protected]>
Date: Sun Dec 3 21:46:50 2023 +0800
Revert "Revert "ggml : add ggml_soft_max_ext (ggml-org#4256)""
This reverts commit a8e66ef.
commit 33e171d
Author: Ed Lee <[email protected]>
Date: Sun Dec 3 01:10:43 2023 -0800
server : fix OpenAI API `stop` field to be optional (ggml-org#4299)
(cherry picked from commit Mozilla-Ocho/llamafile@e8c92bc)
commit 6949b50
Author: Rickard Edén <[email protected]>
Date: Sun Dec 3 10:03:25 2023 +0100
py : add grammar to oai like api (ggml-org#4294)
commit d7b800b
Author: Georgi Gerganov <[email protected]>
Date: Sun Dec 3 10:58:16 2023 +0200
llama : pad KV cache size (ggml-org#4280)
* llama : pad KV cache size to 32
* metal : try to improve batched decoding
commit 6570a20
Author: Concedo <[email protected]>
Date: Sun Dec 3 15:44:53 2023 +0800
token count includes ids
commit 5a7d312
Author: Georgi Gerganov <[email protected]>
Date: Fri Dec 1 20:39:12 2023 +0200
llama : avoid using "optional" keyword (ggml-org#4283)
commit d5a1cbd
Author: Georgi Gerganov <[email protected]>
Date: Fri Dec 1 20:35:03 2023 +0200
llama : support optional tensors (ggml-org#4283)
commit b220222
Author: Miwa / Ensan <[email protected]>
Date: Sat Dec 2 03:19:45 2023 +0900
swift : fix token_to_piece implementation (ggml-org#4278)
* Fix token_to_piece implementation in Swift
* Fix errors
commit 511f52c
Author: Jared Van Bortel <[email protected]>
Date: Fri Dec 1 13:18:35 2023 -0500
build : enable libstdc++ assertions for debug builds (ggml-org#4275)
commit 03562f3
Author: CausalLM <[email protected]>
Date: Sat Dec 2 02:17:06 2023 +0800
llama : support attention bias on LLaMA architecture (ggml-org#4283)
* Support attention_bias on LLaMA architecture
QKVO bias, should fix InternLM (ggml-org#3133) and works for LLaMAfied Qwen models (ggml-org#3743 (comment)).
* check existence of qkvo bias while loading llama models
Tested on LLaMA2, CUDA and CPU.
* Update llama.cpp
commit 37c746d
Author: Shijie <[email protected]>
Date: Sat Dec 2 02:16:31 2023 +0800
llama : add Qwen support (ggml-org#4281)
* enable qwen to llama.cpp
* llama : do not GPU split bias tensors
---------
Co-authored-by: Georgi Gerganov <[email protected]>
commit 880f579
Author: Georgi Gerganov <[email protected]>
Date: Fri Dec 1 18:42:11 2023 +0200
llama : fix integer overflow during quantization (ggml-org#4284)
happens with multi-threaded quantization of Qwen-72B
ggml-ci
Diff: README.md (+1 -1)
@@ -1,4 +1,4 @@
-# koboldcpp-ROCM for AMD
+# <center>koboldcpp-ROCM MIXTRAL FanService Edition for AMD</center>
 Quick Linux install:
 To install, either use the file "[easy_KCPP-ROCm_install.sh](https://github.com/YellowRoseCx/koboldcpp-rocm/blob/main/easy_KCPP-ROCm_install.sh)" or navigate to the folder you want to download to in Terminal then run

(second changed file in the diff; file name not captured, help-text lines 763-764 old / 825-828 new)
 printf(" -n N, --n-predict N number of tokens to predict (default: %d, -1 = infinity, -2 = until context filled)\n", params.n_predict);
 printf(" -c N, --ctx-size N size of the prompt context (default: %d, 0 = loaded from model)\n", params.n_ctx);
 printf(" -b N, --batch-size N batch size for prompt processing (default: %d)\n", params.n_batch);
+printf(" --samplers samplers that will be used for generation in the order, separated by \';\', for example: \"top_k;tfs;typical;top_p;min_p;temp\"\n");
+printf(" --sampling-seq simplified sequence for samplers that will be used (default: %s)\n", sparams.samplers_sequence.c_str());