--- a/README.md
+++ b/README.md
@@ -72,9 +72,13 @@ For more information, be sure to run the program with the `--help` flag.
 ## Android (Termux) Alternative method
 - See https://github.com/ggerganov/llama.cpp/pull/1828/files
 
-## CuBLAS?
+## Using CuBLAS
 - If you're on Windows with an Nvidia GPU, you can get CUDA support out of the box using the `--usecublas` flag; make sure you select the correct .exe with CUDA support.
-- You can attempt a CuBLAS build with `LLAMA_CUBLAS=1` or using the provided CMake file (best for Visual Studio users). If you use the CMake file to build, copy the generated `koboldcpp_cublas.dll` into the same directory as the `koboldcpp.py` file. If you are bundling executables, you may need to include CUDA dynamic libraries (such as `cublasLt64_11.dll` and `cublas64_11.dll`) for the executable to work correctly on a different PC. Note that support for CuBLAS is limited.
+- You can attempt a CuBLAS build with `LLAMA_CUBLAS=1` or using the provided CMake file (best for Visual Studio users). If you use the CMake file to build, copy the generated `koboldcpp_cublas.dll` into the same directory as the `koboldcpp.py` file. If you are bundling executables, you may need to include CUDA dynamic libraries (such as `cublasLt64_11.dll` and `cublas64_11.dll`) for the executable to work correctly on a different PC.
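A minimal sketch of the two build routes described above, plus the run flag. The `make LLAMA_CUBLAS=1` target, `koboldcpp_cublas.dll`, and `--usecublas` come from the text; the exact CMake invocation (including the `-DLLAMA_CUBLAS=ON` option name) is an assumption based on common llama.cpp-style projects, not taken from this diff:

```
# Makefile route:
make LLAMA_CUBLAS=1

# CMake route (best for Visual Studio users); option name assumed:
mkdir build && cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release
# then copy the generated koboldcpp_cublas.dll next to koboldcpp.py

# Running a CUDA-enabled build (model filename is illustrative):
python koboldcpp.py yourmodel.bin --usecublas
```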
+
+## Questions and Help
+- **First, please check out [The KoboldCpp FAQ and Knowledgebase](https://github.com/LostRuins/koboldcpp/wiki/The-KoboldCpp-FAQ-and-Knowledgebase), which may already have answers to your questions! Also please search through past issues and discussions.**
+- If you cannot find an answer, open an issue on this GitHub, or find us on the [KoboldAI Discord](https://koboldai.org/discord).
 
 ## Considerations
 - For Windows: No installation, single file executable (It Just Works)
@@ -91,11 +95,11 @@ For more information, be sure to run the program with the `--help` flag.
 ## Notes
 - Generation delay scales linearly with original prompt length. If OpenBLAS is enabled, prompt ingestion becomes about 2-3x faster. This is automatic on Windows, but will require linking on OSX and Linux. CLBlast speeds this up even further, and `--gpulayers` + `--useclblast` more so (a usage sketch follows this list).
 - I have heard of someone reporting a false AV positive. The exe is a simple PyInstaller bundle that includes the necessary python scripts and dlls to run. If this still concerns you, you might wish to rebuild everything from source code using the makefile, and you can rebuild the exe yourself with PyInstaller using `make_pyinstaller.bat` (a rebuild sketch follows this list).
-- Supported GGML models:
-- LLAMA (All versions including ggml, ggmf, ggjt v1, v2, v3, openllama, gpt4all). Supports CLBlast and OpenBLAS acceleration for all versions.
-- GPT-2 (All versions, including legacy f16, newer format + quantized, cerebras, starcoder). Supports CLBlast and OpenBLAS acceleration for newer formats, no GPU layer offload.
-- GPT-J (All versions including legacy f16, newer format + quantized, pyg.cpp, new pygmalion, janeway etc.). Supports CLBlast and OpenBLAS acceleration for newer formats, no GPU layer offload.
-- RWKV (all formats except Q4_1_O).
+- Supported GGML models (includes backward compatibility for older/legacy GGML model versions, though some newer features might be unavailable):
+- LLAMA and LLAMA2 (LLaMA / Alpaca / GPT4All / Vicuna / Koala / Pygmalion 7B / Metharme 7B / WizardLM and many more)
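As flagged in the prompt-ingestion note above, a hedged usage sketch of the CLBlast flags. `--useclblast` and `--gpulayers` come from the text; the two numeric arguments (OpenCL platform ID and device ID) and the layer count of 32 are illustrative assumptions, so run with `--help` to confirm for your build:

```
# Accelerate prompt ingestion with CLBlast and offload 32 layers to the GPU
# (platform 0, device 0 assumed; model filename is illustrative):
python koboldcpp.py yourmodel.bin --useclblast 0 0 --gpulayers 32
```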
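And for the AV note, a sketch of the rebuild-from-source route. The makefile and `make_pyinstaller.bat` are named in the text; having Python and PyInstaller installed is an assumed prerequisite:

```
# Rebuild the libraries from source using the makefile:
make

# Then rebuild the exe yourself with PyInstaller (Windows):
make_pyinstaller.bat
```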