llama : add Command R Plus support (#6491)
* Add Command R Plus GGUF
* Loading works up to LayerNorm2D
* Export new tensors in 1D so they are not quantized.
* Fix embedding layer based on Noeda's example
* Whitespace
* Add line
* Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda)
* dranger003: Fix block index overflow in CUDA dequantizing.
* Revert blocked multiplication code, as it still has issues and could affect other LLaMA architectures
* Export norms as F32
* Fix overflow issues during quantization, plus other cleanup
* Type convention
Co-authored-by: Georgi Gerganov <[email protected]>
* dranger003: Fix more int overflow during quant.
---------
Co-authored-by: S <[email protected]>
Co-authored-by: slaren <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>