Commit 58b367c

cuBLAS: refactor and optimize f16 mat mul performance (LostRuins#1259)
* cuBLAS: refactor, convert fp16 to fp32 on device
* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16
* fix build
* cuBLAS: update block_q5_1
1 parent ea3a0ad commit 58b367c

4 files changed, +479 -258 lines changed
