The current group-gemm configuration raises the following error on the
NVIDIA 3090:
```shell
RuntimeError: cutlass group_gemm.initialize failed: Error Internal
```
Set the number of group-gemm stages to 4. This reduces the amount of
dynamic shared memory (smem) the kernel requests, so that it can be
launched on GPUs like the 3090.
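
As a rough illustration of why the stage count matters, the sketch below
estimates the mainloop smem of a multistage GEMM as
`stages * (tile_m * tile_k + tile_k * tile_n) * sizeof(element)` and
compares it with the per-block opt-in smem limits listed in the CUDA
programming guide (99 KB for compute capability 8.6, 163 KB for 8.0).
The 128x128x32 half-precision tile and the stage count of 8 are
assumptions chosen for the example, not the values used in this
repository, and the helper names are hypothetical. CUTLASS may request
more than this estimate (for example for the epilogue), so treat it as
an approximation only.

```cpp
#include <cstddef>

// Approximate dynamic smem of a multistage GEMM mainloop: every stage
// buffers one A tile (tile_m x tile_k) and one B tile (tile_k x tile_n).
constexpr std::size_t mainloop_smem_bytes(std::size_t tile_m,
                                          std::size_t tile_n,
                                          std::size_t tile_k,
                                          std::size_t stages,
                                          std::size_t elem_bytes) {
    return stages * (tile_m * tile_k + tile_k * tile_n) * elem_bytes;
}

// Maximum opt-in shared memory per thread block (CUDA programming guide).
constexpr std::size_t kSmemLimitSm86 = 99 * 1024;   // e.g. RTX 3090
constexpr std::size_t kSmemLimitSm80 = 163 * 1024;  // e.g. A100-class parts

constexpr bool fits(std::size_t need, std::size_t limit) {
    return need <= limit;
}

// Hypothetical tile: 128x128x32 with 2-byte (half) elements.
constexpr std::size_t kSmem4Stages = mainloop_smem_bytes(128, 128, 32, 4, 2);
constexpr std::size_t kSmem8Stages = mainloop_smem_bytes(128, 128, 32, 8, 2);

static_assert(kSmem4Stages == 64 * 1024, "4 stages need 64 KB");
static_assert(fits(kSmem4Stages, kSmemLimitSm86), "4 stages fit on sm_86");
static_assert(!fits(kSmem8Stages, kSmemLimitSm86), "8 stages exceed sm_86");
static_assert(fits(kSmem8Stages, kSmemLimitSm80), "8 stages fit on sm_80");
```

Under these assumptions, a configuration that fits on an sm_80 part can
exceed the sm_86 limit, which is consistent with the kernel failing to
initialize on the 3090.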
Additionally, I ran a simple comparison on the A800: setting the stage
count to 4 still slightly improves the performance of group-gemm.
Refer to:
https://github.com/NVIDIA/cutlass/blob/main/test/unit/gemm/device/gemm_grouped_sm80.cu