📚 200+ Tensor/CUDA Cores kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA, and CuTe (reaching 98%–100% of cuBLAS/FA2 TFLOPS 🎉🎉).
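For context on what WMMA-based HGEMM kernels like these build on, below is a minimal sketch of one output tile computed with the CUDA WMMA API. This is illustrative only, not the repository's actual kernels; it assumes row-major A, column-major B, and M, N, K that are multiples of 16.

```cuda
#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// Each warp computes one 16x16 tile of C = A @ B.
// Assumes: A is row-major MxK, B is column-major KxN, C is row-major MxN,
// and M, N, K are multiples of 16. Requires compute capability 7.0+
// (compile with -arch=sm_70 or newer).
__global__ void wmma_hgemm(const half *A, const half *B, float *C,
                           int M, int N, int K) {
  // Tile coordinates: one warp per 16x16 output tile.
  int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;
  int warpN = blockIdx.y * blockDim.y + threadIdx.y;
  if (warpM * 16 >= M || warpN * 16 >= N) return;

  wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
  wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
  wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
  wmma::fill_fragment(acc, 0.0f);

  // March along K, issuing one Tensor Core MMA per 16-wide slice.
  for (int k = 0; k < K; k += 16) {
    wmma::load_matrix_sync(a_frag, A + warpM * 16 * K + k, K);
    wmma::load_matrix_sync(b_frag, B + warpN * 16 * K + k, K);
    wmma::mma_sync(acc, a_frag, b_frag, acc);
  }
  wmma::store_matrix_sync(C + warpM * 16 * N + warpN * 16, acc, N,
                          wmma::mem_row_major);
}
```

Production kernels layer shared-memory staging, double buffering, and swizzled layouts on top of this basic pattern to approach cuBLAS throughput.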
🔥🔥🔥 A collection of awesome public CUDA, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, and High Performance Computing (HPC) projects.
Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
GEMM-based and Winograd-based convolutions implemented with CUTLASS.
A study of CUTLASS.
Multiple GEMM operators built with CUTLASS to support LLM inference.
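To illustrate what constructing a GEMM operator with CUTLASS looks like, here is a minimal sketch using the device-level API with its canonical single-precision, column-major defaults. It is an assumed illustrative configuration, not this repository's actual operators; real LLM-inference kernels would typically use half_t inputs and arch::OpClassTensorOp to target Tensor Cores.

```cuda
#include <cutlass/gemm/device/gemm.h>

// Minimal CUTLASS device-level GEMM: D = alpha * A @ B + beta * C.
// All matrices are column-major; lda/ldb/ldc are leading dimensions.
cudaError_t cutlass_sgemm(int M, int N, int K,
                          float alpha, const float *A, int lda,
                          const float *B, int ldb,
                          float beta, float *C, int ldc) {
  using ColumnMajor = cutlass::layout::ColumnMajor;
  using Gemm = cutlass::gemm::device::Gemm<float, ColumnMajor,   // A
                                           float, ColumnMajor,   // B
                                           float, ColumnMajor>;  // C

  Gemm gemm_op;
  // Arguments: problem size, TensorRefs for A/B/C, C again as the output D,
  // and the linear-combination epilogue parameters {alpha, beta}.
  Gemm::Arguments args({M, N, K},
                       {A, lda}, {B, ldb}, {C, ldc}, {C, ldc},
                       {alpha, beta});

  cutlass::Status status = gemm_op(args);
  return status == cutlass::Status::kSuccess ? cudaSuccess
                                             : cudaErrorUnknown;
}
```

The template parameters (element types, layouts, and optionally tile shapes and operator class) are how such projects stamp out a family of GEMM variants from one definition.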
A CUTLASS CuTe implementation of a head-dim-64 FlashAttention-2 TensorRT plugin for LightGlue. Runs on a Jetson Orin NX 8GB with TensorRT 8.5.2.
A PyTorch implementation of block-sparse operations.