flash-attention-2

Here are 13 public repositories matching this topic...

arihanv / Shush

Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and makes requests to it via a NextJS app

machine-learning modal transcription whisper huggingface-transformers shadcn-ui flash-attention-2

Updated Jun 7, 2024
TypeScript

alexzhang13 / flashattention2-custom-mask

Star

Triton implementation of FlashAttention2 that adds Custom Masks.

deep-learning triton attention cuda-kernels attention-mechanism triton-lang flash-attention flash-attention-2

Updated Aug 14, 2024
Python

BBC-Esq / WhisperS2T-transcriber

Star

Uses the powerful WhisperS2T and Ctranslate2 libraries to batch transcribe multiple files

audio-recorder audio-recording transcription audio-transcribing transcriber audio-transcription transcr ctranslate2 flash-attention-2 whispers2t

Updated Sep 17, 2024
Python

Bruce-Lee-LY / flash_attention_inference

Star

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

gpu cuda inference nvidia cutlass mha multi-head-attention llm tensor-core large-language-model flash-attention flash-attention-2

Updated Feb 27, 2025
C++

erfanzar / jax-flash-attn2

Star

A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).

pallas jax flash-attention flash-attention-2

Updated Mar 4, 2025
Python

qdLMF / LightGlue-with-FlashAttentionV2-TensorRT

Star

A cutlass cute implementation of headdim-64 flashattentionv2 TensorRT plugin for LightGlue. Run on Jetson Orin NX 8GB with TensorRT 8.5.2.

cuda transformer cutlass cute tensorrt feature-matching multihead-attention superpoint lightglue flash-attention flash-attention-2

Updated Mar 3, 2025
Cuda

etasnadi / VulkanCooperativeMatrixAttention

Star

Vulkan & GLSL implementation of FlashAttention-2

vulkan glsl artificial-intelligence gpu-acceleration attention gpu-computing deel-learning tensor-cores large-language-models llm flash-attention flash-attention-2

Updated Jan 19, 2025
C++

graphcore-research / flash-attention-ipu

Star

Poplar implementation of FlashAttention for IPU

deep-learning transformers pytorch ipu graphcore poplar flash-attention flash-attention-2

Updated Mar 12, 2024
C++

gietema / attention

Star

Toy Flash Attention implementation in torch

torch flash-attention flash-attention-2 flash-attention-3

Updated Sep 22, 2024
Python

MaxLSB / flash-attn2

Star

FlashAttention-2 in Triton for sliding window attention (fwd + bwd pass)

python deep-learning pytorch triton sliding-window flash-attention-2 flashattention

Updated Mar 6, 2025
Python

eljandoubi / triton-flash-attn

Star

Coding Flash Attention from scratch using triton

triton flash-attention-2

Updated Mar 8, 2025
Makefile

lalitdotdev / transcribeX

Star

Transcribe audio in minutes with OpenAI's WhisperV3 and Flash Attention v2 + Transformers without relying on third-party providers and APIs. Host it yourself or try it out.

python modal transformers transcription wavesurfer-js nvidia-cuda bun nvidia-gpu virtual-environment fastapi huggingface-transformers flash-attention-2 next14 whisper- whisperv3

Updated Jun 18, 2024
TypeScript

8e8bdba457c18cf692a95fe2ec67000b / VulkanCooperativeMatrixAttention

Star

Vulkan & GLSL implementation of FlashAttention-2

vulkan glsl artificial-intelligence gpu-acceleration attention gpu-computing deel-learning tensor-cores large-language-models llm flash-attention flash-attention-2

Updated Mar 13, 2025

Improve this page

Add a description, image, and links to the flash-attention-2 topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the flash-attention-2 topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

flash-attention-2

Here are 13 public repositories matching this topic...

arihanv / Shush

alexzhang13 / flashattention2-custom-mask

BBC-Esq / WhisperS2T-transcriber

Bruce-Lee-LY / flash_attention_inference

erfanzar / jax-flash-attn2

qdLMF / LightGlue-with-FlashAttentionV2-TensorRT

etasnadi / VulkanCooperativeMatrixAttention

graphcore-research / flash-attention-ipu

gietema / attention

MaxLSB / flash-attn2

eljandoubi / triton-flash-attn

lalitdotdev / transcribeX

8e8bdba457c18cf692a95fe2ec67000b / VulkanCooperativeMatrixAttention

Improve this page

Add this topic to your repo