The lookahead speculative decoding failed when applied to gpt2 0.1b. #2879

JeRainXiong · 2025-03-12T10:09:47Z

Hi,
My environment is an A10 GPU (22 GB), TensorRT-LLM v0.16.0, and CUDA 12.6. When I use lookahead speculative decoding on GPT-2 0.1B, I get the error below. How can I fix it?

[TensorRT-LLM][ERROR] Assertion failed: No available XQA kernels are found for speculative decoding mode.
(/target/trtllm_1118/TensorRT-LLM/cpp/tensorrt_llm/plugins/gptAttentionCommon/gptAttentionCommon.cpp:1745)

JeRainXiong · 2025-03-13T09:19:22Z

I solved this problem by upgrading to TensorRT-LLM version 0.18.0.dev2025031100.
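
For anyone hitting the same assertion, a quick way to see whether the installed build is older than the one that worked here is sketched below. This is only a sketch under assumptions: the pip distribution name `tensorrt_llm` is assumed, the helper names are made up for illustration, and the comparison looks only at the numeric release part (so dev/rc suffixes are ignored). The thread only confirms that 0.16.0 failed and 0.18.0.dev2025031100 worked; versions in between are untested.

```python
import re
from importlib import metadata

# Version reported as working in this thread; whether any earlier
# version also works is unknown (assumption: treat this as the minimum).
KNOWN_GOOD = "0.18.0.dev2025031100"


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release part, padded to three fields.

    Example: '0.18.0.dev2025031100' -> (0, 18, 0). Suffixes are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))


def is_at_least(installed: str, minimum: str = KNOWN_GOOD) -> bool:
    """Compare numeric release parts only (hypothetical helper)."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_trtllm_version():
    """Return the installed version string, or None if not installed.

    Assumes the distribution is registered as 'tensorrt_llm'.
    """
    try:
        return metadata.version("tensorrt_llm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_trtllm_version()
    if found is None:
        print("tensorrt_llm is not installed in this environment")
    elif is_at_least(found):
        print(f"{found}: at or above the version reported to work")
    else:
        print(f"{found}: older than {KNOWN_GOOD}; upgrading may help")
```

Because only the release part is compared, a final 0.18.0 build and the 0.18.0 dev build are treated as equivalent.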

JeRainXiong closed this as completed Mar 13, 2025
