Breaking Changes
We changed the interface of the sampling APIs (see #912):
- The `success` return value has been removed from all sampling APIs; this is not backward compatible with the earlier design.
- Instead of passing a `uniform` tensor, the sampling interface now accepts an optional [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html), to align with the behavior of torch.
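To illustrate the new calling convention, here is a minimal sketch of the `torch.Generator` pattern using plain PyTorch. `sample_from_probs` is a hypothetical stand-in for a sampling API, not the actual flashinfer function; the point is that the caller passes an optional generator for reproducibility rather than a pre-drawn `uniform` tensor.

```python
import torch

def sample_from_probs(probs: torch.Tensor,
                      generator: torch.Generator = None) -> torch.Tensor:
    # Illustrative stand-in: draw one token id per row. When a
    # generator is supplied, the randomness is reproducible.
    return torch.multinomial(probs, num_samples=1,
                             generator=generator).squeeze(-1)

probs = torch.tensor([[0.1, 0.2, 0.7],
                      [0.5, 0.3, 0.2]])

gen = torch.Generator()
gen.manual_seed(42)
first = sample_from_probs(probs, generator=gen)

gen.manual_seed(42)  # re-seeding yields the same draws
second = sample_from_probs(probs, generator=gen)

assert torch.equal(first, second)
```

Callers that previously drew a `uniform` tensor themselves can instead seed a `torch.Generator` once and pass it to each call.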
What's Changed
- release: bump version v0.2.2.post1 by @yzh119 in #902
- Naive Support for Hopper FP8 Prefill Kernel with Per-Head Quantization by @happierpig in #869
- bugfix: Fix no return type error by @yzh119 in #904
- ci: add dockerfile for CI by @yzh119 in #909
- ci: bugfix on release-ci-docker github action by @yzh119 in #910
- feat: flashinfer intra-kernel profiler by @yzh119 in #913
- [Package] Add tvm binding to `flashinfer.data` when packaging by @MasterJH5574 in #917
- refactor: move triton dependency to flashinfer.triton by @yzh119 in #918
- sampling: dual pivot rejection sampling algorithm to improve top-p/top-k sampling efficiency by @yzh119 in #912
- feat: support non-contiguous input/output in normalization functions by @yzh119 in #921
- feat: improve sampling algorithm robustness by @yzh119 in #923
- perf: use max probability instead of 1 as upper bound in top-p/k sampling by @yzh119 in #925
- fix: add install step of profiler's dependency by @zobinHuang in #929
- fix: undefined symbol cudaGetDriverEntryPointByVersion with CUDA >= 12.5 by @zobinHuang in #928
- feat: experimental support of PDL by @yzh119 in #930
- release: bump version to v0.2.3 by @yzh119 in #932
New Contributors
- @happierpig made their first contribution in #869
- @zobinHuang made their first contribution in #929
Full Changelog: v0.2.2.post1...v0.2.3