Breaking Changes
We changed the interface of the sampling APIs (see #912):
- The `success` return value has been removed from all sampling APIs; this is not backward compatible with the earlier design.
- Instead of passing a `uniform` tensor, the sampling interface now accepts an optional [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html), to align with the behavior of torch.
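To illustrate the new calling convention, here is a minimal sketch of the `torch.Generator` pattern using plain PyTorch. `sample_from_probs` is a hypothetical stand-in for a sampling API, not the actual flashinfer function; the point is that the caller passes an optional generator for reproducibility rather than a pre-drawn `uniform` tensor.

```python
import torch

def sample_from_probs(probs: torch.Tensor,
                      generator: torch.Generator = None) -> torch.Tensor:
    # Illustrative stand-in: draw one token id per row. When a
    # generator is supplied, the randomness is reproducible.
    return torch.multinomial(probs, num_samples=1,
                             generator=generator).squeeze(-1)

probs = torch.tensor([[0.1, 0.2, 0.7],
                      [0.5, 0.3, 0.2]])

gen = torch.Generator()
gen.manual_seed(42)
first = sample_from_probs(probs, generator=gen)

gen.manual_seed(42)  # re-seeding yields the same draws
second = sample_from_probs(probs, generator=gen)

assert torch.equal(first, second)
```

Callers that previously drew a `uniform` tensor themselves can instead seed a `torch.Generator` once and pass it to each call.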
What's Changed
- release: bump version v0.2.2.post1 by @yzh119 in #902
- Naive Support for Hopper FP8 Prefill Kernel with Per-Head Quantization by @happierpig in #869
- bugfix: Fix no return type error by @yzh119 in #904
- ci: add dockerfile for CI by @yzh119 in #909
- ci: bugfix on release-ci-docker github action by @yzh119 in #910
- feat: flashinfer intra-kernel profiler by @yzh119 in #913
- [Package] Add tvm binding to `flashinfer.data` when packaging by @MasterJH5574 in #917
- refactor: move triton dependency to flashinfer.triton by @yzh119 in #918
- sampling: dual pivot rejection sampling algorithm to improve top-p/top-k sampling efficiency by @yzh119 in #912
- feat: support non-contiguous input/output in normalization functions by @yzh119 in #921
- feat: improve sampling algorithm robustness by @yzh119 in #923
- perf: use max probability instead of 1 as upper bound in top-p/k sampling by @yzh119 in #925
- fix: add install step of profiler's dependency by @zobinHuang in #929
- fix: undefined symbol cudaGetDriverEntryPointByVersion with CUDA >= 12.5 by @zobinHuang in #928
- feat: experimental support of PDL by @yzh119 in #930
- release: bump version to v0.2.3 by @yzh119 in #932
New Contributors
- @happierpig made their first contribution in #869
- @zobinHuang made their first contribution in #929
Full Changelog: v0.2.2.post1...v0.2.3