v0.2.3

@yzh119 released this 11 Mar 02:22
· 12 commits to main since this release
fdedc43

Breaking Changes

We changed the interface of the sampling APIs (see #912):

  • The sampling APIs no longer return the success value. This is not compatible with the earlier design, so callers that unpack it must be updated.
  • The sampling APIs now accept an optional torch.Generator (https://pytorch.org/docs/stable/generated/torch.Generator.html) instead of a uniform tensor, to align with the behavior of torch.
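
The shape of this change can be illustrated with a minimal standard-library sketch. It does not call any FlashInfer function: `sample_from_probs` is a hypothetical helper, and `random.Random` only stands in for `torch.Generator`. It shows the new calling pattern, in which the caller passes an optional generator rather than pre-drawn uniform samples, and the function returns only the sampled index with no success flag.

```python
import random
from typing import Optional, Sequence


def sample_from_probs(
    probs: Sequence[float],
    generator: Optional[random.Random] = None,
) -> int:
    """Hypothetical helper, not part of FlashInfer.

    Draws one index from `probs` by inverse-CDF sampling. The random
    source is an optional generator (a stand-in for torch.Generator);
    when it is omitted, the module-level default source is used.
    Only the sampled index is returned, with no success flag.
    """
    rng = generator if generator is not None else random
    u = rng.random()  # the uniform draw happens inside the function
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point rounding


probs = [0.1, 0.6, 0.3]

# Two generators with the same seed reproduce the same draws.
gen_a = random.Random(42)
gen_b = random.Random(42)
draws_a = [sample_from_probs(probs, gen_a) for _ in range(5)]
draws_b = [sample_from_probs(probs, gen_b) for _ in range(5)]

# Omitting the generator falls back to the default random source.
default_draw = sample_from_probs(probs)
```

Passing a seeded generator makes a run reproducible without the caller having to allocate and manage a tensor of uniform samples, which is the same convention torch's own sampling functions follow.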

What's Changed

  • release: bump version v0.2.2.post1 by @yzh119 in #902
  • Naive Support for Hopper FP8 Prefill Kernel with Per-Head Quantization by @happierpig in #869
  • bugfix: Fix no return type error by @yzh119 in #904
  • ci: add dockerfile for CI by @yzh119 in #909
  • ci: bugfix on release-ci-docker github action by @yzh119 in #910
  • feat: flashinfer intra-kernel profiler by @yzh119 in #913
  • [Package] Add tvm binding to flashinfer.data when packaging by @MasterJH5574 in #917
  • refactor: move triton dependency to flashinfer.triton by @yzh119 in #918
  • sampling: dual pivot rejection sampling algorithm to improve top-p/top-k sampling efficiency by @yzh119 in #912
  • feat: support non-contiguous input/output in normalization functions by @yzh119 in #921
  • feat: improve sampling algorithm robustness by @yzh119 in #923
  • perf: use max probability instead of 1 as upper bound in top-p/k sampling by @yzh119 in #925
  • fix: add install step of profiler's dependency by @zobinHuang in #929
  • fix: undefined symbol cudaGetDriverEntryPointByVersion with CUDA >= 12.5 by @zobinHuang in #928
  • feat: experimental support of PDL by @yzh119 in #930
  • release: bump version to v0.2.3 by @yzh119 in #932

Full Changelog: v0.2.2.post1...v0.2.3