
Better Aggregator Methods, Faster CountMinSketch and Batched!

@johnynek released this 26 Jun 00:43
088bb65

This release adds many convenience methods to Aggregator, adds a new type called Batched[T], and speeds up CountMinSketch (CMS).

Aggregator now has methods for reservoir sampling and more top-K (sort*Take) aggregators. Batched lets you defer the work in plus until a batch reaches a certain size, at which point it calls sumOption internally. It is designed for aggregations where repeated plus is expensive but sumOption can be implemented efficiently. Lastly, CMS is significantly faster, and it gains a sumOption method and a mutable builder, CMSSummation (see #533).
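
To make the Batched idea concrete, here is a minimal Python sketch of the technique, not Algebird's Scala API; the class `BatchedSum`, its `batch_size` parameter and the `join_strings` helper are hypothetical. `plus` only buffers values, and once the buffer is full it collapses the whole buffer with a single call to a bulk `sum_option`, standing in for a semigroup's efficient sumOption.

```python
from typing import Callable, Generic, Iterable, List, Optional, TypeVar

T = TypeVar("T")


class BatchedSum(Generic[T]):
    """Hypothetical sketch of the Batched idea: buffer values, then sum each batch in one call."""

    def __init__(self, sum_option: Callable[[Iterable[T]], Optional[T]], batch_size: int) -> None:
        self.sum_option = sum_option  # efficient bulk sum; returns None for empty input
        self.batch_size = batch_size
        self.buffer: List[T] = []
        self.sum_option_calls = 0  # counts bulk calls so the batching effect is visible

    def plus(self, item: T) -> None:
        # Defer the work: just remember the value until the batch is full.
        self.buffer.append(item)
        if len(self.buffer) >= self.batch_size:
            self._compact()

    def _compact(self) -> None:
        # Collapse the whole buffer into a single partial sum with one bulk call.
        self.sum_option_calls += 1
        total = self.sum_option(self.buffer)
        self.buffer = [] if total is None else [total]

    def result(self) -> Optional[T]:
        # Flush any pending values and return the final sum (None if nothing was added).
        if not self.buffer:
            return None
        if len(self.buffer) > 1:
            self._compact()
        return self.buffer[0]


def join_strings(parts: Iterable[str]) -> Optional[str]:
    # One bulk join; repeated a + b may copy the growing string at every step.
    parts = list(parts)
    return "".join(parts) if parts else None


acc = BatchedSum(join_strings, batch_size=4)
for ch in "abcdefghij":
    acc.plus(ch)
print(acc.result())          # abcdefghij
print(acc.sum_option_calls)  # 3
```

Ten calls to `plus` turn into three bulk `sum_option` calls here; the larger the batch, the fewer (and bigger) the bulk sums.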
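
Why a mutable builder helps CMS can be seen in a toy count-min sketch. The Python below is a minimal sketch assuming one salted SHA-256 hash per row; it is not Algebird's CMS, and `CountMinSketch`, `of`, `plus`, and `sum_option` are illustrative names. The immutable `plus` allocates a fresh depth × width table on every merge, while `sum_option` allocates one accumulator and adds every input into it in place, which is the idea behind a mutable summation such as CMSSummation.

```python
import hashlib
from typing import Iterable, List, Optional


class CountMinSketch:
    """Illustrative count-min sketch (hypothetical, not Algebird's CMS)."""

    def __init__(self, depth: int, width: int, table: Optional[List[List[int]]] = None) -> None:
        self.depth = depth
        self.width = width
        self.table = table if table is not None else [[0] * width for _ in range(depth)]

    def _bucket(self, row: int, item: str) -> int:
        # One hash function per row, derived by salting the item with the row index.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    @classmethod
    def of(cls, item: str, depth: int, width: int) -> "CountMinSketch":
        # Sketch of a single occurrence of item.
        cms = cls(depth, width)
        for row in range(depth):
            cms.table[row][cms._bucket(row, item)] += 1
        return cms

    def frequency(self, item: str) -> int:
        # Minimum over rows: the estimate never undercounts.
        return min(self.table[row][self._bucket(row, item)] for row in range(self.depth))

    def plus(self, other: "CountMinSketch") -> "CountMinSketch":
        # Immutable merge: allocates and fills a brand-new table every time.
        merged = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(self.table, other.table)]
        return CountMinSketch(self.depth, self.width, merged)

    @staticmethod
    def sum_option(sketches: Iterable["CountMinSketch"]) -> Optional["CountMinSketch"]:
        # Mutable summation: one accumulator table, updated in place for every input.
        acc: Optional[CountMinSketch] = None
        for s in sketches:
            if acc is None:
                acc = CountMinSketch(s.depth, s.width, [row[:] for row in s.table])
            else:
                for acc_row, row in zip(acc.table, s.table):
                    for j, v in enumerate(row):
                        acc_row[j] += v
        return acc


words = ["a", "b", "a", "c", "a", "b"]
sketches = [CountMinSketch.of(w, depth=4, width=64) for w in words]

folded = sketches[0]
for s in sketches[1:]:
    folded = folded.plus(s)  # a new table per step

summed = CountMinSketch.sum_option(sketches)  # a single table in total
assert folded.table == summed.table
print(summed.frequency("a"))  # an estimate that is never below the true count of 3
```

Both paths produce the same table; only the amount of allocation differs.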

This release should be 100% binary compatible with 0.12.0 (binary compatibility is now checked as part of our Travis CI build).

  • Add an Identity Monad #511
  • Improve toRichTraversable to work with Iterator as well #518 #535
  • Fix several flaky tests #510 #514 #525
  • Improve SpaceSaver design #519
  • Add sortByTake, sortByReverseTake to Aggregator #527
  • Add randomSample and reservoirSample aggregators #529
  • Add a Batched type for converting plus to sumOption (defer plus until you have a batch) #530
  • Add a default size to approximatePercentile #531
  • Add a .group method to MapAlgebra and RichTraversable #532
  • Optimize CountMinSketch and add a mutable builder for faster construction #533
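
For readers new to reservoir sampling (#529), here is a minimal Python sketch of the classic single-pass technique (Algorithm R). It illustrates the idea only and is not Algebird's reservoirSample; the function name `reservoir_sample` and its optional `rng` parameter are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), 5, random.Random(42))
print(sample)  # 5 distinct values drawn uniformly from 0..999
```

Memory stays at k items no matter how long the stream is, which is what makes it a good fit for an aggregator.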
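
The sort*Take aggregators (#527) keep only the top K values by some key. A common way to do that in bounded memory is a size-K heap; the Python sketch below shows the technique with hypothetical names `sort_by_take`, `sort_by_reverse_take`, and `merge_top_k` that mirror, but are not, Algebird's Scala methods, and it makes no claim about Algebird's internals or tie-breaking.

```python
import heapq
from typing import Any, Callable, Iterable, List, TypeVar

T = TypeVar("T")


def sort_by_take(items: Iterable[T], k: int, key: Callable[[T], Any]) -> List[T]:
    """The k items with the smallest keys, in ascending key order (bounded heap)."""
    return heapq.nsmallest(k, items, key=key)


def sort_by_reverse_take(items: Iterable[T], k: int, key: Callable[[T], Any]) -> List[T]:
    """The k items with the largest keys, in descending key order."""
    return heapq.nlargest(k, items, key=key)


def merge_top_k(left: List[T], right: List[T], k: int, key: Callable[[T], Any]) -> List[T]:
    # Merging two partial results only needs their 2k survivors, which is what lets
    # a top-k be computed per partition and then combined.
    return heapq.nsmallest(k, left + right, key=key)


people = [("ann", 31), ("bob", 25), ("cy", 40), ("di", 22), ("ed", 35)]
age = lambda p: p[1]
print(sort_by_take(people, 2, key=age))          # [('di', 22), ('bob', 25)]
print(sort_by_reverse_take(people, 2, key=age))  # [('cy', 40), ('ed', 35)]
```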

Thank you to:
@joshualande @non @dossett @jnievelt @piyushnarang @koertkuipers @Gabriel439 @NathanHowell @johnynek @ianoc