Commit 752e13c

Merge branch 'release/0.11.0'
2 parents 0a3c320 + 691c877

37 files changed: +1487 -523 lines

.travis.yml

+2 -2

@@ -2,9 +2,9 @@ language: scala
  sudo: false
  matrix:
  include:
- - scala: 2.10.4
+ - scala: 2.10.5
  script: ./sbt ++$TRAVIS_SCALA_VERSION clean test

- - scala: 2.11.5
+ - scala: 2.11.7
  script: ./sbt ++$TRAVIS_SCALA_VERSION clean test
  after_success: "./sbt coveralls"

CHANGES.md

+23

@@ -1,6 +1,29 @@
  # Algebird #

+ ### Version 0.11.0 ###
+ * Move CMSHasherByteArray from scalding: https://github.com/twitter/algebird/pull/467
+ * Upgrade sbt launcher script (sbt-extras): https://github.com/twitter/algebird/pull/469
+ * Create case class macros for algebraic structures: https://github.com/twitter/algebird/pull/466
+ * Refactor MapAggregator: https://github.com/twitter/algebird/pull/462
+ * Algebird support for spark: https://github.com/twitter/algebird/pull/397
+ * Add MapAggregator from 1 (key, aggregator) pair: https://github.com/twitter/algebird/pull/452
+ * Remove unnecessary use of scala.math: https://github.com/twitter/algebird/pull/455
+ * Don't call deprecated HyperLogLog methods in tests: https://github.com/twitter/algebird/pull/456
+ * Update product_generators.rb: https://github.com/twitter/algebird/pull/457
+ * Pzheng/gaussian euclidean: https://github.com/twitter/algebird/pull/448
+
+ ### Version 0.10.2 ###
+ * QTree quantileBounds assert percentile <= 1.0 #447
+
+ ### Version 0.10.1 ###
+ * Make HLL easier to use, add Hash128 typeclass #440
+ * add ! to ApproximateBoolean #442
+ * add QTreeAggregator and add approximatePercentileBounds to Aggregator #443
+ * Make level configurable in QTreeAggregators #444
+
  ### Version 0.10.0 ###
+ * HyperLogLogSeries #295
+ * CMS: add contramap to convert CMS[K] to CMS[L], add support for String and Bytes, remove Ordering context bound for K #399
  * EventuallyAggregator and variants #407
  * Add MultiAggregator.apply #408
  * Return a MonoidAggregator from MultiAggregator when possible #409

README.md

+5 -5

@@ -1,5 +1,5 @@
  ## Algebird [![Build status](https://img.shields.io/travis/twitter/algebird/develop.svg)](http://travis-ci.org/twitter/algebird) [![Coverage status](https://img.shields.io/coveralls/twitter/algebird/develop.svg)](https://coveralls.io/r/twitter/algebird?branch=develop)
-
+

  Abstract algebra for Scala. This code is targeted at building aggregation systems (via [Scalding](https://github.com/twitter/scalding) or [Storm](https://github.com/nathanmarz/storm)). It was originally developed as part of Scalding's Matrix API, where Matrices had values which are elements of Monoids, Groups, or Rings. Subsequently, it was clear that the code had broader application within Scalding and on other projects within Twitter.

@@ -10,7 +10,7 @@ See the [current API documentation](http://twitter.github.com/algebird) for more
  ```scala
  > ./sbt algebird-core/console

- Welcome to Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_07).
+ Welcome to Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_40).
  Type in expressions to have them evaluated.
  Type :help for more information.

@@ -21,7 +21,7 @@ scala> import com.twitter.algebird.Operators._
  import com.twitter.algebird.Operators._

  scala> Map(1 -> Max(2)) + Map(1 -> Max(3)) + Map(2 -> Max(4))
- res1: scala.collection.immutable.Map[Int,com.twitter.algebird.Max[Int]] = Map(2 -> Max(4), 1 -> Max(3))
+ res0: scala.collection.immutable.Map[Int,com.twitter.algebird.Max[Int]] = Map(2 -> Max(4), 1 -> Max(3))
  ```
  In the above, the class Max[T] signifies that the + operator should actually be max (this is
  accomplished by providing an implicit instance of a typeclass for Max that handles +).
@@ -48,7 +48,7 @@ Discussion occurs primarily on the [Algebird mailing list](https://groups.google

  ## Maven

- Algebird modules are available on maven central. The current groupid and version for all modules is, respectively, `"com.twitter"` and `0.10.0`.
+ Algebird modules are available on maven central. The current groupid and version for all modules is, respectively, `"com.twitter"` and `0.11.0`.

  Current published artifacts are

@@ -91,6 +91,6 @@ The answer is a mix of the following:
  * Argyris Zymnis <http://twitter.com/argyris>

  ## License
- Copyright 2012 Twitter, Inc.
+ Copyright 2015 Twitter, Inc.

  Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
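
The README explains that `Max[T]` makes `+` behave as max through an implicit typeclass instance. Below is a minimal sketch of that idea in plain Scala. It is a hypothetical re-creation: `Plus`, `PlusSyntax`, the `|+|` operator and this `Max` are illustration names, not Algebird's `Semigroup`, `Operators._` or `com.twitter.algebird.Max`. The sketch uses `|+|` instead of `+` so it does not clash with `Map`'s own `+` method.

```scala
// Hypothetical, minimal typeclass sketch; Algebird's real Semigroup/Monoid
// hierarchy and Operators._ syntax are richer than this.
trait Plus[T] { def plus(a: T, b: T): T }

final case class Max[T](get: T)

object Plus {
  // For Max, "adding" keeps the larger of the two wrapped values.
  implicit def maxPlus[T](implicit ord: Ordering[T]): Plus[Max[T]] =
    new Plus[Max[T]] {
      def plus(a: Max[T], b: Max[T]): Max[T] = if (ord.gteq(a.get, b.get)) a else b
    }

  // Maps merge key by key; values sharing a key are combined with Plus[V].
  implicit def mapPlus[K, V](implicit pv: Plus[V]): Plus[Map[K, V]] =
    new Plus[Map[K, V]] {
      def plus(a: Map[K, V], b: Map[K, V]): Map[K, V] =
        b.foldLeft(a) { case (acc, (k, bv)) =>
          acc.updated(k, acc.get(k).fold(bv)(av => pv.plus(av, bv)))
        }
    }
}

object PlusSyntax {
  // Adds an infix |+| to any T that has a Plus[T] instance in scope.
  implicit class PlusOps[T](a: T) {
    def |+|(b: T)(implicit p: Plus[T]): T = p.plus(a, b)
  }
}

object ReadmeExample {
  import PlusSyntax._
  val merged: Map[Int, Max[Int]] =
    Map(1 -> Max(2)) |+| Map(1 -> Max(3)) |+| Map(2 -> Max(4))
}
```

Key 1 appears twice and keeps `Max(3)`, while key 2 appears once and keeps `Max(4)`. This matches the `Map(2 -> Max(4), 1 -> Max(3))` result shown in the README console session.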

algebird-core/src/main/scala/com/twitter/algebird/Aggregator.scala

+25 -1

@@ -116,10 +116,34 @@ object Aggregator extends java.io.Serializable {
  /**
  * This builds an in-memory Set, and then finally gets the size of that set.
  * This may not be scalable if the Uniques are very large. You might check the
- * HyperLogLog Aggregator to get an approximate version of this that is scalable.
+ * approximateUniqueCount or HyperLogLog Aggregator to get an approximate version
+ * of this that is scalable.
  */
  def uniqueCount[T]: MonoidAggregator[T, Set[T], Int] =
  toSet[T].andThenPresent(_.size)
+
+ /**
+ * Using a constant amount of memory, give an approximate unique count (~ 1% error).
+ * This uses an exact set for up to 100 items,
+ * then HyperLogLog (HLL) with an 1.2% standard error which uses at most 8192 bytes
+ * for each HLL. For more control, see HyperLogLogAggregator.
+ */
+ def approximateUniqueCount[T: Hash128]: MonoidAggregator[T, Either[HLL, Set[T]], Long] =
+ SetSizeHashAggregator[T](hllBits = 13, maxSetSize = 100)
+
+ /**
+ * Returns the lower bound of a given percentile where the percentile is between (0,1]
+ * The items that are iterated over cannot be negative.
+ */
+ def approximatePercentile[T](percentile: Double, k: Int)(implicit num: Numeric[T]): QTreeAggregatorLowerBound[T] =
+ QTreeAggregatorLowerBound[T](percentile, k)
+
+ /**
+ * Returns the intersection of a bounded percentile where the percentile is between (0,1]
+ * The items that are iterated over cannot be negative.
+ */
+ def approximatePercentileBounds[T](percentile: Double, k: Int)(implicit num: Numeric[T]): QTreeAggregator[T] =
+ QTreeAggregator[T](percentile, k)
  }

  /**
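
`approximateUniqueCount` keeps an exact set for small inputs and then uses HyperLogLog with `hllBits = 13`. That gives 2^13 = 8192 registers, which fits the 8192-byte bound in the doc comment if each register takes one byte. HLL's standard error is about 1.04/√m, and 1.04/√8192 ≈ 1.15%, which the comment rounds to 1.2%.

The sketch below illustrates the same exact-then-approximate switch. It is a hypothetical, standalone stand-in, not `SetSizeHashAggregator`, and differs from it in these ways:

* It hashes `String`s with 32-bit `scala.util.hashing.MurmurHash3` instead of a `Hash128` instance.
* It counts sequentially instead of forming a monoid over `Either[HLL, Set[T]]`.
* It omits HLL's large-range correction.
* It switches to HLL when the set grows past `maxSetSize`. That exact threshold rule is an assumption.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical sketch of "exact set first, HyperLogLog afterwards".
// NOT Algebird's SetSizeHashAggregator; names and hashing are illustrative.
object HybridUniqueCount {
  val HllBits = 13
  val M: Int = 1 << HllBits // 8192 one-byte registers

  final class Hll {
    private val registers = new Array[Byte](M)

    def add(item: String): Unit = {
      val h = MurmurHash3.stringHash(item)
      val idx = h >>> (32 - HllBits) // top 13 bits choose a register
      val rest = h << HllBits        // remaining 19 bits, left-aligned
      // 1-based position of the first 1-bit; an all-zero suffix gives 20
      val rho = math.min(Integer.numberOfLeadingZeros(rest), 32 - HllBits) + 1
      if (rho > registers(idx)) registers(idx) = rho.toByte
    }

    def estimate: Long = {
      val alpha = 0.7213 / (1 + 1.079 / M)
      val sum = registers.foldLeft(0.0)((acc, r) => acc + math.pow(2.0, -r))
      val raw = alpha * M * M / sum
      val zeros = registers.count(_ == 0)
      // Small-range correction (linear counting); large-range correction omitted
      if (raw <= 2.5 * M && zeros > 0) math.round(M * math.log(M.toDouble / zeros))
      else math.round(raw)
    }
  }

  def count(items: Iterable[String], maxSetSize: Int = 100): Long = {
    var exact = Set.empty[String]
    var hll: Option[Hll] = None
    items.foreach { x =>
      hll match {
        case Some(h) => h.add(x)
        case None =>
          exact += x
          if (exact.size > maxSetSize) { // set too big: migrate it into an HLL
            val h = new Hll
            exact.foreach(h.add)
            hll = Some(h)
          }
      }
    }
    hll.map(_.estimate).getOrElse(exact.size.toLong)
  }
}
```

Up to 100 distinct items the count is exact. Beyond that, memory stays fixed at 8192 registers no matter how many items arrive.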

algebird-core/src/main/scala/com/twitter/algebird/Approximate.scala

+2

@@ -22,6 +22,8 @@ case class ApproximateBoolean(isTrue: Boolean, withProb: Double) {

  def not: ApproximateBoolean = ApproximateBoolean(!isTrue, withProb)

+ def unary_! : ApproximateBoolean = not
+
  def ^(that: ApproximateBoolean): ApproximateBoolean = {
  // This is true with probability > withProb * that.withProb
  // The answer is also correct if both are wrong, which is
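
The new `unary_!` member lets callers write `!b` instead of `b.not`, because Scala rewrites the prefix expression `!b` into `b.unary_!`. The sketch below uses `ApproxBool`, a hypothetical stand-in that copies only the `not` and `unary_!` pair from the diff, not the full `ApproximateBoolean` class.

```scala
// Hypothetical stand-in mirroring only `not` / `unary_!` from ApproximateBoolean.
final case class ApproxBool(isTrue: Boolean, withProb: Double) {
  // Negating flips the answer; the probability of being correct is unchanged.
  def not: ApproxBool = ApproxBool(!isTrue, withProb)

  // `!b` is rewritten by the compiler to `b.unary_!`.
  // The space before `:` matters: `unary_!:` would lex as a single identifier.
  def unary_! : ApproxBool = not
}

object ApproxBoolDemo {
  val b = ApproxBool(isTrue = true, withProb = 0.9)
  val negated = !b // equivalent to b.not
}
```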

algebird-core/src/main/scala/com/twitter/algebird/CountMinSketch.scala

+5 -1

@@ -1207,4 +1207,8 @@ object CMSHasherImplicits {
  override def hash(a: Int, b: Int, width: Int)(x: Bytes): Int = hashBytes(a, b, width)(x.array)
  }

- }
+ implicit object CMSHasherByteArray extends CMSHasher[Array[Byte]] {
+ override def hash(a: Int, b: Int, width: Int)(x: Array[Byte]): Int = hashBytes(a, b, width)(x)
+ }
+
+ }
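
`CMSHasherByteArray` lets a Count-Min Sketch use raw `Array[Byte]` keys. It hashes the array contents with the same `hashBytes` helper that `Bytes` uses. Content hashing matters because JVM arrays use reference equality and an identity-based `hashCode`, so two arrays holding the same bytes would otherwise be treated as different keys.

The tiny count-min sketch below illustrates this. `TinyCms` is hypothetical, and hashing each row with `MurmurHash3.bytesHash` seeded by the row index is an illustrative assumption. It is not Algebird's `hashBytes` or its `(a, b, width)` scheme.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical minimal count-min sketch keyed by Array[Byte].
// Each row hashes the array *contents*, so equal bytes always share buckets.
final class TinyCms(depth: Int, width: Int) {
  require(depth > 0 && width > 0, "depth and width must be positive")
  private val table = Array.ofDim[Long](depth, width)

  private def bucket(row: Int, key: Array[Byte]): Int = {
    val h = MurmurHash3.bytesHash(key, row) // row index used as the seed
    ((h % width) + width) % width           // map to a non-negative bucket
  }

  def add(key: Array[Byte]): Unit =
    (0 until depth).foreach(r => table(r)(bucket(r, key)) += 1)

  // Count-min estimate: never below the true count, may overcount on collisions.
  def estimate(key: Array[Byte]): Long =
    (0 until depth).map(r => table(r)(bucket(r, key))).min
}
```

Two separately allocated arrays containing `"user-42"` are distinct objects, yet this sketch counts them as the same key.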

algebird-core/src/main/scala/com/twitter/algebird/DecayedValue.scala

+3 -3

@@ -24,14 +24,14 @@ package com.twitter.algebird

  object DecayedValue extends java.io.Serializable {
  def build[V <% Double](value: V, time: Double, halfLife: Double) = {
- DecayedValue(value, time * scala.math.log(2.0) / halfLife)
+ DecayedValue(value, time * math.log(2.0) / halfLife)
  }
  val zero = DecayedValue(0.0, Double.NegativeInfinity)

  def scale(newv: DecayedValue, oldv: DecayedValue, eps: Double) = {
  val newValue = newv.value +
- scala.math.exp(oldv.scaledTime - newv.scaledTime) * oldv.value
- if (scala.math.abs(newValue) > eps) {
+ math.exp(oldv.scaledTime - newv.scaledTime) * oldv.value
+ if (math.abs(newValue) > eps) {
  DecayedValue(newValue, newv.scaledTime)
  } else {
  zero
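
`build` stores time as `scaledTime = time * ln(2) / halfLife`. The factor `exp(oldv.scaledTime - newv.scaledTime)` in `scale` therefore equals 2^(-(t_new - t_old) / halfLife). In other words, the older value is halved once per elapsed half-life before it is added to the newer one.

The sketch below is a standalone copy of that arithmetic for experimentation. `Decayed` and `DecayMath` are hypothetical names, and the sketch takes `Double` values directly rather than using `build`'s `V <% Double` view bound.

```scala
// Hypothetical standalone copy of the DecayedValue.build / scale arithmetic.
final case class Decayed(value: Double, scaledTime: Double)

object DecayMath {
  val zero = Decayed(0.0, Double.NegativeInfinity)

  // scaledTime = time * ln(2) / halfLife, as in DecayedValue.build
  def build(value: Double, time: Double, halfLife: Double): Decayed =
    Decayed(value, time * math.log(2.0) / halfLife)

  // Decay the older value to the newer timestamp and add them; results whose
  // magnitude is at most eps collapse to `zero`, as in DecayedValue.scale.
  def scale(newv: Decayed, oldv: Decayed, eps: Double): Decayed = {
    val newValue = newv.value + math.exp(oldv.scaledTime - newv.scaledTime) * oldv.value
    if (math.abs(newValue) > eps) Decayed(newValue, newv.scaledTime) else zero
  }
}
```

With a half-life of 10, a value of 1.0 recorded at time 0 contributes 0.5 at time 10. After 100 half-lives it falls below `eps` and becomes the zero value.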
