-
Notifications
You must be signed in to change notification settings - Fork 346
Coverage and documentation for AveragedValue #589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -17,17 +17,23 @@ limitations under the License. | |||
package com.twitter.algebird | |||
|
|||
object AveragedValue { | |||
// TODO: Change to Group[AveragedValue] in 0.13.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an issue and a link to the issue. In my experience TODOs in code are generally ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, I added it to the 0.13.0 cleanup - I'll add the link here.
if (iter.isEmpty) None | ||
else { | ||
val (c, v) = iter.foldLeft((0L, 0.0)) { | ||
case ((count, sum), AveragedValue(c, v)) => (count + c, sum + (c * v)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks the stability of the algorithm doesn't it? We go to lengths to not just do the simple thing in plus and here we go back to the simple thing. I don't know if we can speed up sumOption without duplicating code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's probably right. Probably worth removing...I guess we have to allocate a Tuple2
anyway even if we abstract out the code. We might as well just create the AveragedValue
directly.
so kill this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that's what I meant about copy/paste. We could do a while loop that mutates a double and Long. That would be no allocation, but would duplicate code unless we pull the shared code out into a method that accepts Longs and Doubles and returns a double.
Might be worth doing. We could see what benchmarks say. I bet you could get 2x doing that.
@@ -8,8 +8,16 @@ scaladoc: "#com.twitter.algebird.AveragedValue" | |||
|
|||
# Averaged Value | |||
|
|||
### Documentation Help | |||
## Algebraic Properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we mention it is a Group meaning you can subtract values?
### Documentation Help | ||
## Algebraic Properties | ||
|
||
## Usage Examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like just showing some example of getting the average of a bunch of Longs or Doubles would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also showing the Aggregator.
@@ -8,8 +8,16 @@ scaladoc: "#com.twitter.algebird.AveragedValue" | |||
|
|||
# Averaged Value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should mention the two main facts: 1) allows you to get the average and the count with one pass over the data. 2) tracks the average using a stable algorithm that won't overflow for big data (as long as there are fewer than 2^63 items generally).
c9bb49d
to
5bf04ca
Compare
Current coverage is 81.03% (diff: 97.36%)@@ develop #589 diff @@
==========================================
Files 113 113
Lines 4903 4904 +1
Methods 4517 4519 +2
Messages 0 0
Branches 385 384 -1
==========================================
+ Hits 3962 3974 +12
+ Misses 941 930 -11
Partials 0 0
|
1420c84
to
ae78aad
Compare
ae78aad
to
daa8a3e
Compare
* @return an instance with `r`'s stream subtracted out | ||
*/ | ||
def -(r: AveragedValue): AveragedValue = AveragedGroup.minus(this, r) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while we are at it, should we add:
def +(that: Double): AveragedValue = AveragedValue(count + 1L, ....)`
def +[N](that: N)(implicit num: Numeric[N]): AveragedValue = this + (num.toDouble(that))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
👍 So great! Epic PR. benchmarks, docs, perf, nice methods! |
* Returns this instance, always. | ||
* | ||
* @param r ignored instance of `First[U]` | ||
*/ | ||
def +[U >: T](r: First[U]): First[U] = this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could actually make this return First[T]
, I think.
This PR increases test coverage and adds a documentation page for
AveragedValue
. It also adds asumOption
implementation. (This comes after #591, to take advantage of the benchmarking goodies.)Run the benchmark with:
Here are the results:
So, about a 2.3x speedup from the optimized
sumOption
impl.