Skip to content

Coverage and documentation for AveragedValue #589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 2, 2016

Conversation

sritchie
Copy link
Collaborator

@sritchie sritchie commented Nov 30, 2016

This PR increases test coverage and adds a documentation page for AveragedValue. It also adds a sumOption implementation. (This comes after #591, to take advantage of the benchmarking goodies.)

Run the benchmark with:

jmh:run -t1 -f 1 -wi 15 -i 20 .*AveragedValueBenchmark.*

Here are the results:

[info] Benchmark                             (numElements)   Mode  Cnt      Score      Error  Units
[info] AveragedValueBenchmark.timePlus               10000  thrpt   20   6474.320 ±   65.222  ops/s
[info] AveragedValueBenchmark.timeSumOption          10000  thrpt   15  15057.069 ±  305.026  ops/s
[success] Total time: 73 s, completed Dec 1, 2016 2:54:13 PM

So, about a 2.3x speedup from the optimized sumOption impl.

@@ -17,17 +17,23 @@ limitations under the License.
package com.twitter.algebird

object AveragedValue {
// TODO: Change to Group[AveragedValue] in 0.13.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an issue and a link to the issue. In my experience TODOs in code are generally ignored.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, I added it to the 0.13.0 cleanup - I'll add the link here.

if (iter.isEmpty) None
else {
val (c, v) = iter.foldLeft((0L, 0.0)) {
case ((count, sum), AveragedValue(c, v)) => (count + c, sum + (c * v))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks the stability of the algorithm doesn't it? We go to lengths to not just do the simple thing in plus and here we go back to the simple thing. I don't know if we can speed up sumOption without duplicating code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's probably right. Probably worth removing...I guess we have to allocate a Tuple2 anyway even if we abstract out the code. We might as well just create the AveragedValue directly.

so kill this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that's what I meant about copy/paste. We could do a while loop that mutates a double and Long. That would be no allocation, but would duplicate code unless we pull the shared code out into a method that accepts Longs and Doubles and returns a double.

Might be worth doing. We could see what benchmarks say. I bet you could get 2x doing that.

@@ -8,8 +8,16 @@ scaladoc: "#com.twitter.algebird.AveragedValue"

# Averaged Value

### Documentation Help
## Algebraic Properties
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we mention it is a Group meaning you can subtract values?

### Documentation Help
## Algebraic Properties

## Usage Examples
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like just showing some example of getting the average of a bunch of Longs or Doubles would be fine.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also showing the Aggregator.

@@ -8,8 +8,16 @@ scaladoc: "#com.twitter.algebird.AveragedValue"

# Averaged Value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mention the two main facts: 1) allows you to get the average and the count with one pass over the data. 2) tracks the average using a stable algorithm that won't overflow for big data (as long as there are fewer than 2^63 items generally).

@sritchie sritchie force-pushed the sritchie/averaged_tests branch from c9bb49d to 5bf04ca Compare November 30, 2016 22:32
@codecov-io
Copy link

codecov-io commented Dec 1, 2016

Current coverage is 81.03% (diff: 97.36%)

Merging #589 into develop will increase coverage by 0.22%

@@            develop       #589   diff @@
==========================================
  Files           113        113          
  Lines          4903       4904     +1   
  Methods        4517       4519     +2   
  Messages          0          0          
  Branches        385        384     -1   
==========================================
+ Hits           3962       3974    +12   
+ Misses          941        930    -11   
  Partials          0          0          

Powered by Codecov. Last update 0ff8bfe...f28a216

@sritchie sritchie added this to the 0.12.3 milestone Dec 1, 2016
@sritchie sritchie force-pushed the sritchie/averaged_tests branch 3 times, most recently from 1420c84 to ae78aad Compare December 1, 2016 21:54
@sritchie sritchie force-pushed the sritchie/averaged_tests branch from ae78aad to daa8a3e Compare December 2, 2016 03:55
* @return an instance with `r`'s stream subtracted out
*/
def -(r: AveragedValue): AveragedValue = AveragedGroup.minus(this, r)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while we are at it, should we add:

def +(that: Double): AveragedValue = AveragedValue(count + 1L, ....)`
def +[N](that: N)(implicit num: Numeric[N]): AveragedValue = this + (num.toDouble(that))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@sritchie sritchie changed the title [WIP] Coverage and documentation for AveragedValue Coverage and documentation for AveragedValue Dec 2, 2016
@johnynek
Copy link
Collaborator

johnynek commented Dec 2, 2016

👍 So great!

Epic PR. benchmarks, docs, perf, nice methods!

* Returns this instance, always.
*
* @param r ignored instance of `First[U]`
*/
def +[U >: T](r: First[U]): First[U] = this
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could actually make this return First[T], I think.

@sritchie sritchie merged commit 0841e92 into develop Dec 2, 2016
@sritchie sritchie deleted the sritchie/averaged_tests branch December 2, 2016 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants