group for calculating Pearson's Correlation Coefficient #829

Conversation
```scala
 *
 */
case class Correlation(
    c2: Double,
```
Unlike `Moments`, I listed the arguments here in descending order of degree. My reasoning was that `c2` is the most important quantity in this calculation and should therefore be listed first, but I'm open to other opinions.
```scala
 */
override def plus(a: Correlation, b: Correlation): Correlation = {
  val count = a.count + b.count
  if (count == 0)
```
As with `Moments`, this is necessary to make this into a `Group` instead of just a `Monoid`. It seems a tiny bit janky to me, but hey, negation is a nice property, so why not?
It looks like I wrote this piece of jank originally in Moments. As you two seem to have figured out here, it certainly appears not to be associative. I can submit a PR to remove it from Moments if you like.
Wait, @snoble, do you get notified for all algebird PRs or did you just happen to stumble across this?
Actually, it looks like it's necessary even for it to be a `Monoid` instead of a `Semigroup` (otherwise `zero + zero` ends up with `NaN` in it). By the time I pushed this review, I'd forgotten which property it was needed for.
```scala
val deltaLeft = b.m1Left - a.m1Left
val deltaRight = b.m1Right - a.m1Right

val m2Left = a.m2Left + b.m2Left + math.pow(deltaLeft, 2) * prodSumRatio
```
The calculation of the first central moment in `Moments` was nuanced enough that there's a specific method for it, so I reused it above. However, the calculation of the second central moment is an expression with a lot of variables, each of which is used once, so I didn't refactor it into its own method for the purposes of DRY. I think that was the right call but am open to feedback.
I also could have defined `m1Left`, `m2Left`, `m1Right` and `m2Right` in terms of the private methods `leftMoment` and `rightMoment`. However, I noticed that the calculation factored out into `prodSumRatio` was used in several places here, and that expanding these calculations out reduces the number of arithmetic operations. Let me know if you don't think the tradeoff is worth it.
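For reference, the combine step being discussed follows the standard pairwise update for means and variances, extended to covariance. A minimal sketch, using field names that mirror the snippets above (the class itself is a simplified stand-in, not the final API):

```scala
object CorrCombine {
  // Simplified stand-in for the Correlation class under review (assumed field names).
  case class Corr(c2: Double, m2Left: Double, m2Right: Double, m1Left: Double, m1Right: Double, count: Long)

  def plus(a: Corr, b: Corr): Corr = {
    val count = a.count + b.count
    if (count == 0L) Corr(0.0, 0.0, 0.0, 0.0, 0.0, 0L)
    else {
      val deltaLeft = b.m1Left - a.m1Left
      val deltaRight = b.m1Right - a.m1Right
      // n_a * n_b / (n_a + n_b): the weight on each delta-correction term
      val prodSumRatio = a.count.toDouble * b.count / count
      Corr(
        c2 = a.c2 + b.c2 + deltaLeft * deltaRight * prodSumRatio,
        m2Left = a.m2Left + b.m2Left + deltaLeft * deltaLeft * prodSumRatio,
        m2Right = a.m2Right + b.m2Right + deltaRight * deltaRight * prodSumRatio,
        m1Left = (a.count * a.m1Left + b.count * b.m1Left) / count,
        m1Right = (a.count * a.m1Right + b.count * b.m1Right) / count,
        count = count
      )
    }
  }
}
```

Combining the singleton points `(1.0, 2.0)` and `(3.0, 6.0)` this way yields `c2 = 4.0` with `m2Left = 2.0` and `m2Right = 8.0`, so `c2 / sqrt(m2Left * m2Right) = 1.0`, as expected for perfectly correlated points.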
```scala
 * A class to calculate covariance and the first two central moments of a sequence of pairs of Doubles, from which
 * the Pearson correlation coefficient can be calculated.
 *
 * m{i}Left denotes the ith central moment of the first projection of the pair.
```
`Left` and `Right` don't seem like great naming conventions for the first and second elements of a pair, but when you start having first and second moments of each of these, using numbers gets confusing fast. Any suggestions for improvements in naming convention would be welcome.
Maybe instead of left and right I could refer to them as x and y (like the coordinates of the plane). Your thoughts, @johnynek?
Yeah, I like that. I'd be +1 on using x and y (and, if we did emit m and b for a linear fit, such that y = m x + b).
Done. I ended up going with `m1x` instead of `m1X` because it seemed easier to read and type, but did go with the technically correct camel-casing for things like `meanX`. Happy to change conventions if you have a preference.
```scala
 */
case class Correlation(
    c2: Double,
    m2Left: Double,
```
At some point, I experimented with having two full copies of `Moments`, one for each element of the pair, but the fact that the 0th moment (count) has to be the same in both cases made that representation harder to work with (and also less space efficient).
I'm really sorry I let this sit so long. It's very interesting and I actually have a project where I expect I will be using this very soon.
This is really cool (especially if we make count a Double so we can decay it and have decayed correlation, so we don't have to have infinite memory).
```scala
    m2Right: Double,
    m1Left: Double,
    m1Right: Double,
    m0: Long
```
Can we make this a `Double`, actually? If we do, we can make `Correlation` a `VectorSpace`, which allows us to do decayed values on `Correlation` (which is a pretty cool idea). I've been using decayed 2nd moments in a project, and it seems to be a useful feature transformation since it can give you decayed z-scores.
Taking a closer look at section 3.1 of the paper I linked to in the PR description, it looks like the calculation for weights really is as simple as making this a `Double` (the choice of notation in the two papers is annoyingly different, though). Given how easy this is, it seems like a no-brainer to implement. Good call-out here.

Section 4.1 talks about weighted correlation being used as a metric (seems like an easy enough method to add), and 4.2 about exponentially weighted moving correlation (which seems like something that should live outside of this particular abstraction — do you agree?).
```scala
def meanRight: Double = m1Right

// useful in calculating variance and stddev; private because we don't have skewness or kurtosis info
```
I'd rather move this code to the `Moments` object so we don't have to allocate a `Moments` instance to get the result. The JIT may be able to allocate on the stack using escape analysis, but I'm nervous... but if you don't have time, I understand (that said, my request to use a `Double` for `m0` may mean we can't use this method).
@johnynek I agree re: weights, and it's not very deep business logic, so I think re-implementing it here is no big deal. However, what should be the equivalent of this line? Should `> 1` still be the threshold in the presence of weights?

```scala
if (count > 1)
```
No, I think it should probably be `count != 0.0`, I suppose. But even then, maybe we should just remove the branch, always return the value, and let the caller be careful about what it means if `count <= 0.0`?
I opted for the latter, but am open to changing it again.
```scala
}

object CorrelationAggregator extends MonoidAggregator[(Double, Double), Correlation, Correlation] {
```
Seems like you might want the `.andThenPresent(_.correlation)` a lot.
Agreed, but sometimes you might want `.andThenPresent(_.covariance)` from time to time as well. Maybe I can add a convenience method to this object?
Good point. Yeah, two convenience methods on this would be good: `def correlation` and `def covariance`.
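To illustrate the shape of `andThenPresent` being discussed, here is a dependency-free miniature of an aggregator with a `present` step (the `MiniAgg` class is a hypothetical stand-in, not algebird's actual `MonoidAggregator`):

```scala
object AggSketch {
  // Minimal stand-in for an aggregator: prepare each input, combine the
  // intermediate values, then post-process the result with `present`.
  final case class MiniAgg[A, B, C](prepare: A => B, combine: (B, B) => B, present: B => C) {
    def apply(xs: Seq[A]): C = present(xs.map(prepare).reduce(combine))
    // andThenPresent only maps the final presentation step, leaving the
    // monoid structure untouched -- which is why exposing `correlation` and
    // `covariance` variants of the same aggregator is cheap.
    def andThenPresent[D](f: C => D): MiniAgg[A, B, D] =
      MiniAgg(prepare, combine, c => f(present(c)))
  }
}
```

Under this sketch, the proposed convenience members would be roughly `CorrelationAggregator.andThenPresent(_.correlation)` and `CorrelationAggregator.andThenPresent(_.covariance)`.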
```diff
@@ -69,4 +69,20 @@ object gen extends ExpHistGen with IntervalGen {
   def genSSManySpaceSaver: Gen[SpaceSaver[String]] =
     Gen.nonEmptyListOf(genFixedSSOneSpaceSaver).flatMap(l => l.reduce(_ ++ _))

+  def genCorrelation: Gen[Correlation] =
```
This never generates 0, right (a count of zero)? A way that I like, which might actually be simpler, is to generate according to how this is used:

```scala
lazy val genCorrelation: Gen[Correlation] = {
  val recur = Gen.lzy(genCorrelation)
  // we can start with any pair of numbers:
  val genClose: Gen[Correlation] = for {
    x <- choose(-1000, 1000)
    delta <- choose(-100.0, 100.0)
  } yield Correlation((x, x + delta))
  val genUncorr: Gen[Correlation] = for {
    x <- choose(-1e20, 1e20)
    y <- choose(-1e20, 1e20)
  } yield Correlation((x, y))
  val genSum = Gen.zip(recur, recur).map { case (a, b) => a + b }
  val genDiff = Gen.zip(recur, recur).map { case (a, b) => a - b }
  val genNeg = recur.map(_.negate)
  // now return with a low probability of choosing the branching cases:
  Gen.frequency(
    (5, genClose),
    (5, genUncorr),
    (2, genNeg),
    (1, Correlation.zero),
    (1, genSum),
    (1, genDiff))
}
```

In this style, you are generating values that are built up from formulas using the data.
That's true, although it does get verified when validating `Group` laws. My guess is that you prefer this for tests written by consumers of this library, right?
I ended up needing to make the exponent of the range of Doubles be `10` instead of `20`. Otherwise, it was too easy for variance to become `Infinity`, which often ends up making `c2` be `NaN`, which in turn makes the resulting `Correlation` instance unequal to itself.
```scala
def stddevRight: Double = rightMoment.stddev

def covariance: Double = c2 / count
```
Nice to have: we could also include the formula for a least squares fit, right? So, fit `left = m * right + b` and return `(m, b)`? I think the formula is `m = (c2 - c0 * meanLeft * meanRight) / m2Right` and `b = meanLeft - m * meanRight`, or something...
Nice! I hadn't realized you could do linear least squares regression with just this info, although it makes sense. The formula for `m` turned out to be slightly different, but I was able to verify the correctness of the implementation with a nice scalacheck test.
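For the record, a sketch of the fit in terms of the summed central statistics, under the assumption (matching the scaladoc elsewhere in this thread) that `c2 = Sum (x - meanX)(y - meanY)` and `m2x = Sum (x - meanX)^2`; the function name and signature here are illustrative, not the merged API:

```scala
object FitSketch {
  // Least-squares fit of y = m * x + b from combined statistics:
  //   m = c2 / m2x   (covariance sum over variance sum; count cancels)
  //   b = meanY - m * meanX
  def linearLeastSquares(c2: Double, m2x: Double, meanX: Double, meanY: Double): (Double, Double) = {
    val m = c2 / m2x
    val b = meanY - m * meanX
    (m, b)
  }
}
```

For the points (0, 1), (1, 3), (2, 5) this gives `(m, b) = (2.0, 1.0)`, i.e. it recovers y = 2x + 1 exactly.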
```scala
// however, dividing by "count" cancels out, and leaves us with the following formula, which relies on fewer
// divisions
c2 / (Math.sqrt(m2Left * m2Right))
}
```
Can we add `def swap: Correlation`, which swaps the left with the right? So we expect `x.swap.swap == x` and `Correlation(a, b).swap == Correlation(b, a)`, etc...
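The requested `swap` can be sketched on a simplified case class (the `x`/`y` field names follow the renaming agreed earlier in the thread but are otherwise assumptions): exchanging the x statistics with the y statistics leaves `c2` and the count alone, since covariance is symmetric in its two arguments.

```scala
// Simplified stand-in for the class under review (assumed field names).
case class CorrXY(c2: Double, m2x: Double, m2y: Double, m1x: Double, m1y: Double, m0: Double) {
  // Swap the roles of x and y; c2 and m0 are symmetric, so they are unchanged.
  def swap: CorrXY = CorrXY(c2 = c2, m2x = m2y, m2y = m2x, m1x = m1y, m1y = m1x, m0 = m0)
}
```

Both requested laws hold by construction: `swap` is an involution, and it exchanges the per-coordinate moments.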
```scala
val testList = Range.inclusive(-10, 10).map(_.toDouble).toList

"correlation with y = x should be 1" in {
```
I think this is also true for `y = m * x` for all `m > 0.0`, right? Similarly below?
It is (for any `y = mx + b` with `m != 0`, as it turns out). I turned this into a scalacheck test, although I needed to restrict the choices of `m` and `b` to `Int`s; otherwise scalacheck was too good at finding examples where you end up losing a fair bit of precision (that said, I also conducted that experiment before restricting the size of the generated `Double`s to `1e20`, which may have helped).
Also, if you don't have the interest or time to make these changes, just let me know; I can merge it and implement them. I don't want to block great work just because I want to add a few things.
Force-pushed 848ea39 to d965537
I think I addressed all of the review feedback (if not, it was an unintentional oversight). PTAL @johnynek
```scala
 * This differs from the implementation in MomentsGroup.scala only in that here, the counts are weighted, and are
 * thus doubles instead of longs
 */
def getCombinedMean(weightN: Double, an: Double, weightK: Double, ak: Double): Double =
```
I don't love that this is copy-pasta. We could in principle refactor the method in `Moments` to take a `Numeric` or something like that, but I'm not sure whether that would force a semantic version bump of Algebird, so I opted for a little bit of copying. Happy to take another approach if you prefer, @johnynek.
```scala
val b = meanRight - m * meanLeft
(m, b)
}
```
Now that we have `m0` as a `Double`, can we add `def scale(z: Double): Correlation`, which scales the entire correlation (which is used for making exponential decays)? I think the mean isn't touched, but the total weight and the second moments are changed.

The laws we want: `x.scale(0.0) == Monoid.zero`, `x.scale(a).scale(b) == x.scale(a * b)`, `x.scale(1.0) == x`. I think those laws are enough... (obviously with the correct approximately-equals situation).
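A sketch of a `scale` satisfying those three laws, again on a simplified stand-in class (assumed field names, not the merged implementation): every summed statistic (`c2`, the second moments, the total weight) is multiplied by `z`, while the means are left untouched; scaling by zero short-circuits to the monoid zero so that `x.scale(0.0) == zero` holds exactly.

```scala
object ScaleSketch {
  case class CorrW(c2: Double, m2x: Double, m2y: Double, m1x: Double, m1y: Double, m0: Double) {
    def scale(z: Double): CorrW =
      if (z == 0.0) CorrW.zero // otherwise the stale means would break the zero law
      else CorrW(z * c2, z * m2x, z * m2y, m1x, m1y, z * m0)
  }
  object CorrW {
    val zero: CorrW = CorrW(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
  }
}
```

Note the zero short-circuit is exactly the point raised below: a no-op `scale` would satisfy the other two laws but fail `x.scale(0.0) == zero`.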
The linked paper seems to agree with you, but those laws are also satisfied by either (a) making `scale` a no-op or (b) only scaling `m0` (which it took me a while to convince myself was the wrong thing). Should there also be an invariant that the average-like quantities (`meanX`, `meanY`, `varianceX`, `varianceY`) are unchanged after scaling? That would at least rule out option (b).
I think no-op scale fails `x.scale(0.0) == zero`. So I don't think it can be a no-op.
Ok, that works. Just pushed.
This is so great! I can't wait to use it.
Added just one more request (about scale) if you have time.
```scala
if (weightN < weightK) getCombinedMean(weightK, ak, weightN, an)
else
  (weightN + weightK) match {
    case 0L => 0.0
```
Shouldn't this be `0.0`, not `0L`?
Good catch. This was lazy copy-pasting on my part.
```scala
case 0L => 0.0
case newCount if newCount == weightN => an
case newCount =>
  val scaling = weightK.toDouble / newCount
```
Isn't `weightK` already a `Double`, so no need for `.toDouble` here?
Good catch. This was lazy copy-pasting on my part.
It's worth noting that the […] Also, I added […] PTAL @johnynek
This is great! I'll merge this when CI is green.
Thanks for pushing this all the way!
It was my pleasure! Thanks for the thorough and patient review.
CI is failing because the group laws are, as I said, flaky. I tried changing the generator back to what it was before locally, and am now running into an error that doesn't appear to be a flake (TBH I'm pretty surprised that the current version of the generator doesn't uncover it). Basically, if you have an ordinary […] where the two sides of associativity are […]

It's plausible that the code I wrote is somehow leaking numerical precision in a way that I haven't realized. It's also plausible to me that the magnitude of numerical precision losses gets magnified as you go up the chain of moments, and that making weights be arbitrary […]

Edit: the fact that there's a sign flip for […]

Edit edit: I see something similar going on with […]

```scala
val m0 = Moments(-6903287220, 3.147562977424351E9, -1.2058550120859232E19, 0.0, 0.0)
val m1 = Moments(1, -3.109131087800439E9, 0.0, 0.0, 0.0)
val m2 = Moments(-1, -786.0, -0.0, 0.0, 0.0)

scala> val left = semi.plus(m0, semi.plus(m1, m2))
left: com.twitter.algebird.Moments = Moments(-6903287220,3.1475629774243507E9,-1.2058550120859232E19,0.0,0.0)

scala> val right = semi.plus(semi.plus(m0, m1), m2)
right: com.twitter.algebird.Moments = Moments(-6903287220,3.147562977874735E9,1.7180512861538128E19,-2.1374251588738178E29,1.4342748170710145E39)
```

It's worth noting that my original implementation of the generator for […]
@eigenvariable why don't we lower […]? The example you have above looks pretty bad, though... I agree that something seems off here... I'm confused how […]
Okay, I think the issue is: if […] So, that is breaking associativity, I think. It seems that rule has to go... what if you remove it — what laws break then?

Oh, and I thought of more laws we need on scale:

```scala
(x + y).scale(z) == (x.scale(z) + y.scale(z))
x.scale(y + z) == (x.scale(y) + x.scale(z))
```

Thanks for your careful testing that uncovered this issue!
Actually... I'm not so sure anymore... I guess we need to take the limit of […]

One question: if we consider two moments the same if they both have […]

IIRC, setting everything to […]

Ok, I've pushed a commit that works, where there's now an assertion that scaling by a negative factor will have a negative effect on correlation. I'm still not sure if that's the right fix. Also, I still need to implement the invariants WRT addition. Doing that now.
Codecov Report

```diff
@@           Coverage Diff            @@
##           develop     #829   +/-  ##
==========================================
+ Coverage    89.41%   89.44%   +0.02%
==========================================
  Files          116      117       +1
  Lines         9164     9233      +69
  Branches       383      353      -30
==========================================
+ Hits          8194     8258      +64
- Misses         970      975       +5
```

Continue to review full report at Codecov.
Ooh, one of them fails:

```
GeneratorDrivenPropertyCheckFailedException was thrown during property evaluation.
  (CheckProperties.scala:12)
  Falsified after 0 successful property evaluations.
  Location: (CheckProperties.scala:12)
  Occurred when passed generated values (
    arg0 = Correlation(Infinity,Infinity,Infinity,6.353346554766811E9,-3.987013780614332E9,1.0),
    arg1 = 1, // 23 shrinks
    arg2 = -1 // 31 shrinks
  )
```

I don't think the […]

Edit: the other addition property fails with scaling by 0
```scala
  Correlation(c2 = c2, m2x = m2x, m2y = m2y, m1x = m1x, m1y = m1y, m0 = count)
}

val prodSumRatio = a.totalWeight * b.totalWeight / count
```
So this is `xy/(x+y)`. If `x == y`, this should be `x/2`, so if `x = y = 0` it seems like this should be zero. If `y -> 0` but `x != 0`, we should have 0 also. Finally, if `x = -y` but both are nonzero, then we should have `Infinity` here, right?
OK, @johnynek and I talked offline about this, and we couldn't find a fix to make this into a group instead of just a monoid, so for now it remains just a monoid. As it turns out, I'd forgotten that the zero check in […]

AFAICT, everything passes now, but it's probably worth taking another look at this to make sure that none of the changes we made while exploring the above inadvertently ended up getting committed.
This is green except for formatting. Can you rerun […]?
I ran […] (which seems to be the command that is failing in CI), and it succeeds for me locally. Any idea what the problem could be?
Okay, here is the deal: since you made the PR, develop has moved on and got a later version of scalafmt. That version wants this diff:

```diff
diff --git a/algebird-core/src/main/scala/com/twitter/algebird/CorrelationMonoid.scala b/algebird-core/src/main/scala/com/twitter/algebird/CorrelationMonoid.scala
index daa4dc0f..fb2eae56 100644
--- a/algebird-core/src/main/scala/com/twitter/algebird/CorrelationMonoid.scala
+++ b/algebird-core/src/main/scala/com/twitter/algebird/CorrelationMonoid.scala
@@ -49,7 +49,6 @@ object Correlation {
    * m{i}x denotes the ith central moment of the first projection of the pair.
    * m{i}y denotes the ith central moment of the second projection of the pair.
    * c2 the covariance equivalent of the second central moment, i.e. c2 = Sum_(x,y) (x - m1x)*(y - m1y).
-   *
    */
 case class Correlation(c2: Double, m2x: Double, m2y: Double, m1x: Double, m1y: Double, m0: Double) {
   def totalWeight: Double = m0
@@ -71,7 +70,6 @@ case class Correlation(c2: Double, m2x: Double, m2y: Double, m1x: Double, m1y: D
   def covariance: Double = c2 / totalWeight

   /**
-   *
    * @return Pearson's correlation coefficient
    */
   def correlation: Double =
```

So, if you do […]
… arbitrary values including those with higher moments; we now expect negating a Correlation to negate the sign of its .correlation method

Force-pushed 8061bc5 to fc26a5d
I'm embarrassed that rebasing against develop didn't occur to me; I'm rusty. I just force-pushed after rebasing and rerunning […]
This is great. Thank you so much.
Analogous to the `Moments` group, but for covariance and Pearson's correlation coefficient instead.

In principle, this could probably be extended to coskewness and cokurtosis; I don't have a reference for that, so in a sense it would be original research, and hence isn't part of this PR. Another direction for future work would be to add weights and/or exponential decay to the calculation of correlation, both of which are covered in https://dl.acm.org/doi/10.1145/3221269.3223036.

There was a delicate balance between reusing business logic from `Moments` and re-implementing it. I'm especially open to feedback on re-working some of that.