Heavyhitters no longer attempts lazy storage in SketchMap #305
…to actually be a lazy val
/**
 * Calculates the frequencies for every heavy hitter.
 */
lazy val heavyHittersMapping: Map[K, V] =
Would val be better in this case?
No. If it's not needed, we want to skip calculating it; for example, the code of yours that builds these SketchMaps never needs to generate this.
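To illustrate the difference being discussed, here is a minimal sketch. `EagerSketch`, `LazySketch`, and `expensiveMapping` are hypothetical stand-ins, not Algebird classes; the counter only exists to show when the computation runs.

```scala
object LazyValDemo {
  // Counts how often the stand-in "expensive" computation runs.
  var computations = 0

  // Hypothetical stand-in for calculating heavy hitter frequencies.
  private def expensiveMapping(keys: List[String]): Map[String, Int] = {
    computations += 1
    keys.map(k => k -> k.length).toMap
  }

  // val: the mapping is computed when the instance is constructed.
  class EagerSketch(keys: List[String]) {
    val heavyHittersMapping: Map[String, Int] = expensiveMapping(keys)
  }

  // lazy val: the mapping is computed on first access and then reused.
  class LazySketch(keys: List[String]) {
    lazy val heavyHittersMapping: Map[String, Int] = expensiveMapping(keys)
  }
}
```

With the lazy variant, code that only builds sketches and never reads the mapping does not pay for the calculation.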
Looks good to me. Thanks for doing this!
It seems we would now have an awkward tie back to SketchMapParams. Why not just make it a full reference to the params? I'm guessing there was a good reason for breaking this connection earlier? If nothing else, we may want to make the frequencyCalculator private (by putting it in its own ()).
I'm not sure why it was done; probably to separate most of the logic from the data store, so I was trying to stick with that. Making it private is definitely a good call, though.
I agree we should pass params directly. The closure you are passing already keeps a ref to the params, so let's be direct. The reason earlier was just to make the object smaller for serialization, which is still an issue here when it comes to Kryo serialization. The alternative here is to keep a small cache in the Monoid of the mapping we are tying to the instance here. Even a simple private lazy transient Java HashMap with LRU behavior might be the win: no binary change, but we get all the perf wins.
We then need to worry about threading access and having a concurrent ... This field is transient for serialization, though the params can be such ...
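A rough sketch of the cache idea floated above, assuming a `java.util.LinkedHashMap` in access order wrapped with `Collections.synchronizedMap` to address the threading concern. `lruMap`, `CachingMonoid`, and its method are hypothetical names for illustration, not Algebird API, and keying the cache by the list of heavy hitter keys is also an assumption.

```scala
import java.util.{Collections, LinkedHashMap}

object LruCacheSketch {
  // Hypothetical helper (not part of Algebird): a bounded LinkedHashMap in
  // access order, wrapped so that individual calls are synchronized.
  def lruMap[A, B](maxEntries: Int): java.util.Map[A, B] =
    Collections.synchronizedMap(
      new LinkedHashMap[A, B](16, 0.75f, true) {
        // Invoked by put after each insertion; returning true evicts the
        // least recently accessed entry.
        override def removeEldestEntry(eldest: java.util.Map.Entry[A, B]): Boolean =
          size() > maxEntries
      }
    )

  // Hypothetical stand-in for the monoid that would own the cache.
  class CachingMonoid(maxEntries: Int) extends Serializable {
    // @transient keeps the cache out of Java-serialized state; lazy means it
    // is rebuilt empty on first use after deserialization.
    @transient private lazy val cache: java.util.Map[List[String], Map[String, Long]] =
      lruMap[List[String], Map[String, Long]](maxEntries)

    // Returns the cached mapping for these keys, computing it on a miss.
    def heavyHittersMapping(keys: List[String])(
        compute: List[String] => Map[String, Long]): Map[String, Long] =
      Option(cache.get(keys)).getOrElse {
        val mapping = compute(keys)
        cache.put(keys, mapping)
        mapping
      }
  }
}
```

The lookup followed by `put` is not atomic, so two threads may occasionally compute the same mapping; for a cache of deterministic results that only costs duplicate work.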
This has been updated to just use the params, which seems to be the more popular approach. (I don't really mind which.)
…ched usages too. Users can access heavy hitters as a one time call to heavyHitters with no caching, or request the frequencyWithHHCache if they expect to do the lookups often
I'm fine with this. Generally it seems there's minimal benefit in using the computed heavyHitters mapping unless you'll be looking up heavyHitters multiple times. Though if we're not adding methods to SketchMap, we shouldn't need the valueOrdering either?
Sorry, yes, my bad; I should have taken that back out too. If you're looking them up multiple times, you can call the frequencyWithHHCache function, which gives you a view of the frequency table with the heavy hitters in a cache.
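As a sketch of what such a cached view could look like: the signature below is an assumption made for illustration and is not the actual frequencyWithHHCache from this PR. `estimate` stands in for the table-based frequency lookup.

```scala
object HHCacheSketch {
  // Hypothetical shape: precompute the heavy hitter frequencies once, then
  // return a lookup function that consults that cache before the estimator.
  def frequencyWithHHCache[K, V](heavyHitterKeys: List[K], estimate: K => V): K => V = {
    // Computed once, when the view is created.
    val hhMapping: Map[K, V] = heavyHitterKeys.map(k => k -> estimate(k)).toMap
    // getOrElse takes its default by name, so estimate only runs on a miss.
    (key: K) => hhMapping.getOrElse(key, estimate(key))
  }
}
```

Building the view costs one estimate per heavy hitter up front, which only pays off when the same heavy hitters are looked up repeatedly.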
@@ -149,17 +145,11 @@ case class SketchMapParams[K](seed: Int, width: Int, depth: Int, heavyHittersCou
}

/**
 * Calculates the frequencies for every heavy hitter.
 */
def calculateHeavyHittersMapping[V:Ordering](keys: Iterable[K], table: AdaptiveMatrix[V]): Map[K, V] =
if we don't delete this, we are still binary compatible, right?
The earlier changes to SketchMap already make this change backwards incompatible.
Keeping the notion of lazily calculated heavy hitters, but fixing it to actually be a lazy val
I supply a partially applied function into the SketchMap class so it can satisfy the heavy hitters when required on first usage.
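The approach described above can be sketched roughly as follows. Everything here is a simplified stand-in: the plain `Map[String, Long]` table replaces `AdaptiveMatrix[V]`, and `Sketch` and `calculateMapping` are hypothetical names, not the classes touched by this PR. The calculator sits in its own parameter list, which keeps it out of the case class's public fields, in line with the review suggestion to make it private.

```scala
object PartialApplicationSketch {
  // Counts how often the calculation actually runs.
  var calls = 0

  // Stand-in for the table-based calculation. The table comes first so the
  // method can be partially applied, leaving a function of the keys.
  def calculateMapping(table: Map[String, Long])(keys: Iterable[String]): Map[String, Long] = {
    calls += 1
    keys.map(k => k -> table.getOrElse(k, 0L)).toMap
  }

  // The calculator in the second parameter list is not a case class field,
  // so it is not exposed and takes no part in equals or hashCode.
  case class Sketch(heavyHitterKeys: List[String])(
      calculator: Iterable[String] => Map[String, Long]) {
    // Evaluated on first access only, then cached for this instance.
    lazy val heavyHittersMapping: Map[String, Long] = calculator(heavyHitterKeys)
  }
}
```

A caller would build an instance with something like `Sketch(keys)(calculateMapping(table))`; note that the resulting function still holds a reference to the table, which is the serialization-size concern raised in the review.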
Fixes #304