Skip to content

Heavyhitters no longer attempts lazy storage in SketchMap #305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 18, 2014

Conversation

ianoc
Copy link
Collaborator

@ianoc ianoc commented Apr 16, 2014

...actually be a lazy val

I supply a partially applied function into the SketchMap class so it can satisfy the heavy hitters when required on first usage.

Fixes #304

/**
* Calculates the frequencies for every heavy hitter.
*/
lazy val heavyHittersMapping: Map[K, V] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will val be better in this case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, since if its not needed we would want to skip calculating it, i.e. in your code that makes these sketchmap you never need to generate this.

@tanin47
Copy link
Contributor

tanin47 commented Apr 16, 2014

Looks good to me. Thanks for doing this!

@jnievelt
Copy link
Contributor

It seems we now would have an awkward tie back to SketchMapParams -- why not just make it a full reference to the params? I'm guessing there was a good reason for breaking this connection earlier?

If nothing else we may want to make the frequencyCalculator private (by putting it in its own ()).

@ianoc
Copy link
Collaborator Author

ianoc commented Apr 16, 2014

I'm not sure why it was done, probably trying to separate most of the logic from the data store. So I was trying to stick within this.

Though making it private is definitely a good call.

@johnynek
Copy link
Collaborator

I agree we should pass params directly. The closure you are passing is keeping a ref to the params already, let's be direct.

The reason earlier was just to make the object smaller for serialization, which is still an issue here when it comes to Kryo serialization.

The alternative here is to keep a small cache in the Monoid of the mapping we are tying to the instance here. Even a simple private lazy transient java hashmap with LRU behavior might be the win: no binary change, but get all the perf wins.

@ianoc
Copy link
Collaborator Author

ianoc commented Apr 17, 2014

We then need to worry about threading access and having a concurrent
hashmap, seems like its adding a lot of complexity to try store it in the
monoid.

This field is transient for serialization, though the params can be such
also. Neither approach sounds particularly great

On Thu, Apr 17, 2014 at 10:33 AM, P. Oscar Boykin
[email protected]:

I agree we should pass params directly. The closure you are passing is
keeping a ref to the params already, let's be direct.

The reason earlier was just to make the object smaller for serialization,
which is still an issue here when it comes to Kryo serialization.

The alternative here is to keep a small cache in the Monoid of the mapping
we are tying to the instance here. Even a simple private lazy transient
java hashmap with LRU behavior might be the win: no binary change, but get
all the perf wins.

Reply to this email directly or view it on GitHubhttps://github.com//pull/305#issuecomment-40740513
.

@ianoc
Copy link
Collaborator Author

ianoc commented Apr 17, 2014

This has been updated to just using the params, it seems the more popular approach. (I don't really mind which).

…ched usages too. Users can access heavy hitters as a one time call to heavyHitters with no caching, or request the frequencyWithHHCache if they expect to do the lookups often
@jnievelt
Copy link
Contributor

I'm fine with this. Generally it seems there's minimal benefit of using the computed heavyHitters mapping unless you'll be looking up heavyHitters multiple times.

Though if we're not adding methods to SketchMap, we shouldn't need the valueOrdering either?

@ianoc
Copy link
Collaborator Author

ianoc commented Apr 17, 2014

Sorry, yes my bad, should have taken that back out too.

If your looking them up multiple times you can call the frequencyWithHHCache function, which gives you a view of the frequency table with the HH's in a cache.

@@ -149,17 +145,11 @@ case class SketchMapParams[K](seed: Int, width: Int, depth: Int, heavyHittersCou
}

/**
* Calculates the frequencies for every heavy hitter.
*/
def calculateHeavyHittersMapping[V:Ordering](keys: Iterable[K], table: AdaptiveMatrix[V]): Map[K, V] =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't delete this, we are still binary compatible, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other previous changes to sketch map make this change already backwards
incompatible

On Thursday, April 17, 2014, P. Oscar Boykin [email protected]
wrote:

In algebird-core/src/main/scala/com/twitter/algebird/SketchMap.scala:

@@ -149,17 +145,11 @@ case class SketchMapParams[K](seed: Int, width: Int, depth: Int, heavyHittersCou
}

/**

  • * Calculates the frequencies for every heavy hitter.
  • */
  • def calculateHeavyHittersMapping[V:Ordering](keys: Iterable[K], table: AdaptiveMatrix[V]): Map[K, V] =

if we don't delete this, we are still binary compatible, right?

Reply to this email directly or view it on GitHubhttps://github.com//pull/305/files#r11760808
.

johnynek added a commit that referenced this pull request Apr 18, 2014
Keeping the notion of lazily calculated heavy hitters, but fixing it to ...
@johnynek johnynek merged commit eeada09 into develop Apr 18, 2014
@johnynek johnynek deleted the feature/sketchmapReadPerformanceFix branch April 18, 2014 00:56
@ianoc ianoc changed the title Keeping the notion of lazily calculated heavy hitters, but fixing it to ... Heavyhitters no longer attempts lazy storage in SketchMap Apr 18, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

heavyHittersMapping is re-computed everytime heavyHitters() or frequency() is called
4 participants