Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update sketches #2965

Closed
t92549 opened this issue May 26, 2023 · 1 comment · Fixed by #2966
Closed

Update sketches #2965

t92549 opened this issue May 26, 2023 · 1 comment · Fixed by #2966
Assignees
Labels
dependencies Updates/changes to Maven or other dependencies enhancement Improvement to existing functionality/feature
Milestone

Comments

@t92549
Copy link
Contributor

t92549 commented May 26, 2023

Describe the new feature you'd like
Building on from #2825, we should encourage not using the clearspring HyperLogLogPlus. The associated aggregator and serialisers should be deprecated, and we should encourage users to use datasketches' HllSketch instead. This should be done in code (as tracked by this ticket), as well as in our docs, tracked by gchq/gaffer-doc#93.

An additional complexity is that the datasketches code has moved from com.yahoo.datasketches:sketches-core to org.apache.datasketches:datasketches-java and had many version updates. We should figure out if the new module is compatible with the old module and our aggregator/serialisers. If so I suggest we just update the dependency, if not we should provide another set of classes to interact with the new apache datasketches and deprecate the yahoo ones.

Why do you want this feature?
As described in gchq/gaffer-doc#93, datasketches' HllSketch is more performant. It is also actively maintained, whereas clearspring's HyperLogLogPlus is not.

@t92549 t92549 added the enhancement Improvement to existing functionality/feature label May 26, 2023
@t92549 t92549 added this to the v2.1.0 milestone May 26, 2023
@t92549 t92549 self-assigned this May 26, 2023
@t92549
Copy link
Contributor Author

t92549 commented May 26, 2023

Once we decide how to update HllSketch, we should also update gafferpy.

@t92549 t92549 changed the title Modernise HLL algorithms Update HLL algorithms May 26, 2023
@t92549 t92549 changed the title Update HLL algorithms Update sketches May 26, 2023
t92549 added a commit that referenced this issue Jun 5, 2023
* Update sketches to apache datasketches

* Checkstyle

* Fix javadoc

* Created historic serialisation tests for sketches

* Spotless apply

* Changed examples to use HllSketch and updated tests

* Spotless

* Improve JsonDeserialisation of HllSketch

* Spotless and checkstyle

* Deprecate usage of HyperLogLogPlus

* Spotless apply

* Review comments

* Added javadoc

* Javadoc fix

* Corrected ReservoirItemsSketchSerialiser equals method
@GCHQDeveloper314 GCHQDeveloper314 added the dependencies Updates/changes to Maven or other dependencies label Jun 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Updates/changes to Maven or other dependencies enhancement Improvement to existing functionality/feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants