Open
Labels
enhancement (New feature or request)
Description
Problem
After adding UltraLogLog support to Apache Pinot, I've been looking at adding some of the MinHash variants. To do this, I need a reliable way to merge their states when running SQL queries or merging rows.
Solution
I'd like the SimilarityHasher interface to also have a merge method that takes two byte[] states and returns a byte[] representing the merged state.
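To illustrate the kind of operation I mean, here is a rough sketch of a merge for a plain MinHash state. It is not the library's code: the class name `MinHashMergeSketch`, the static `merge` signature, and the byte layout (full 64-bit components, big-endian, compared as unsigned values) are all assumptions made for illustration. Under those assumptions the merged state is the component-wise minimum, so the approach would not carry over to states that keep only part of each component.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch only; not part of the library's API. */
final class MinHashMergeSketch {

    private MinHashMergeSketch() {}

    /**
     * Merges two MinHash states by taking the component-wise minimum.
     * Assumes each state is a sequence of full 64-bit components stored
     * big-endian, and that both states were built with the same parameters.
     */
    static byte[] merge(byte[] a, byte[] b) {
        if (a.length != b.length || a.length % Long.BYTES != 0) {
            throw new IllegalArgumentException(
                "states must have equal length that is a multiple of 8 bytes");
        }
        ByteBuffer in1 = ByteBuffer.wrap(a); // ByteBuffer is big-endian by default
        ByteBuffer in2 = ByteBuffer.wrap(b);
        ByteBuffer out = ByteBuffer.allocate(a.length);
        while (in1.hasRemaining()) {
            long x = in1.getLong();
            long y = in2.getLong();
            // Assumption: components are compared as unsigned 64-bit values.
            out.putLong(Long.compareUnsigned(x, y) <= 0 ? x : y);
        }
        return out.array();
    }
}
```

With a method of this shape on the interface, an aggregation function in Pinot could fold any number of serialized states pairwise without knowing the internal encoding.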
Alternatives
- I've tried implementing the merge functions myself and ran into problems such as "MinHash output is not mergeable with hash sizes under 64" (#169).
- I did consider a halfway solution of just streaming hashes into the hasher, but that's also not available in the current interface.