-
Notifications
You must be signed in to change notification settings - Fork 295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HashMap With External Storage #573
Comments
So, this kind of API is something we do want to offer on |
Thank you for the extremely fast response! How would you recommend we proceed then, we could update to 0.15.0 using the raw-entry feature, but if this is something that is going to be removed this feels like it is just kicking the can down the road... Given how common hashmaps are, and how much codegen is associated with them, I'd very much like to avoid a situation where we end up stuck on an old version for a prolonged period of time, and consequently bloat our user's binaries and compile times. Perhaps you could give some indication of what the timeline for raw-entry's removal looks like, will it be gone in the next release, or in 2 years time 😅 |
@tustvold Your use case seems a lot like what
I don't think that's what they're doing -- they also insert an index associated with each slot. You need something else, because the 7 bits of the hash are insufficient. |
How does one override the Hash and Equality to use the values stored externally, as opposed to the |
The main part to look at in But you should also look through the |
Oh, that looks perfect. I wasn't aware of that new API, thank you Edit: That worked like a charm - thank you for the pointers - apache/arrow-rs#6537 |
Perhaps the deprecation note on the raw-entry feature flag could point to the HashTable API to make it more discoverable, happy to file a PR if you think this is worthwhile? E.g. changing Edit: although perhaps then people will miss entry_ref and similar... Perhaps what is really needed is a migration guide... |
Yeah, the main purpose of deprecating
|
I will start by saying that the removal of raw_entry makes a lot of sense, and forcing code to instead use some combination entry_ref and Equivalent newtypes is definitely an improvement where possible. I was not aware of these APIs and forcing me to use them has improved my code 👍. However, there is one use-case that is currently possible with raw_entry_mut, but doesn't appear to be possible with the remaining APIs.
In arrow-rs we want to intern strings, with the interned strings allocated from a contiguous buffer - see here. This precise formulation is perhaps somewhat esoteric, but I don't think it that unusual to want to maintain an index on top of some separate data structure that stores the actual records.
For example, you might have a list of records and want to provide efficient lookups based on two or more different attributes without duplicating the record data. A naive way to achieve this would be to maintain a
Vec<Record>
and then two or moreHashMap<K1, usize>
andHashMap<K2, usize>
, where theusize
value is the index in the recordsVec
of theRecord
. However, this approach duplicates the key data into each of the hashmaps, which could be prohibitively expensive.The trick / hack arrow-rs uses is to exploit the ability of raw_entry_mut to provide the hash and equality function at point of use, see here. This allows storing the data separately, without introducing self-referential borrows between the hashmap entries and the value storage.
I'm not really sure what the solution is here, and it is perfectly reasonable for this to be considered beyond the scope of this crate, but I thought I would at least articulate the use-case.
The text was updated successfully, but these errors were encountered: