Fixed range relying on lazy commit? #755
So that particular configuration was added to support SGX enclaves. They don't have any notion of commit, so all the
So in the fixed range config it will calculate the space required for the pagemap, snmalloc/src/snmalloc/backend/fixedglobalconfig.h Lines 91 to 92 in ccc03ce
and then call
The rest of the supplied range will have We need the pagemap to exist for the whole range as we use it as the physical memory for the buddy allocators/red black trees that track the unused, but snmalloc owned memory.
Based on your description you probably need to modify a Pal slightly for your platform so that it calls what you need for So I would start with: In this, it forwards some operations to the underlying Pal for the system: snmalloc/src/snmalloc/pal/pal_noalloc.h Lines 54 to 57 in ccc03ce
but importantly, and not what you want, it doesn't for notify_using/... :snmalloc/src/snmalloc/pal/pal_noalloc.h Lines 67 to 90 in ccc03ce
If you changed these to also call into the BasePal , then I think it would work.
If you instantiate a FixedConfig with that Pal everything "should just work" in principle. However, doing something new normally throws up issues, so please let me know how you get on.
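To illustrate the forwarding idea described above, here is a minimal, self-contained sketch (C++17). It does not use snmalloc's headers: `MockBasePal` and `ForwardingPal` are hypothetical names, and the `notify_using`/`notify_not_using` signatures are simplified assumptions rather than the library's actual interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mock of a platform layer; it only counts committed bytes.
struct MockBasePal
{
  static inline std::size_t committed = 0;

  static void notify_using(void*, std::size_t size)
  {
    committed += size;
  }

  static void notify_not_using(void*, std::size_t size)
  {
    assert(size <= committed); // cannot release more than was committed
    committed -= size;
  }
};

// Hypothetical wrapper: instead of treating commit/decommit as no-ops,
// it passes both notifications through to the underlying Pal.
template<typename BasePal>
struct ForwardingPal
{
  static void notify_using(void* p, std::size_t size)
  {
    BasePal::notify_using(p, size);
  }

  static void notify_not_using(void* p, std::size_t size)
  {
    BasePal::notify_not_using(p, size);
  }
};
```

With this shape, a call such as `ForwardingPal<MockBasePal>::notify_using(ptr, 4096)` reaches the base layer, which is the behaviour a platform without lazy commit would need.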
Thanks. Writing it in C++ has helped a lot in making the configurability much cleaner than the |
Thanks for the quick reply. I tried a few things in the fixed_region.cc test code, just running on normal Windows for now. One of them was indeed using the DefaultPal directly instead of PALNoAlloc, as it would need to call those commits/releases.
Adding some debug info in the PAL gives me this:
The pagemap register_range seems to be shifting things out of the reserved range, causing a commit on an address that hasn't been reserved -> error. From ds\pagemap.h:
When hacking notify_using to do a RESERVE | COMMIT, just to check whether it's simply a forgotten reserve, it still crashed right in the next bit of the calling code, on line 106. From backend_helpers\pagemap.h:
Here 'entry' is pointing to the beginning of the range, which has been reserved, but not committed. Example output
I'm not entirely sure what the 'body' really is or should be and why it would be outside of the reserved arena region. Any ideas? |
This is precisely the kind of thing I expected with my "However, doing something new normally throws up issues, so please let me know how you get on.". The pagemap register range code needs a second path for snmalloc/src/snmalloc/ds/pagemap.h Lines 62 to 64 in ccc03ce
Other code paths have something like: snmalloc/src/snmalloc/ds/pagemap.h Lines 300 to 318 in ccc03ce
I believe adding something like:
  if constexpr (has_bounds)
  {
    if ((p - base >= size) || (p + length - base >= size))
    {
      error("Attempting to commit outside of fixed range");
    }
    // Correct for index into pagemap with bounds.
    p = p - base;
  }
should work on that function. If this fixes your issue, I'd love to receive a PR. Thanks |
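The bounds check suggested above can be tried in isolation. The following is a minimal sketch (C++17), assuming a hypothetical `FixedRange` struct and `rebase_checked` helper that are not part of snmalloc. It differs from the snippet in two ways that are choices of this illustration only: it returns an empty optional instead of calling `error`, and it accepts a span that ends exactly at the end of the range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the bounded pagemap's view of the fixed range.
struct FixedRange
{
  std::uintptr_t base; // first address covered by the range
  std::size_t size; // number of bytes covered
};

// Returns the offset of [p, p + length) relative to range.base, or
// std::nullopt if any part of the span falls outside the range.
inline std::optional<std::size_t>
rebase_checked(const FixedRange& range, std::uintptr_t p, std::size_t length)
{
  if (p < range.base)
    return std::nullopt;

  std::size_t offset = p - range.base;

  // Compare against the remaining space so offset + length cannot overflow.
  if (offset >= range.size || length > range.size - offset)
    return std::nullopt;

  return offset;
}
```

The returned offset plays the role of `p = p - base` in the snippet: it is the value that would be used to index into a pagemap that only covers the fixed range.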
Yup, that works! Amazing, thanks. |
Some follow-up questions by the way, if you have the time
Thanks |
A separate type for each heap should work. That is what I have been thinking about to do a partition alloc like feature.
Making it dynamic is within reach of the design, but not something I have done yet. We could create an allocator pool in which all allocators use the same backend range, but this would require some complex changes that I haven't got time for at the moment. It is on our work list, though. It would also be interesting to share the pagemap across all the different heaps, rather than having a pagemap for each one.
That seems reasonable. I haven't considered having many different Pals. Could you elaborate on what the differences are between each heap that makes the Pal differ?
I would like to move this to a more dynamic value that is obtained from the current Pal. Some platforms have a variable page size that can be configured by environment variables. Ultimately, I would like that as a feature, but it hasn't been a priority yet. |
That could be a useful saving indeed, but for my use-case at least we would need the option to keep them separate as well.
We're running on unified memory platforms where there are different memory types. We'd need to specify the type to the map/unmap calls. These types have different properties as bandwidth and in which physical chips it is stored. Alignment requirements might also vary per type for example. So we would have one heap-type with "Type A" memory with 16K page-size, and one heap-type with "Type B" memory with 64K or 2M page-size. This is also why sharing pagemap here would not work for this specific case - the memory is actually physically different, and we cannot mix allocations made for a heap of Type A with allocations for Type B in the same mapped page. In other cases though, where we would have multiple runtime instances of the same heap-type to limit and localize allocations, sharing a pagemap would be fine. |
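As a rough illustration of the "one Pal per memory type" idea described above, here is a minimal sketch. Everything in it is hypothetical: `MemType`, `MemTraits` and `TypedPal` are made-up names, the 16K and 64K page sizes are taken from the example in the comment, and a real Pal would also need the map/unmap calls that take the memory type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical memory types, as in the "Type A" / "Type B" example.
enum class MemType
{
  A,
  B
};

// Per-type properties; only the page size is modelled here.
template<MemType T>
struct MemTraits;

template<>
struct MemTraits<MemType::A>
{
  static constexpr std::size_t page_size = 16 * 1024;
};

template<>
struct MemTraits<MemType::B>
{
  static constexpr std::size_t page_size = 64 * 1024;
};

// Hypothetical Pal-like type: each memory type yields a distinct C++ type,
// so a heap configured with one cannot be mixed up with the other.
template<MemType T>
struct TypedPal
{
  static constexpr std::size_t page_size = MemTraits<T>::page_size;

  // Rounds a byte count up to a whole number of pages of this memory type.
  static constexpr std::size_t round_to_pages(std::size_t size)
  {
    return (size + page_size - 1) / page_size * page_size;
  }
};
```

Because `TypedPal<MemType::A>` and `TypedPal<MemType::B>` are different types, each could be paired with its own fixed-range configuration and its own pagemap, which matches the requirement that the two kinds of memory never share a mapped page.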
One thing I'm noticing, by the way, is that using the FixedRange configuration uses considerably more memory for us. Is that expected, or can it at least be explained? It looks like empty blocks aren't being merged with what appear to be their buddy counterparts, nor being unmapped. |
Not expected.
If you could provide me a repro of that I would be very interested.
It should be decommitting properly. It is only the Pal that controls if decommit happens or not. Obviously, there could be errors in untested combinations. |
So this is really interesting. I previously had an issue with a service that seemed to be experiencing fragmentation over time on SGX. I thought it was due to thread caching just using more memory than the limited SGX machines had, so I disabled the thread caching of "chunks" in the backend, and the problem went away. But I wonder if there is something else going on, as here it looks similar. How many threads are you using? What is the thread usage pattern? N long-term threads, or creating and removing lots of threads? Matt |
Also, if you have a chance, would you be able to try #751. That has a simpler thread local state model. |
We have about a dozen threads, all long-living. There shouldn't be any teardown of the thread-allocators happening, as all threads are basically permanent (created at startup of the application, destroyed at shutdown). I can try and give that patch a spin early next week, thanks |
I tried the patch, and it doesn't really seem to matter much I'm afraid. One thing I noticed though. As I'm still testing things, I was just throwing everything into snmalloc, including a single 3GB allocation. Due to its buddy-behavior that alone eats up an extra 1GB. It's not related to the bigger footprint though. Even in another test where it's mostly small allocations, things are scattered out a lot more compared to the global config while still mapping everything. I'll see if I can find some time and make an isolated repro case that doesn't involve running our entire application. |
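The extra 1GB mentioned above is what power-of-two rounding would predict. As a small sketch, assuming a pure power-of-two buddy scheme (the helper names `next_pow2` and `buddy_overhead` are hypothetical and not snmalloc functions):

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two >= n, i.e. the block size a pure power-of-two
// buddy scheme would use for a request of n bytes.
// Assumes 0 < n <= 2^63 so that the shift cannot overflow.
constexpr std::uint64_t next_pow2(std::uint64_t n)
{
  std::uint64_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Bytes reserved beyond the request because of the rounding.
constexpr std::uint64_t buddy_overhead(std::uint64_t n)
{
  return next_pow2(n) - n;
}
```

For a 3 GiB request the next power of two is 4 GiB, so `buddy_overhead` returns 1 GiB; a request that is already a power of two has no rounding overhead.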
Thanks. I have been looking at resurrecting an old PR I never landed: #616 If I get this working nicely again, then we might be able to trace your memory usage to get some ideas of where things are going wrong. |
@Trithek I have updated #616 to be usable again. This has a function
which should dump to stderr some CSV of various stats that could be used to debug the issue. If you were able to dump that periodically during the application run, it would help work out where the problem might be. |
Ah, great, much appreciated. |
For limiting some subsystems to a maximum allocation-size, I was experimenting with the fixed range configuration as it seemed particularly suited for my use-case.
I ran into some problems during initialization though, as it seems to be accessing a memory range that has not been committed yet.
Looking at the tests for it, it seems that it relies on everything being committed beforehand.
From the test:
auto size = bits::one_at_bit(28);
auto oe_base = DefaultPal::reserve(size);
DefaultPal::notify_using(oe_base, size); // <--- Committing the entire range
auto oe_end = pointer_offset(oe_base, size);
std::cout << "Allocated region " << oe_base << " - " << pointer_offset(oe_base, size) << std::endl;
CustomGlobals::init(nullptr, oe_base, size);
I'm trying to run this on a platform that does not have lazy commit, so running a notify_using on the entire range would fully commit all pages to the range, which is undesirable.
Main questions:
I've only just started looking into snmalloc and I'm impressed by the clean code and high level of configurability, so I am kind of hoping this can be set up in the Config without having to change the backend too much :)