Fixed range relying on lazy commit? #755

Open

Trithek opened this issue Mar 5, 2025 · 17 comments

@Trithek
Contributor

Trithek commented Mar 5, 2025

For limiting some subsystems to a maximum allocation size, I was experimenting with the fixed range configuration, as it seemed particularly suited to my use case.
I ran into some problems during initialization though: it seems to be accessing a memory range that has not been committed yet.

Looking at the tests for it, it seems that it relies on everything being committed beforehand.

From the test:

auto size = bits::one_at_bit(28);
auto oe_base = DefaultPal::reserve(size);
DefaultPal::notify_using(oe_base, size); // <--- Committing the entire range
auto oe_end = pointer_offset(oe_base, size);
std::cout << "Allocated region " << oe_base << " - " << pointer_offset(oe_base, size) << std::endl;

CustomGlobals::init(nullptr, oe_base, size);

I'm trying to run this on a platform that does not have lazy commit, so calling notify_using on the entire range would fully commit all pages in the range, which is undesirable.

Main questions:

  • Is this behavior (i.e. the need for the range to be committed beforehand) intentional and desired?
  • Is there currently a supported way around this so that the allocator will actively call 'notify_using' on the bits that it uses for the pagemap?

I've only just started looking into snmalloc, and I'm impressed by the clean code and high level of configurability, so I am kind of hoping this can be set up in the Config without having to change the backend too much :)

@mjp41
Member

mjp41 commented Mar 5, 2025

Is this behavior (i.e. the need for the range to be committed beforehand) intentional and desired?

So that particular configuration was added to support SGX enclaves. They don't have any notion of commit, so all the notify_using/notify_not_using calls are basically no-ops, or just zeroing if required. There is no requirement for pre-committing, though: the layers in front will call notify_using/notify_not_using in all the places required on platforms without lazy commit, like Windows.

Is there currently a supported way around this so that the allocator will actively call notify_using on the bits that it uses for the pagemap?

So in the fixed range config it will calculate the space required for the pagemap,

auto [heap_base, heap_length] =
  Pagemap::concretePagemap.init(base, length);

and then call notify_using on the whole range used for the page map on initialisation:

Pagemap::register_range(Authmap::arena, heap_length);

The rest of the supplied range will have notify_using called as required for allocation of objects.

We need the pagemap to exist for the whole range, as we use it as the physical memory for the buddy allocators/red-black trees that track the unused, but snmalloc-owned, memory.

so I am kind of hoping this can be setup in the Config without having to change the backend too much :)

Based on your description, you probably need to modify a Pal slightly for your platform so that it calls what you need for notify_using/notify_not_using, and so that it always fails on reserve. This doesn't exist yet, but it is pretty easy to extract from the existing Pals.

So I would start with:
https://github.com/microsoft/snmalloc/blob/ccc03ce0fc1f6ab6f8c2fdb613572a87e97ec7e1/src/snmalloc/pal/pal_noalloc.h

It forwards some operations to the underlying Pal for the system:

static void message(const char* const str) noexcept
{
  BasePAL::message(str);
}

but importantly, and not what you want, it does not forward notify_using/notify_not_using:

/**
 * Notify platform that we will not be using these pages.
 *
 * This is a no-op in this stub.
 */
static void notify_not_using(void*, size_t) noexcept {}

/**
 * Notify platform that we will be using these pages.
 *
 * This is a no-op in this stub, except for zeroing memory if required.
 */
template<ZeroMem zero_mem>
static void notify_using(void* p, size_t size) noexcept
{
  if constexpr (zero_mem == YesZero)
  {
    zero<true>(p, size);
  }
  else
  {
    UNUSED(p, size);
  }
}

If you changed these to also call into the BasePAL, then I think it would work.

If you instantiate a FixedConfig with that Pal, everything "should just work" in principle. However, doing something new normally throws up issues, so please let me know how you get on.

I've only just starting looking into snmalloc and I'm impressed by the clean code and high level of configurability,

Thanks. Writing it in C++ has helped a lot in making the configurability much cleaner than the #ifdef mess that typically ends up in C code bases.

@Trithek
Contributor Author

Trithek commented Mar 5, 2025

Thanks for the quick reply.

I tried a few things in the fixed_region.cc test code, just running on normal Windows for now.

First, I used the DefaultPal directly instead of PALNoAlloc, as it would indeed need to make those commit/release calls.
Second, the notify_using in the top-level code needs to go to avoid committing everything, so I removed that:

  auto size = bits::one_at_bit(28);
  auto oe_base = DefaultPal::reserve(size);
  //DefaultPal::notify_using<NoZero>(oe_base, size);
  auto oe_end = pointer_offset(oe_base, size);

Adding some debug info in the PAL gives me this:

reserve 0x25a97cc0000 - 0x25aa7cc0000 : 268435456 bytes <-- the initial reserve for the entire range, called from fixed_region.cc
notify_using 0x25b2e71f000 - 0x25b2e760000 : 266240 bytes <-- the register_range for the concrete pagemap

The pagemap register_range seems to be shifting things out of the reserved range, causing a commit on an address that hasn't been reserved -> error.

From ds\pagemap.h:

    void register_range(address_t p, size_t length)
    {
      SNMALLOC_ASSERT(is_initialised());

      // Calculate range in pagemap that is associated to this space.
      auto first = &body[p >> SHIFT];
      auto last = &body[(p + length + bits::one_at_bit(SHIFT) - 1) >> SHIFT];

      // Commit OS pages associated to the range.
      auto page_start = pointer_align_down<OS_PAGE_SIZE, char>(first);
      auto page_end = pointer_align_up<OS_PAGE_SIZE, char>(last);
      size_t using_size = pointer_diff(page_start, page_end);
      PAL::template notify_using<NoZero>(page_start, using_size);

When hacking notify_using to do a RESERVE | COMMIT, just to check whether it's simply a forgotten reserve, it still crashed in the next bit of the calling code, on line 106.

From backend_helpers\pagemap.h

    static void register_range(capptr::Arena<void> p, size_t sz)
    {
      concretePagemap.register_range(address_cast(p), sz);
      if constexpr (!CONSOLIDATE_PAL_ALLOCS)
      {
        // Mark start of allocation in pagemap.
        auto& entry = get_metaentry_mut(address_cast(p));
        entry.set_boundary(); <-- **SEGFAULT**
      }
    }

Here 'entry' is pointing to the beginning of the range, which has been reserved, but not committed.

Example output

reserve 0x25a97cc0000 - 0x25aa7cc0000 : 268435456 bytes
notify_using 0x25b2e71f000 - 0x25b2e760000 : 266240 bytes
Exception thrown: read access violation.
this was 0x25A97CC0100.

I'm not entirely sure what the 'body' really is or should be and why it would be outside of the reserved arena region. Any ideas?

@mjp41
Member

mjp41 commented Mar 5, 2025

This is precisely the kind of thing I expected when I said "However, doing something new normally throws up issues, so please let me know how you get on."

The pagemap register_range code needs a second path for has_bounds == true. The following bit of code assumes has_bounds == false:

// Calculate range in pagemap that is associated to this space.
auto first = &body[p >> SHIFT];
auto last = &body[(p + length + bits::one_at_bit(SHIFT) - 1) >> SHIFT];

Other code paths have something like:

if constexpr (has_bounds)
{
  if (p - base > size)
  {
    if constexpr (potentially_out_of_range)
    {
      return const_cast<T&>(default_value);
    }
    else
    {
      // Out of range null should
      // still return the default value.
      if (p == 0)
        return const_cast<T&>(default_value);
      PAL::error("Internal error: Pagemap read access out of range.");
    }
  }
  p = p - base;
}

I believe adding something like:

if constexpr (has_bounds)
{
  if ((p - base >= size) || (p + length - base >= size))
  {
    error("Attempting to commit outside of fixed range");
  }

  // Correct for index into pagemap with bounds.
  p = p - base;
}

should work in that function. If this fixes your issue, I'd love to receive a PR. Thanks!

@Trithek
Contributor Author

Trithek commented Mar 5, 2025

Yup, that works!

Amazing, thanks.
I'll have a PR on the way later today.

@Trithek
Contributor Author

Trithek commented Mar 5, 2025

Some follow-up questions, by the way, if you have the time:

  • Now that this fixed-range allocation works, I would need multiple instances of it.
    But of course, there's a lot of statics in the types that store the actual data.
    What I've been doing is parameterizing my Config, making a separate type for each heap, which also parameterizes any underlying type; I think that should cover this use case.
    Is this assumption correct? Or are there cases that would need to be separated somehow, so as to avoid different heaps being mixed up?
    Also, would there be a way to create heaps at runtime, i.e. have the metadata somehow not be a global for a type?

  • I would also need to customize my PAL for each heap, as the mapping/unmapping is slightly different for each heap. I've been feeding in a custom PAL that reads from its Config.
    Is this the intended way? I made sure the DefaultPal was an empty stub, just to cover cases that use DefaultPal instead of a passed-in type.

  • And finally, I still bumped into some OS_PAGE_SIZE usage (which gets the page size from the DefaultPal) instead of using the heap-specific Pal.
    I'm guessing this could also just be fixed, as the custom PAL should be available at all call sites as far as I can see?

Thanks

@mjp41
Member

mjp41 commented Mar 5, 2025

Now that this fixed-range allocation works, I would need multiple instances of it.
But of course, there's a lot of statics in the types that store the actual data.
What I've been doing is parameterize my Config, making a separate type for each heap, which also parameterizes any underlying type, which I think should cover this use-case.
Is this assumption correct? Or are there some cases that would need to be separate somehow as to avoid different heaps being mixed up?

A separate type for each heap should work. That is the approach I have been considering for a partition-alloc-like feature.

Also, would there be a way to create heaps at runtime, i.e. have the metadata somehow not be a global for a type?

Making it dynamic is within reach of the design, but not something I have done yet. We could create an allocator pool where all allocators use the same backend range, but this would require some complex changes that I haven't got time for at the moment. It is on our work list, though.

It would also be interesting to share the pagemap across all the different heaps, rather than having a page map for each one.

  • I would also need to customize my PAL for each heap as the mapping/unmapping is slightly different for each heap. I've been feeding a custom PAL that reads from its Config.
    Is this the intended way? I made sure the DefaultPAL was an empty stub, just to cover cases that use DefaultPal instead of a passed in type.

That seems reasonable. I haven't considered having many different Pals. Could you elaborate on what the differences are between each heap that makes the Pal differ?

  • And finally, I still bumped into some OS_PAGE_SIZE usage (which gets the page-size from the DefaultPal), instead of using the heap-specific Pal.
    I'm guessing this might also just be something that could be done, as it should have the custom PAL available on all call-sites as far as I can see?

I would like to move this to a more dynamic value obtained from the current Pal. Some platforms have a variable page size that can be configured by environment variables. Ultimately, I would like that as a feature, but it hasn't been a priority yet.

@Trithek
Contributor Author

Trithek commented Mar 5, 2025

It would also be interesting to share the pagemap across all the different heaps, rather than having a page map for each one.

That could be a useful saving indeed, but for my use-case at least we would need the option to keep them separate as well.

Could you elaborate on what the differences are between each heap that makes the Pal differ?

We're running on unified-memory platforms with different memory types, and we need to specify the type in the map/unmap calls. These types have different properties, such as bandwidth and which physical chips back them. Alignment requirements might also vary per type. So we would have one heap type with "Type A" memory and a 16K page size, and one heap type with "Type B" memory and a 64K or 2M page size.

This is also why sharing the pagemap would not work for this specific case: the memory is physically different, and we cannot mix allocations made for a heap of Type A with allocations for Type B in the same mapped page.

In other cases though, where we would have multiple runtime instances of the same heap-type to limit and localize allocations, sharing a pagemap would be fine.

@Trithek
Contributor Author

Trithek commented Mar 11, 2025

One thing I'm noticing, by the way, is that using the FixedRange configuration costs us considerably more memory.
In one application we're hitting ~900MB instead of ~600MB.
In another we're at 7GB instead of just over 2GB.

Is that expected or can it at least be explained?

It looks like empty blocks aren't being merged with what appears to be their buddy counterpart, nor being unmapped.
Is there still some assumption that everything is pre-committed, as for the SGX enclave, and thus wouldn't need decommitting?

@mjp41
Member

mjp41 commented Mar 13, 2025

Is that expected or can it at least be explained?

Not expected.

It looks like empty blocks aren't being merged with what appears to be their buddy counterpart, nor being unmapped.

If you could provide me a repro of that I would be very interested.

Is there still some assumption that everything is pre-committed, as for the SGX enclave, and thus wouldn't need decommitting?

It should be decommitting properly. Only the Pal controls whether decommit happens. Obviously, there could be errors in untested combinations.

@Trithek
Contributor Author

Trithek commented Mar 13, 2025

Finding a repro that I can share might take a bit of effort, but I'll see what I can do.
In the meantime, I can share these screenshots from our heap visualizer.

The diagonal striped areas are mapped; the red areas are active allocations. You can still see the outlines around blocks that have been mapped.

First, the version with the standard configuration, no fixed range. Some black gaps indicate unmapped memory
Image

The same test case with a fixed range configured. No black gaps anywhere, even though there are plenty of blocks with no active allocations that could be released. This is using about 300MB more.
Image

Another test case went from 2GB to 7GB when using a fixed-range arena, so it's quite substantial. It looks like blocks are just not being released, nor even reused.

I haven't traced deallocations yet, as I had to work on something else for a bit. Somewhere the path taken must deviate.

@mjp41
Member

mjp41 commented Mar 13, 2025

So this is really interesting. I previously had an issue with a service that seemed to experience fragmentation over time on SGX. I thought it was due to thread caching using more memory than the limited SGX machines had, so I disabled the thread caching of "chunks" in the backend, and the problem went away. But I wonder if something else is going on, as this looks similar.

How many threads are you using? What is the thread usage pattern: N long-term threads, or creating and removing lots of threads?

Matt

@mjp41
Member

mjp41 commented Mar 13, 2025

Also, if you have a chance, would you be able to try #751. That has a simpler thread local state model.

@Trithek
Contributor Author

Trithek commented Mar 13, 2025

We have about a dozen threads, all long-lived.

There shouldn't be any teardown of the thread allocators happening, as all threads are basically permanent (created at application startup, destroyed at shutdown).

I can try to give that patch a spin early next week, thanks.

@Trithek
Contributor Author

Trithek commented Mar 21, 2025

I tried the patch, and I'm afraid it doesn't really seem to matter much.

One thing I noticed, though: as I'm still testing things, I was just throwing everything into snmalloc, including a single 3GB allocation. Due to the buddy behavior, that alone eats up an extra 1GB.

It's not related to the bigger footprint, though. Even in another test that is mostly small allocations, things are scattered out a lot more compared to the global config, while still mapping everything.

I'll see if I can find some time to make an isolated repro case that doesn't involve running our entire application.

@mjp41
Member

mjp41 commented Mar 21, 2025

Thanks. I have been looking at resurrecting an old PR I never landed: #616

If I get this working nicely again, then we might be able to trace your memory usage to get some ideas of where things are going wrong.

@mjp41
Member

mjp41 commented Mar 25, 2025

@Trithek I have updated #616 to be usable again. It adds a function

snmalloc::print_alloc_stats()

which should dump some CSV of various stats to stderr that can be used to debug the issue. If you were able to dump that periodically during the application run, it would help work out where the problem might be.

@Trithek
Contributor Author

Trithek commented Mar 25, 2025

Ah, great, much appreciated.
I won't have too much time this week, but next week I should be able to give this a try.
Thanks!
