Skip to content

handle reentrancy during initialization #762

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

SchrodingerZhu
Copy link
Collaborator

@SchrodingerZhu SchrodingerZhu commented Mar 26, 2025

During initialisation, snmalloc will need to call atexit to install a cleanup hook. This atexit function may call malloc on themselves while holding some mutexes, thus leading to deadlocks.

This is only a very rare situation depending on libc implementations. Normally, malloc is called at least once during early cruntime startup. However, as shown in the test, if the implementation happen to skip early initialisation of the allocator and the application happen to install a bunch of exiting hooks, then one will run into a deadlock.

This patch instead installs the main thread finalisation hook to static .fini arrays to avoid those bad situations.

@SchrodingerZhu
Copy link
Collaborator Author

SchrodingerZhu commented Mar 26, 2025

Ooops, still cannot make it work. The following code will stuck forever:

#include <snmalloc/snmalloc.h>
#include <stdlib.h>

void do_nothing() {}

extern "C" void * calloc(size_t num, size_t size) {
  return snmalloc::libc::calloc(num, size);
}

extern "C" void free(void * p) {
  return snmalloc::libc::free(p);
}

int main() {
  for (int i = 0; i < 8192; ++i)
    atexit(do_nothing);
}

@mjp41
Copy link
Member

mjp41 commented Mar 26, 2025

Interesting, I think snmalloc is correctly reentrant:

alloc = AllocPool<Config>::acquire();
// register_clean_up must be called after init. register clean up
// may be implemented with allocation, so need to ensure we have a
// valid allocator at this point.
if (!post_teardown)
{
// Must be called at least once per thread.
// A pthread implementation only calls the thread destruction handle
// if the key has been set.
Subclass::register_clean_up();

However, atexit is not reentrant from the perspective of calloc calling back into atexit. This means we should not be calling atexit in register clean up?

@SchrodingerZhu
Copy link
Collaborator Author

I am still debugging a bit.

@mjp41
Copy link
Member

mjp41 commented Mar 26, 2025

I wonder if you could fix this using the constructor attribute?

__attribute__((constructor))
void startup_init()
{
  ThreadAlloc::CheckInit::check_init([](){}, [](){});
}

This would trigger initialisation outside the atexit call. As long as glibc itself didn't call atexit, we wouldn't have a problem?

@mjp41
Copy link
Member

mjp41 commented Mar 26, 2025

I am still debugging a bit.

Good luck. I'm signing off for the night, but will look more tomorrow.

@SchrodingerZhu
Copy link
Collaborator Author

I am still debugging a bit.

Good luck. I'm signing off for the night, but will look more tomorrow.

The atexit part is fixed using attribute destructor. The test works now.

The remaining problem is pthread_setspecific which may also allocate. However, it does not hold any lock.

So according to #762 (comment), do we need to handle pthread_setspecific or not?

@SchrodingerZhu
Copy link
Collaborator Author

I am still debugging a bit.

Good luck. I'm signing off for the night, but will look more tomorrow.

The atexit part is fixed using attribute destructor. The test works now.

The remaining problem is pthread_setspecific which may also allocate. However, it does not hold any lock.

So according to #762 (comment), do we need to handle pthread_setspecific or not?

I think the safety check is no longer needed, removing them.

@mjp41
Copy link
Member

mjp41 commented Mar 27, 2025

So I have been looking at this today, and unfortunately pthread_setspecific is also a problem. I modified your example to investigate this:

#include <thread>
#include <array>
#ifndef __has_feature
#  define __has_feature(x) 0
#endif

#if defined(__linux__) && !__has_feature(address_sanitizer) && \
  !defined(__SANITIZE_ADDRESS__)
#  define RUN_TEST
#endif

#ifdef RUN_TEST
#  include <snmalloc/snmalloc.h>
#  include <stdlib.h>

// A key in the second "second level" block of the pthread key table.
// First second level block is statically allocated.
// This is be 33.
pthread_key_t key;

void thread_setspecific()
{
  // If the following line is uncommented then the test will pass.
  // free(calloc(1, 1));
  pthread_setspecific(key, (void*)1); //  (A)
}

// We only selectively override these functions. Otherwise, malloc may be called
// before atexit triggers the first initialization attempt.

extern "C" void* calloc(size_t num, size_t size)
{
  snmalloc::message("calloc({}, {})", num, size);
  return snmalloc::libc::calloc(num, size);
}

extern "C" void free(void* p)
{
  snmalloc::message("free({})", p);
  // Just leak it
  if (snmalloc::is_owned(p))
    return snmalloc::libc::free(p);
}

#endif

void callback(void*)
{
  snmalloc::message("callback");
}

int main()
{
  // The first 32 keys are statically allocated, so we need to create 33 keys
  // to create a key for which pthread_setspecific will call the calloc.
  for (size_t i = 0; i < 33; i++)
  {
    pthread_key_create(&key, callback);
  }

  // The first calloc occurs here, after the first [0, 32] keys have been created
  // thus snmalloc will choose the key 33, `key` contains the key `32` and snmalloc `33`.
  // Both of these keys are not in the statically allocated part of the pthread key space.
  std::thread(thread_setspecific).join();

  // There should be a single allocator that can be extracted.
  if (snmalloc::AllocPool<snmalloc::Config>::extract() == nullptr)
  {
    // The thread has not torn down its allocator.
    snmalloc::report_fatal_error("Teardown of thread allocator has not occurred.");
    return 1;
  }

  return 0;
}

The re-entrant call to pthread_setspecific leads to the following

  • Line labelled (A) above causes a call to calloc to allocate a Level 2 block.
  • This calloc is the first one in the thread, and thus calls register_cleanup
  • register_cleanup will call pthread_setspecfic on the same level 2 block as (A), which calls calloc.
  • The second calloc succeeds and returns
  • The pthread state for snmalloc is updated, and thus it has a teardown handler.
  • It returns from register_cleanup, and returns the outer calloc to the original pthread_setspecific call.
  • This then overwrites the level 2 block with snmalloc's set specific call.

This overwrite means that the snmalloc teardown code is not run.

@mjp41
Copy link
Member

mjp41 commented Mar 27, 2025

I also investigated _cxa_thread_atexit that seems better behaved as it does an allocation at the start unconditionally, and then adds stuff together, so it can handle reentrancy nicely.

@SchrodingerZhu
Copy link
Collaborator Author

SchrodingerZhu commented Mar 27, 2025

_cxa_thread_atexit

But __cxa_thread_atexit seems to be C++ specific (different from __cxa_atexit).

The function to use is actually __cxa_thread_atexit_impl for C runtime

@mjp41
Copy link
Member

mjp41 commented Mar 27, 2025

_cxa_thread_atexit

But __cxa_thread_atexit seems to be C++ specific (different from __cxa_atexit).

The function to use is actually __cxa_thread_atexit_impl for C runtime

I meant that the C++ destructors version (SNMALLOC_CLEANUP=CXX11_DESTRUCTORS) I think will be okay in all cases for glibc's implementation of __cxa_thread_atexit_impl.

@SchrodingerZhu
Copy link
Collaborator Author

I see.

On the other hand,

https://github.com/walac/glibc/blob/264aad1e6aca30efddfcfae05c44ad38bb5e137d/stdlib/cxa_thread_atexit_impl.c#L46

is only allocating a single cell each time, which is free of the issue.

@mjp41
Copy link
Member

mjp41 commented Mar 27, 2025

So I have pushed my experiments based on this PR:
https://github.com/mjp41/snmalloc/tree/reentrancy-init

There are two tests, and a possible new mode for doing TLS that works nicely if the libc provides __cxa_thread_atexit_impl. I probably won't get time to look at this again until next week, but I'll try to comment on the PR and issues for changes.

Really good find on this. This is super subtle, great work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants