
Add check TSD key != NULL to support cases where TSD values are accessed on different threads (ULT)? #8527

Open
janciesko opened this issue Feb 25, 2021 · 2 comments

Comments

@janciesko
Contributor

I am running into a segfault during MPI_Finalize where a TSD key is NULL; see the backtrace below. Hypothesis: MPI_Finalize might be called from a different thread than the one the progress thread runs on. This can leave the opal_tsd_key_t inside an opal_tsd_tracked_key_t equal to NULL. Should we protect this with an if condition, or rely on the ULT implementation to handle NULL keys correctly?

Observation:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff5043522 in qthread_key_delete (key=0x0) at ../../src/tls.c:44
#2 0x00007ffff6926659 in opal_tsd_key_delete (key=0x0) at ../../../../opal/mca/threads/qthreads/threads_qthreads_tsd.h:36
#3 0x00007ffff69269c5 in opal_tsd_tracked_key_destructor (key=0x7ffff7bbd260 <print_args_tsd_key>) at ../../../../opal/mca/threads/base/tsd.c:34
#4 0x00007ffff783bf8b in opal_obj_run_destructors (object=0x7ffff7bbd260 <print_args_tsd_key>) at ../../opal/class/opal_object.h:483
#5 0x00007ffff78415e0 in ompi_rte_finalize () at ../../ompi/runtime/ompi_rte.c:955
#6 0x00007ffff78398d4 in ompi_mpi_finalize () at ../../ompi/runtime/ompi_mpi_finalize.c:468
#7 0x00007ffff787f717 in PMPI_Finalize () at pfinalize.c:54
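
For illustration, the kind of guard being asked about could look roughly like the sketch below. It is a minimal, self-contained mock-up: every name prefixed with demo_ is hypothetical and stands in for the real OPAL and Qthreads types and functions, whose actual definitions are not reproduced here. The only assumption taken from the backtrace is that the backend delete routine fails when handed a NULL key.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real Open MPI or Qthreads definitions. */
typedef struct demo_key { int id; } *demo_key_t;

typedef struct {
    demo_key_t key;   /* stays NULL if the key was never created */
} demo_tracked_key_t;

/* Counts how often the mock backend was actually asked to delete a key. */
static int backend_deletes = 0;

/* Mock backend delete: like the crash in the backtrace, it would fault
 * on a NULL key because it dereferences the key unconditionally. */
static int demo_backend_key_delete(demo_key_t key)
{
    backend_deletes++;
    key->id = -1;   /* mark the key as released */
    return 0;
}

/* Destructor with the proposed guard: skip the backend call when the
 * tracked key was never set, instead of relying on the backend to
 * tolerate NULL. */
static int demo_tracked_key_destructor(demo_tracked_key_t *tracked)
{
    assert(tracked != NULL);

    if (NULL == tracked->key) {
        return 0;   /* nothing to delete */
    }

    int rc = demo_backend_key_delete(tracked->key);
    tracked->key = NULL;   /* avoid a second delete of the same key */
    return rc;
}
```

Calling demo_tracked_key_destructor on a tracked key whose key field is NULL returns 0 without touching the backend, while a populated key is deleted once and then cleared. Whether such a check belongs in the common code or in each ULT backend is exactly the open question here.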

@rhc54
Contributor

rhc54 commented Feb 25, 2021

You are perhaps running into an issue because PMIx has its own progress thread and knows nothing about the OPAL thread abstraction? It does use TSD, but in its own context of course.

@janciesko
Contributor Author

Yes, that's possible. Since we document that this can happen, my feeling is that the ULT runtimes should handle that case.
