Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MPI attribute code need threading audit #4

Closed
ompiteam opened this issue Oct 1, 2014 · 6 comments
Closed

MPI attribute code need threading audit #4

ompiteam opened this issue Oct 1, 2014 · 6 comments
Labels
Milestone

Comments

@ompiteam
Copy link
Contributor

ompiteam commented Oct 1, 2014

In reviewing bug https://svn.open-mpi.org/trac/ompi/ticket/176, I have determined that the locking code in source:/trunk/ompi/attribute/attribute.c may not be thread safe in all cases and needs to be audited. It was written with the best of intentions :-) but then never tested and I think there are some obscure race conditions that ''could'' happen.

For example, in ompi_attr_create_keyval(), we have the following:

    OPAL_THREAD_LOCK(&alock);
    ret = CREATE_KEY(key);
    if (OMPI_SUCCESS == ret) {
        ret = opal_hash_table_set_value_uint32(keyval_hash, *key, attr);
    }
    OPAL_THREAD_UNLOCK(&alock);
    if (OMPI_SUCCESS != ret) {
        return ret;
    }

    /* Fill in the list item */

    attr->copy_attr_fn = copy_attr_fn;
    /* ...fill in more attr->values ... */

This could clearly be a problem since we set the empty keyval on the hash and therefore it's available to any other thread as soon as the lock is released -- potentially ''before'' we finish setting all the values on the attr variable (which is poorly named -- it's a keyval, not an attribute).

This one problem is easily fixed (ensure to setup attr before we assign it to the keyval hash), but it reflects that the rest of the attribute code should really be audited. Hence, this ticket is a placemarker to remember to audit this code because it may not be thread safe.

@ompiteam ompiteam self-assigned this Oct 1, 2014
@ompiteam ompiteam added this to the Future milestone Oct 1, 2014
@ompiteam
Copy link
Contributor Author

ompiteam commented Oct 1, 2014

Imported from trac issue 184. Created by jsquyres on 2006-07-05T12:39:43, last modified: 2008-05-29T18:01:57

@ompiteam ompiteam added the bug label Oct 1, 2014
@ompiteam
Copy link
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by jsquyres on 2007-12-03 09:20:29:

If threading support is going into v1.3, this ticket should be moved to the v1.3 milestone.

@ompiteam
Copy link
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by jsquyres on 2008-05-29 18:01:57:

This will not happen for v1.3.

rhc54 pushed a commit that referenced this issue Oct 12, 2014
elenash referenced this issue Nov 26, 2014
Reenable high accuracy timers
yosefe pushed a commit to yosefe/ompi that referenced this issue Mar 5, 2015
…onnect_fix

OSHMEM: spml ikrit: fix mxm disconnect flow
lrrajesh pushed a commit to lrrajesh/ompi that referenced this issue Mar 19, 2015
@jsquyres
Copy link
Member

@gpaulsen if someone could write a multi-threaded attribute test (i.e., a bunch of threads all simultaneously reading/writing attributes on an object), that would easily show any problems that we may have here.

Honestly, the attribute code does not need to be high performance. I would not be heartbroken at all if the fine-grained locking code in the attribute code went away and was replaced with a simpler one-global-lock-for-all-attribute-things approach.

This is not a sexy issue, but it still is important for MPI_THREAD_MULTIPLE reasons.

ggouaillardet referenced this issue in ggouaillardet/ompi Jan 13, 2018
ggouaillardet added a commit that referenced this issue Jun 16, 2018
shizhibao pushed a commit to shizhibao/ompi that referenced this issue Jan 17, 2021
fix for allreduce non-contiguous datatypes
a-szegel pushed a commit to a-szegel/ompi that referenced this issue May 13, 2022
opal/mca/accelerator/cuda: Made CUDA Component have single stage init
@gpaulsen gpaulsen removed their assignment Aug 22, 2023
@jsquyres
Copy link
Member

I'm guessing this issue was addressed long ago. The code in ompi/attribute/attribute.c does not resemble the code cited in the body of this issue -- the locking scheme is quite different.

Also, there's a ton of issues/PRs that reference this one, but I think most (all?) of them are accidents: e.g., people just citing the 4th item in a list, and github automatically linked it back here to issue number 4.

I'm going to close this issue as completed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants