Defect: same-image copies incorrect with teams #632
Comments
My bad: my examples violate the Fortran 2018 standard. Image index values specified in an image selector refer to the current team (unless a TEAM= or TEAM_NUMBER= specifier appears in the image selector, neither of which OpenCoarrays currently supports). Per Fortran 2018 (N2146 draft):
The same-image check in OpenCoarrays 2.4.0 should be valid as-is, and the PR is incorrect. But there is a bigger problem: image index values in image selectors appear to be interpreted not in the context of the current team, but in the context of the team in which the coarray was allocated. To illustrate, here's what I believe is a standard-compliant modification of the above example code, which assigns odd-numbered images to team 1 and even-numbered images to team 2:
Expected result:
Additional eyeballs to confirm/refute my interpretation of the Fortran 2018 standard and the validity of the test case would be appreciated. Assuming the above is accurate, it appears that the image index(es) specified in the image selector(s) must be translated from the MPI group of the current team to the MPI group of the corresponding coarray's MPI window before the MPI_Put/MPI_Get. If there are no other volunteers, I think I should be able to implement this translation (caveat: time permitting) and submit an updated PR.
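To make the proposed translation concrete, here is a minimal sketch in plain Python (not OpenCoarrays code). It assumes each group can be modeled as a list of initial-team image numbers in rank order; the function name `translate_image_index` and the list representation are hypothetical and exist only for illustration. In MPI terms, this is the kind of mapping that `MPI_Group_translate_ranks` performs between two groups; whether the eventual fix uses that routine is not stated in this thread.

```python
def translate_image_index(image_index, team_members, window_members):
    """Translate a 1-based image index from the current team's numbering
    to the numbering of the group that owns the coarray's window.

    Both member lists hold initial-team image numbers in rank order
    (a hypothetical stand-in for MPI groups).
    """
    if not 1 <= image_index <= len(team_members):
        raise ValueError("image index out of range for the current team")
    # Resolve the team-relative index to an image of the initial team ...
    initial_image = team_members[image_index - 1]
    # ... then find that image's 1-based position in the window's group.
    return window_members.index(initial_image) + 1


num_images = 6
# Assumption: the coarray was allocated in the initial team.
window_members = list(range(1, num_images + 1))
team_1 = [i for i in window_members if i % 2 == 1]  # odd-numbered images
team_2 = [i for i in window_members if i % 2 == 0]  # even-numbered images

for team_number, members in ((1, team_1), (2, team_2)):
    for image_index in range(1, len(members) + 1):
        target = translate_image_index(image_index, members, window_members)
        print(f"team {team_number}: image selector [{image_index}] "
              f"-> window image {target}")
```

With this model, image selector `[2]` evaluated inside team 1 resolves to image 3 of the initial team, not image 2, which is the distinction the translation is meant to capture.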
Yes, this is definitely a bug that needs to be fixed! Thanks for catching this! As such, I take it that the PR (#633) should NOT be merged? I agree with your reading of the standard. Perhaps @rouson and/or Michael can weigh in here to confirm?
@zbeekman: that's correct, PR #633 should not be merged. I'm in the process of fixing this issue for several routines (get, send, sendget, get_by_ref, send_by_ref, and sendget_by_ref) and adding simple unit tests. I should be able to finish this week. Once I'm done, should I amend my previous commit in PR #633 (and force push), add a new commit, or submit a new PR? Furthermore, should this issue be closed and a new one submitted (since the actual issue appears to be the opposite of what I originally thought), or should the discussion continue in this issue (perhaps correcting the title)?
I'm running into an issue trying to implement this for sendget_by_ref, and I may be encountering a bug in the OpenCoarrays 2.5.0 implementation of sendget_by_ref:
Output:
I'm expecting the R_send%A array to be (for all images, after the "|" sign in the output):
Could someone double-check this example? If there is a problem with sendget_by_ref (and a fix isn't imminent), I could update PR #633 to fix the other aforementioned routines, and a new issue could be opened for sendget_by_ref.
Translate image-selector image indices to be w.r.t. the current team. Untested for sendget_by_ref due to possible issue described in sourceryinstitute#632.
I've updated PR #633 to reflect the interpretation of image-selector image index values in the context of teams as described in #632 (comment). I did what I think is the right thing for sendget_by_ref, but as per #632 (comment), I think there is a problem in sendget_by_ref as-is, and so those modifications are effectively untested.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
So just to make sure I'm following this:
After I merge PR #633, let's close this issue, assuming the teams problem you reported here is adequately addressed. Can you then open a new issue describing any deficiencies/problems that remain? Does that sound like a decent plan, and like I'm following things correctly?
@nathanweeks please see my previous comment and re-open/file new issues as appropriate. Thanks so much for your contribution to get this fixed!!!
@zbeekman: your understanding is correct; in particular, the sendget_by_ref issue is illustrated by #632 (comment). Per your advice, I'll open a new issue for it.
Great, thanks so much, Nathan!
Defect/Bug Report
uname -a: Linux ... 4.4.156-94.61.1.16335.0.PTF.1107299-default #1 SMP ... x86_64 x86_64 x86_64 GNU/Linux
Observed Behavior
*caf_get(), *caf_send(), and *caf_sendget() detect when the cosubscript(s) of the coarray(s) refer to the same image as the executing image; in that case, an optimized copy (avoiding MPI) is performed. However, this detection can be incorrect when the image indexes of the current team don't correspond to the cobounds of the coarray(s), leading to incorrect results.
Expected Behavior
Steps to Reproduce
The following examples form teams that have one image per team.
*caf_get():
With 3 images, all images are expected to have L = [1,2,3]; however, only image 1 does:
*caf_send():
With 3 images, all are expected to have R = [1,2,3]; image 1's result is incorrect:
*caf_sendget():
On each image, R1 is expected to be a 3x3 matrix of the form:
("1 1 1 2 2 2 3 3 3" when output by the gfortran write(*,*) statement). However: