Issue 762 #763
I've played around with this for a bit now and can confirm it solves the crashing with openmpi. I have no idea why the change fixes it for openmpi without breaking mpich, or why it originally worked with mpich but not openmpi, but it works now. 🤷‍♂️
I also figured out why I was getting incorrect results with mpich: cafrun was still using mpiexec from openmpi, even though caf did link the executable against mpich. Perhaps something to look into? 🤷‍♂️ I still think the new test is valuable, so there's no sense in getting rid of it.
Now the bad news: we still crash with Intel MPI. I'd say we should still merge this and call it a win; we just aren't done yet.
While mpich and openmpi do not care whether memory allocated with MPI_Alloc_mem is freed using free() or MPI_Free_mem, the Intel MPI library crashes when the allocation and free calls do not match. Therefore use MPI_Free_mem for all memory allocated using MPI_Alloc_mem.
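The pairing rule in that commit message can be sketched as follows. This is only an illustration, not the code from the commit: `tracked_buf`, `buf_alloc`, `buf_free`, the two counters, and the `HAVE_MPI` macro are all hypothetical names. When `HAVE_MPI` is not defined, `malloc`/`free` stand in for the MPI calls so the sketch builds without an MPI library; with it defined, the caller is assumed to have called `MPI_Init` already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef HAVE_MPI   /* hypothetical macro: define it to build against a real MPI */
#include <mpi.h>
#endif

/* Which allocator produced a buffer. */
typedef enum { ORIGIN_MALLOC, ORIGIN_MPI } origin_t;

/* Hypothetical wrapper: a pointer plus the allocator that owns it. */
typedef struct {
    void    *ptr;
    origin_t origin;
} tracked_buf;

/* Live-buffer counters, for illustration only. */
static int live_mpi_bufs = 0;
static int live_malloc_bufs = 0;

static tracked_buf buf_alloc(size_t size, origin_t origin)
{
    tracked_buf b = { NULL, origin };
    if (origin == ORIGIN_MPI) {
#ifdef HAVE_MPI
        if (MPI_Alloc_mem((MPI_Aint)size, MPI_INFO_NULL, &b.ptr) != MPI_SUCCESS)
            b.ptr = NULL;
#else
        b.ptr = malloc(size);   /* stand-in so the sketch builds without MPI */
#endif
        if (b.ptr) live_mpi_bufs++;
    } else {
        b.ptr = malloc(size);
        if (b.ptr) live_malloc_bufs++;
    }
    return b;
}

/* Release through the allocator that produced the buffer, never the other. */
static void buf_free(tracked_buf *b)
{
    if (b->ptr == NULL) return;          /* already released */
    if (b->origin == ORIGIN_MPI) {
        assert(live_mpi_bufs > 0);
#ifdef HAVE_MPI
        MPI_Free_mem(b->ptr);            /* matches MPI_Alloc_mem */
#else
        free(b->ptr);                    /* matches the malloc stand-in */
#endif
        live_mpi_bufs--;
    } else {
        assert(live_malloc_bufs > 0);
        free(b->ptr);                    /* matches malloc */
        live_malloc_bufs--;
    }
    b->ptr = NULL;
}
```

Recording the origin next to the pointer means the release path never has to guess which allocator to call, which is the mismatch the commit message describes for Intel MPI.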
The Intel MPI crash should be fixed by commit 9d4afcb. At least it is on my Fedora 35 Linux with Intel MPI 2021.6.
I can confirm that this did solve the crashing with Intel MPI on Linux, and did improve the situation on Windows. Unfortunately, it does still crash on Windows, now with less severity it seems. The output from Windows is:
I'm still of the opinion that if solving this last issue is much more effort, we can go ahead and merge this and solve the last thing as a separate PR. Up to you though.
I tried to debug this under Windows, but I see a different error. I see this error:
This seems to be runtime related, and I have no clue how to debug this on Windows. So let's merge the existing fixes, and if this is important, do another round.
Summary of changes
Fix crash on certain platforms on finalize.
Rationale for changes
On finalize, OpenCoarrays was calling MPI_Win_detach on a management structure instead of on the token that had previously been attached with MPI_Win_attach. Detaching the token fixes the crash with openmpi. The test case provided in the first commit does not, and cannot, really test the fix, because the tests do not check for crashing tests.
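The attach/detach mismatch described above can be modeled without an MPI library. The sketch below is a toy, not the OpenCoarrays code: `toy_win`, `mgmt_rec`, `toy_attach`, `toy_detach`, and `finalize_demo` are hypothetical names, and the toy window only remembers base addresses. It relies on the MPI rule that the base passed to `MPI_Win_detach` must match one passed to an earlier `MPI_Win_attach` on the same window.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ATTACHED 8

/* Toy stand-in for a dynamic MPI window: it only remembers attached bases. */
typedef struct {
    const void *bases[MAX_ATTACHED];
    int count;
} toy_win;

/* Hypothetical management record; `token` is the memory exposed to peers. */
typedef struct {
    void  *token;
    size_t size;
} mgmt_rec;

/* Plays the role of MPI_Win_attach: record `base`. Returns 0 on success. */
static int toy_attach(toy_win *w, const void *base)
{
    if (w->count == MAX_ATTACHED) return -1;
    w->bases[w->count++] = base;
    return 0;
}

/* Plays the role of MPI_Win_detach: `base` must match an earlier attach. */
static int toy_detach(toy_win *w, const void *base)
{
    for (int i = 0; i < w->count; i++) {
        if (w->bases[i] == base) {
            w->bases[i] = w->bases[--w->count];  /* swap-remove the entry */
            return 0;
        }
    }
    return -1;  /* never attached; erroneous in real MPI */
}

/* Walks through the buggy and the fixed finalize sequence. */
static void finalize_demo(void)
{
    static char storage[16];
    toy_win w = { { NULL }, 0 };
    mgmt_rec rec = { storage, sizeof storage };

    assert(toy_attach(&w, rec.token) == 0);  /* the token is what gets attached */
    assert(toy_detach(&w, &rec) == -1);      /* bug: detaching the record itself */
    assert(toy_detach(&w, rec.token) == 0);  /* fix: detach the attached token */
    assert(w.count == 0);
}
```

In the toy the wrong detach just returns an error; with a real MPI library the same mistake is erroneous usage, which is consistent with the crash this PR reports for openmpi.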
Additional info and certifications
This pull request (PR) is a:
I certify that
OpenCoarrays developer a chance to review my proposed code
be introduced)
Code coverage data