Memory Leak using Cuda-Aware MPI_Send and MPI_Recv for large packets of data #9051
Comments
FYI @open-mpi/ucx
adding @Akshay-Venkatesh
@geohussain which version of UCX is being used here?
I'm able to reproduce the issue. The cuda-ipc transport in UCX caches peer mappings, and a free call on peer-mapped memory is not guaranteed to release it. These mappings get freed at finalize (or if VA recycling is detected, which appears not to be the case here), and the workaround is to disable caching with UCX_CUDA_IPC_CACHE=n. For the sample program you've provided this doesn't have an impact on performance, because the transfer sizes are large and there is no communication-buffer reuse, but for programs different from this one there would be a performance penalty. UCX could intercept cudaFree calls, but it would have to notify each peer that maps this memory out of band, and that logic is somewhat complex. Would the current workaround suffice? The modified test and run command are here: https://gist.github.com/Akshay-Venkatesh/d44e51aea6e980a06f75991bed57c90b FYI @bureddy
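For reference, this is roughly how the workaround would be applied at launch time (a sketch only; the binary name and rank count are placeholders, and the actual test and command are in the gist linked above):

```sh
# Disable the UCX cuda-ipc peer-mapping cache so freed device memory is
# released back to the GPU (may cost performance when buffers are reused).
mpirun -np 2 -x UCX_CUDA_IPC_CACHE=n ./main
```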
Hi,
When using UCX, this issue is addressed by openucx/ucx#10104, which uses library-internal buffers by default and does not directly map user buffers (direct mapping of user buffers is the root cause of the leaks).
I just pulled UCX master (4234ca0cd), compiled it, and then built Open MPI 5.0.5 against that UCX, but I'm still seeing the memory growth on the GPU with the original code example. Is there an environment variable or a different configure flag I need to get this fix working?
UCX configure command:
OpenMPI configure command:
OK, a small update: I've checked that the issue still occurs with
After further investigation, my issue led to the discovery of problems related to IPC handling (detailed in #12849), although I don't know whether this issue has the same root cause.
@tdavidcl What I said above is wrong. It turns out that openucx/ucx#10104 doesn't actually address the memory leak. My apologies for the wrong claim. We plan to address this memory leak in UCX 1.19 after the upcoming release at the end of October. |
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
4.0.5
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
From a source tarball (v4.0.5), with CUDA-aware support enabled.
Please describe the system on which you are running
Details of the problem
When I send large packets of data (~1 gigabyte) between GPUs using MPI_Send and MPI_Recv and free the CUDA buffers afterwards, the memory does not get freed on the GPU and keeps growing in subsequent iterations. The expected behavior is that GPU memory should be freed after sending and receiving large packets of data. The following code reproduces this behavior.
main.cpp
CMakeLists.txt
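For illustration, here is a minimal sketch of the pattern described above (a reconstruction under assumptions, not the attached main.cpp): each iteration allocates roughly 1 GiB on the device, exchanges it between two ranks by passing the device pointer directly to MPI_Send/MPI_Recv, frees it with cudaFree, and prints the free device memory reported by cudaMemGetInfo so the growth is visible.

```cpp
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) std::fprintf(stderr, "Run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    // Bind each rank to a GPU (assumes both ranks run on the same node).
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev > 0) cudaSetDevice(rank % ndev);

    const size_t count = (1UL << 30) / sizeof(double);  // ~1 GiB per message

    for (int iter = 0; iter < 20; ++iter) {
        double* dbuf = nullptr;
        if (cudaMalloc(&dbuf, count * sizeof(double)) != cudaSuccess) {
            std::fprintf(stderr, "rank %d: cudaMalloc failed at iter %d\n", rank, iter);
            MPI_Abort(MPI_COMM_WORLD, 2);
        }

        // CUDA-aware MPI: hand the device pointer straight to MPI.
        if (rank == 0) {
            MPI_Send(dbuf, (int)count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else {
            MPI_Recv(dbuf, (int)count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        // Freeing should return the memory to the GPU, but the free memory
        // reported below keeps shrinking across iterations.
        cudaFree(dbuf);

        size_t free_b = 0, total_b = 0;
        cudaMemGetInfo(&free_b, &total_b);
        std::printf("rank %d iter %d: free GPU memory %.1f MiB\n",
                    rank, iter, free_b / (1024.0 * 1024.0));
    }

    MPI_Finalize();
    return 0;
}
```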