Running parallel MPI RCall #204

Closed
ignacioq opened this issue Oct 6, 2017 · 4 comments

Comments

@ignacioq

ignacioq commented Oct 6, 2017

Hi, I'm trying to run an MPI parallel job in which RCall is used to read some data. When I run the code on one node with multiple cores (using julia -p <n>), it works, but when I use cores on different nodes (using julia --machinefile), I get the following error:

WARNING: Node state is inconsistent: node 16 failed to load cache from /.../.julia/lib/v0.6/RCall.ji. Got:
WARNING: InitError: error compiling __init__: could not load library "/.../Apps/R/3.4.1-generic/lib/R/lib/libR.so"
libRblas.so: cannot open shared object file: No such file or directory
during initialization of module RCall
WARNING: Node state is inconsistent: node 3 failed to load cache from /.../.julia/lib/v0.6/RCall.ji. Got:
WARNING: InitError: error compiling __init__: could not load library "/.../Apps/R/3.4.1-generic/lib/R/lib/libR.so"
libRblas.so: cannot open shared object file: No such file or directory

(I deleted some of the directory paths and replaced them with ...) And so forth for each node. Following this thread, I tried using import RCall; @everywhere using RCall, but I get the same error. I'm trying to run this on a cluster that uses Slurm.

@simonbyrne
Member

Not sure I can help, but look at
https://discourse.julialang.org/t/everywhere-using-mylib-fails-when-procs-on-remote-node/4515/7
and
JuliaLang/julia#21718

Are you only using R to read in data? If so, what format is it in?

@ignacioq
Author

ignacioq commented Oct 8, 2017

Thank you. I only use R to read phylogenetic trees through the ape package (I know, I should just write a tree parser myself). Yes, I was using import RCall; @everywhere using RCall, but it is still unsuccessful. Other packages work fine when loaded in parallel; only RCall fails, and it seems to be just because it cannot locate R.

I was wondering if you have any insight into why this might be, and whether there is a solution. (I think I read somewhere that in parallel you open just one R session instead of one for each process; might this be it?)
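
One way to check whether each process can actually see R is to ask every worker for its R_HOME and whether the two libraries named in the warnings exist there. This is only a rough diagnostic sketch, assuming Julia 1.x (where the parallel functions live in the Distributed standard library rather than in Base, as in 0.6) and assuming the libraries sit in the lib subdirectory of R_HOME, as the path in the error message suggests. The function name r_library_report is made up for illustration.

```julia
using Distributed  # stdlib in Julia 1.x; assumed here, the thread itself used 0.6

# Hypothetical helper: describe what the calling process can see of R.
@everywhere function r_library_report()
    r_home = get(ENV, "R_HOME", "")      # empty string if R_HOME is unset
    libdir = joinpath(r_home, "lib")     # assumed layout: <R_HOME>/lib/libR.so
    return (
        host = gethostname(),
        r_home = r_home,
        has_libR = isfile(joinpath(libdir, "libR.so")),
        has_libRblas = isfile(joinpath(libdir, "libRblas.so")),
    )
end

# Collect one report per process (the master and every worker).
reports = Dict(p => remotecall_fetch(r_library_report, p) for p in procs())

for (p, r) in sort(collect(reports); by = first)
    println("process ", p, " on ", r.host, ": R_HOME=", repr(r.r_home),
            " libR=", r.has_libR, " libRblas=", r.has_libRblas)
end
```

If the workers on remote nodes report an empty R_HOME or missing libraries while the master does not, that would be consistent with the environment not being propagated to those nodes.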

@simonbyrne
Member

Sorry, I don't really know. The only thing I can suggest is manually setting R_HOME on all the nodes in the cluster.

@palday
Collaborator

palday commented Jul 18, 2024

I want to second the suspicion that R isn't being set up correctly on all the nodes -- might be worthwhile to set R_HOME on all the nodes and then call Pkg.build("RCall") after doing so.
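
As a minimal sketch of that suggestion, assuming Julia 1.x with the Distributed standard library, something along these lines sets R_HOME on every process from within Julia. The helper name set_r_home_everywhere and the path passed to it are hypothetical; the real path is whatever R installation is visible from all nodes.

```julia
using Distributed  # stdlib in Julia 1.x; assumed here

# Hypothetical helper: set R_HOME on the master and on every worker,
# then return what each process reports so the result can be inspected.
function set_r_home_everywhere(r_home::AbstractString)
    for p in procs()
        remotecall_fetch(p, r_home) do path
            ENV["R_HOME"] = path
        end
    end
    return Dict(p => remotecall_fetch(() -> get(ENV, "R_HOME", ""), p)
                for p in procs())
end

# Example call with a placeholder path (replace with the cluster's R location):
# set_r_home_everywhere("/path/to/lib/R")
# import Pkg; Pkg.build("RCall")
# @everywhere using RCall
```

Whether this alone is enough is not certain. The warnings above complain about libRblas.so, and on Linux the dynamic loader may need R's lib directory on its search path before Julia starts, in which case R_HOME and the library path would have to be exported in the job script or shell profile on each node rather than set from inside Julia.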

I'm going to go ahead and close this as stale, as this issue has been inactive for a long time now and a lot has changed in the meantime. If you managed to get this working and want to drop a breadcrumb here for posterity, that would be great.

@palday closed this as completed Jul 18, 2024