
"RuntimeError: Address already in use" when running multiple multi-gpu training (DDP). #401

Closed
kyoungrok0517 opened this issue Oct 21, 2019 · 2 comments · Fixed by #1010
Labels
bug Something isn't working

@kyoungrok0517

Describe the bug
I see "RuntimeError: Address already in use" error message if I try to run two multi-gpu training session (using ddp) at the same time.

To Reproduce
Run two multi-GPU training sessions at the same time.

Expected behavior
Able to run two multi-GPU training sessions at the same time.

Screenshots
[screenshot of the "RuntimeError: Address already in use" traceback]

Desktop (please complete the following information):

  • OS: Ubuntu 18.04
  • PyTorch 1.3.0
  • CUDA 10.1
@kyoungrok0517 added the bug label Oct 21, 2019
@williamFalcon
Contributor

You have to set MASTER_PORT yourself if running two DDP sessions on the same machine at the same time. This is a PyTorch limitation. In all the other DDP settings you wouldn't have to worry about this.

https://pytorch-lightning.readthedocs.io/en/latest/Trainer/Distributed%20training/#multi-node
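For example, a minimal sketch (the port numbers here are arbitrary; any free port works, it just has to differ between the two concurrent sessions, and it has to be set before the Trainer spawns its DDP processes):

```python
import os

# PyTorch's DDP rendezvous reads MASTER_PORT from the environment.
# Give each concurrent training session on this machine its own free port.
os.environ["MASTER_PORT"] = "12910"    # first session
# os.environ["MASTER_PORT"] = "12911"  # second session, in its own process

# ... then build the LightningModule and Trainer as usual and call trainer.fit(model)
```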

@kyoungrok0517
Author

Great. Thanks!
