
"RuntimeError: Address already in use" when running multiple multi-gpu training (DDP). #401

Closed
kyoungrok0517 opened this issue Oct 21, 2019 · 2 comments · Fixed by #1010
Labels
bug Something isn't working

@kyoungrok0517

Describe the bug
I see "RuntimeError: Address already in use" error message if I try to run two multi-gpu training session (using ddp) at the same time.

To Reproduce
Run two multi-GPU training sessions at the same time.

Expected behavior
Able to run two multi-GPU training sessions at the same time.

Screenshots
[screenshot of the "RuntimeError: Address already in use" traceback]

Desktop (please complete the following information):

  • OS: Ubuntu 18.04
  • PyTorch 1.3.0
  • CUDA 10.1
@kyoungrok0517 added the bug label Oct 21, 2019
@williamFalcon
Contributor

You have to set MASTER_PORT yourself if running two DDP sessions on the same machine at the same time. This is a PyTorch limitation. In all the other DDP settings you wouldn't have to worry about this.

https://pytorch-lightning.readthedocs.io/en/latest/Trainer/Distributed%20training/#multi-node
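For example, a minimal sketch (the port numbers here are arbitrary; any free port works, it just has to differ between the two concurrent sessions, and it has to be set before the Trainer spawns its DDP processes):

```python
import os

# PyTorch's DDP rendezvous reads MASTER_PORT from the environment.
# Give each concurrent training session on this machine its own free port.
os.environ["MASTER_PORT"] = "12910"    # first session
# os.environ["MASTER_PORT"] = "12911"  # second session, in its own process

# ... then build the LightningModule and Trainer as usual and call trainer.fit(model)
```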

@kyoungrok0517
Author

Great. Thanks!
