🐛 Bug
The default CUDA device is not set to `trainer.root_gpu` in single-GPU mode. Tensors created with `device='cuda'` are therefore placed on the wrong GPU, and the dataloader allocates pinned memory on the wrong GPU when `pin_memory=True`.

Maybe we'll need to add `torch.cuda.set_device(self.trainer.root_gpu)` to https://github.com/PyTorchLightning/pytorch-lightning/blob/5dfc7b157e7febab692036b7392dac8b52f41b87/pytorch_lightning/accelerators/gpu_backend.py#L24 as `DDPBackend` did: https://github.com/PyTorchLightning/pytorch-lightning/blob/5dfc7b157e7febab692036b7392dac8b52f41b87/pytorch_lightning/accelerators/ddp_backend.py#L195
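A minimal sketch of the proposed change, not the actual patch. The `GPUBackend` skeleton below is illustrative; it only assumes that `setup()` has access to `self.trainer.root_gpu`, as in the linked source:

```python
import torch


class GPUBackend:
    """Illustrative skeleton of the single-GPU backend."""

    def __init__(self, trainer):
        self.trainer = trainer

    def setup(self, model):
        # Proposed fix: make the root GPU the default CUDA device so that
        # tensors created with device='cuda' (and pinned dataloader memory)
        # land on trainer.root_gpu, mirroring what DDPBackend already does.
        torch.cuda.set_device(self.trainer.root_gpu)

        # ... existing setup continues, e.g. moving the model:
        model.cuda(self.trainer.root_gpu)
```

With this in place, `device='cuda'` inside user code resolves to `cuda:{root_gpu}` instead of always `cuda:0`.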
To Reproduce
Running the following code raises:

`RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!`
Code sample
Expected behavior
No `RuntimeError` occurs.
Environment
Additional context