NeptuneLogger doesn't work with distributed_backend='ddp' #1683
Comments
Hi! thanks for your contribution!, great first issue!
@jakubczakon pls ^^
Hi @hirune924 and thanks for raising it! I've already notified the dev team and @pitercl will get back to you once we have a solution.
@jakubczakon I assume that this is also Neptune issue only, not related to PL, right?
@hirune924 mentioned:
So I think it may be a more general problem @Borda,
Hi @hirune924, @Borda, just a quick update on this: from my initial tests, our Neptune logger and multiprocessing don't mix well. Tomorrow I'll dig a bit deeper into that and see if/how it can be remedied. @Borda, I don't know yet if this is more on Neptune side or PL side.
Another quick update: I have a solution to this and will create a PR with a fix today or tomorrow.
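For readers wondering how a logger and multiprocessing can fail to mix: one common cause, and only an assumption here since the thread does not state the root cause, is that the logger object holds a live handle (a network session, a background thread, a lock) that cannot be pickled when it is handed to worker processes. The sketch below uses hypothetical classes (`NaiveLogger`, `LazyLogger`), not the Neptune or Lightning code, with a `threading.Lock` standing in for the unpicklable handle. It shows the failure and one possible workaround: create the handle lazily and drop it in `__getstate__`.

```python
import pickle
import threading


class NaiveLogger:
    """Hypothetical logger that opens its handle eagerly in __init__."""

    def __init__(self, name):
        self.name = name
        # A lock stands in for any unpicklable resource (session, thread, socket).
        self.experiment = threading.Lock()


class LazyLogger:
    """Hypothetical logger that opens its handle on first use."""

    def __init__(self, name):
        self.name = name
        self._experiment = None  # nothing unpicklable yet

    @property
    def experiment(self):
        # Create the handle only when it is first needed, in whichever
        # process asks for it.
        if self._experiment is None:
            self._experiment = threading.Lock()
        return self._experiment

    def __getstate__(self):
        # Drop the live handle before pickling; it is recreated lazily
        # after unpickling.
        state = self.__dict__.copy()
        state["_experiment"] = None
        return state


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    print("naive picklable:", can_pickle(NaiveLogger("run")))
    lazy = LazyLogger("run")
    _ = lazy.experiment  # force handle creation before pickling
    print("lazy picklable:", can_pickle(lazy))
```

Whether this pattern applies to the actual fix is not confirmed by the thread; it only illustrates the general class of problem.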
🐛 Bug
When using NeptuneLogger with distributed_backend='ddp' and running it on a single node with two GPUs, I find an error like this.
I also found a similar error with CometLogger.
To Reproduce
Steps to reproduce the behavior:
Run the following code on a machine with two GPUs.
This code is a slightly modified version of what was on this page.
https://docs.neptune.ai/integrations/pytorch_lightning.html
Code sample
Expected behavior
Environment
- GPU:
  - GeForce GTX TITAN X
  - GeForce GTX TITAN X
- available: True
- version: 10.1
- numpy: 1.18.1
- pyTorch_debug: False
- pyTorch_version: 1.5.0
- pytorch-lightning: 0.7.5
- tensorboard: 2.2.1
- tqdm: 4.42.1
- OS: Linux
- architecture:
  - 64bit
  -
- processor: x86_64
- python: 3.7.6
- version: #86-Ubuntu SMP Fri Jan 17 17:24:28 UTC 2020
Additional context