Memory leaking when using large numpy array in Dataset #1761
Comments
Hi! Thanks for your contribution, and great first issue!
I had a similar issue, and if I recall correctly, defining the environment variable COLAB_GPU forces PyTorch Lightning to use fork, which might prevent this Nx memory blowup.
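For reference, a minimal sketch of the env-var approach (assuming the variable just needs to be set before the Trainer is constructed; whether it has any effect outside a Colab/TPU runtime is exactly what the reply below questions):

```python
import os

# Hypothetical: set the variable before constructing the Trainer so that
# Lightning's process-launching code sees it. Whether this actually forces
# fork outside of a Colab/TPU runtime is unclear.
os.environ["COLAB_GPU"] = "1"

from pytorch_lightning import Trainer  # 0.7.x-era import

# trainer = Trainer(gpus=4, distributed_backend="ddp")
# trainer.fit(model)
```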
Thank you for the answer, but it seems that option only applies to TPU training? I tried it anyway, but it didn't improve my situation. Any other pointers or ideas?
I tried to manually rewrite the PyTorch Lightning code to use fork instead of spawn, but then the error "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method" comes up.
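For context, the error comes from CUDA's restriction that a process which has already initialized CUDA cannot fork and then use CUDA in the child; a spawned child starts from a fresh interpreter instead. A small illustration (not Lightning's internal code):

```python
import torch
import torch.multiprocessing as mp

def worker(rank):
    # In a forked child, this call raises "Cannot re-initialize CUDA in
    # forked subprocess" if the parent process already touched CUDA.
    # A spawned child initializes CUDA from scratch, so it works, but the
    # dataset then has to be pickled into every child process.
    torch.cuda.set_device(rank)
    print(f"worker {rank} using device {torch.cuda.current_device()}")

if __name__ == "__main__":
    # torch.multiprocessing.spawn always uses the "spawn" start method.
    mp.spawn(worker, nprocs=2)
```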
Might be similar to #1769
So for others running into this: as a workaround, setting it to a higher limit helped. However, it still seems too slow; it's only half as fast as it was using Torchbearer before. The more num_workers I use in the DataLoader, the slower the start of an epoch, similar to what is described in this issue: #1884
@mpaepper check again?
Yes, thank you. It's resolved with the recent master additions 👍 |
🐛 Bug
Thank you for the great library! While migrating a larger project, though, I am running into memory issues, so maybe someone can help me out.
I have a pretty complicated Dataset which loads a lot of data and buffers it in CPU RAM as a numpy array.
I train using ddp with num_workers = 6 in the DataLoader. Training crashes my machine because of CPU memory overflow. It works with num_workers = 0, but the higher num_workers is, the higher the memory consumption.
I figured out that this is much worse when using a large numpy array in the Dataset rather than a PyTorch tensor.
Unfortunately, I need numpy arrays, so is there anything I can do?
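Roughly, the pattern looks like this (class name and shapes are made up for illustration; the real project is more complicated):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class BufferedNumpyDataset(Dataset):
    """Hypothetical stand-in: buffers everything into CPU RAM as one numpy array."""

    def __init__(self, n_samples=100_000, n_features=1_000):
        # One big float32 buffer, roughly n_samples * n_features * 4 bytes.
        self.data = np.random.rand(n_samples, n_features).astype(np.float32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Per-item conversion; the whole buffer stays resident in every worker.
        return torch.from_numpy(self.data[idx])

# Used with ddp and several workers, as in the report:
loader = DataLoader(BufferedNumpyDataset(), batch_size=64, num_workers=6)
```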
To Reproduce
I created a repository to reproduce this. It allows you to train a model on toy data using either a PyTorch tensor or a numpy array in the Dataset.
When running it with the PyTorch tensor, the same amount of data uses 5 GB of RAM, while with numpy it uses more than 30 GB.
The higher num_workers is, the higher the RAM usage; it seems to leak when using numpy?
Code sample
https://github.com/mpaepper/reproduce_pytorch_lightning_memory_issues
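The repository has the full code; a quick way to watch memory grow without it is to sum the resident set size of the main process and its DataLoader workers while iterating. psutil and the toy dataset below are illustrative, not the repository's actual code:

```python
import os

import numpy as np
import psutil
import torch
from torch.utils.data import DataLoader, Dataset

class ToyNumpyDataset(Dataset):
    # Illustrative numpy-backed dataset (~0.8 GB buffer).
    def __init__(self):
        self.data = np.random.rand(200_000, 1_000).astype(np.float32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return torch.from_numpy(self.data[idx])

def total_rss_gb():
    # Resident memory of this process plus all child (worker) processes.
    procs = [psutil.Process(os.getpid())]
    procs += procs[0].children(recursive=True)
    return sum(p.memory_info().rss for p in procs) / 1e9

if __name__ == "__main__":
    loader = DataLoader(ToyNumpyDataset(), batch_size=64, num_workers=6)
    for i, _ in enumerate(loader):
        if i % 500 == 0:
            print(f"step {i}: total RSS ~ {total_rss_gb():.1f} GB")
```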
Expected behavior
I would expect numpy arrays and PyTorch tensors to behave the same way when using num_workers > 0, i.e. memory consumption should be similar.
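One mitigation that is commonly suggested for this kind of blow-up (not confirmed as the fix here) is to keep the buffer in a shared-memory torch tensor and return numpy views per item, so worker processes map the same pages instead of each holding its own copy. A hedged sketch:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class SharedBufferDataset(Dataset):
    """Hypothetical workaround: store the buffer as a shared torch tensor,
    but keep returning numpy arrays so downstream numpy code still works."""

    def __init__(self, array: np.ndarray):
        # torch.from_numpy wraps the buffer without copying; share_memory_()
        # moves the storage into shared memory so DataLoader/DDP workers can
        # map it instead of receiving their own pickled copy.
        self.buffer = torch.from_numpy(array).share_memory_()

    def __len__(self):
        return self.buffer.shape[0]

    def __getitem__(self, idx):
        # .numpy() returns a view onto the shared storage; no copy is made.
        return self.buffer[idx].numpy()
```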
Environment
- CUDA:
  - GPU:
    - GeForce RTX 2080 Ti
    - GeForce RTX 2080 Ti
    - GeForce RTX 2080 Ti
    - GeForce RTX 2080 Ti
  - available: True
  - version: 10.1
- Packages:
  - numpy: 1.16.4
  - pyTorch_debug: False
  - pyTorch_version: 1.4.0
  - pytorch-lightning: 0.7.5
  - tensorboard: 1.14.0
  - tqdm: 4.46.0
- System:
  - OS: Linux
  - architecture: 64bit
  - processor: x86_64
  - python: 3.7.3
  - version: #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020