🐛 Bug
I am training with reload_dataloaders_every_epoch and have noticed that an extra DataLoader is instantiated before training, and nothing is ever run with it. This is a problem for me because I train on chunks that are loaded every epoch, and the extra instantiation disturbs the order in which I load them, especially when resuming from a checkpoint. It would also affect anyone who generates a new dataset every epoch, since the extra DataLoader wastes computation. The tqdm bar also keeps the information from the first, discarded DataLoader (in the screenshot, the number of iterations is the same for both loaders, whereas they should have different sizes).
To Reproduce
Run the code sample below; it trains for one epoch and prints a message every time a DataLoader is created.
A DataLoader is first instantiated at line 286 of training_loop.py, outside the epoch loop (that is where it is normally instantiated when not reloading every epoch). Then, when using reload_dataloaders_every_epoch, another one is created at the start of every epoch at line 386, inside the loop, so the first epoch ends up with an extra one.
Code sample
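The original code sample was not preserved in this copy of the issue, so the following is a minimal sketch of a reproduction, assuming the pytorch-lightning 0.7 API listed in the environment section. The dataset sizes (64 on the first train_dataloader call, 128 on the second), the print on each DataLoader creation, and the sleep(0.1) in the training step follow the description above; the model itself is a placeholder.

```python
import time

import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class ReloadModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)
        self.dataset_size = 64  # doubled on every train_dataloader() call

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self(x), y)
        time.sleep(0.1)  # slow the loop down so the tqdm total is observable
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def train_dataloader(self):
        print(f"Creating a DataLoader with {self.dataset_size} samples")
        dataset = TensorDataset(
            torch.randn(self.dataset_size, 32),
            torch.randint(0, 2, (self.dataset_size,)),
        )
        self.dataset_size *= 2  # the next call should return a bigger dataset
        return DataLoader(dataset, batch_size=1)


if __name__ == "__main__":
    model = ReloadModel()
    trainer = pl.Trainer(max_epochs=1, reload_dataloaders_every_epoch=True)
    trainer.fit(model)
    # Expected: one "Creating a DataLoader" message and a 128-step tqdm bar.
    # Observed: two messages, and the bar shows 64 steps.
```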
Expected behavior
Only one DataLoader should be created, but two are. The tqdm bar should show 128 iterations, since that is the dataset size the second time the loader is created, but it shows 64 instead (I added the sleep(0.1) to leave time to observe this).
Environment
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: Could not collect
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 2070 with Max-Q Design
Nvidia driver version: 435.21
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] numpy==1.18.1
[pip3] pytorch-lightning==0.7.1
[pip3] torch==1.4.0
[pip3] torchvision==0.4.2
[conda] Could not collect
Additional context