🐛 Bug
Slurm and elastic training create the training processes on each node outside of the Lightning context. As a result, when the fit function calls prepare_data, the assumption that it is only being called on proc 0 no longer holds, and prepare_data runs in every process.
This is a problem for computational reasons (e.g. every process downloading the whole dataset) and for training stability if the data preparation process isn't deterministic.
See calling code here:
https://github.com/PyTorchLightning/pytorch-lightning/blob/7c7e50ca4702a5b35bc1b80d44bca7606552093a/pytorch_lightning/trainer/trainer.py#L825
To Reproduce
Steps to reproduce the behavior:
Expected behavior
prepare_data should be called only once per node, even when Slurm or elastic training launches the processes.
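As a rough illustration of the once-per-node behavior, here is a minimal sketch of a guard that could wrap the call. It is not Lightning's implementation. It assumes the launcher exposes the node-local process index through the `LOCAL_RANK` environment variable (set by elastic launchers) or `SLURM_LOCALID` (set by Slurm for each task); the helper names `get_local_rank` and `prepare_data_once_per_node` are hypothetical.

```python
import os


def get_local_rank(environ=None):
    """Return the node-local process index, defaulting to 0.

    Assumption: the launcher sets LOCAL_RANK (elastic) or
    SLURM_LOCALID (Slurm). LOCAL_RANK is checked first.
    """
    env = os.environ if environ is None else environ
    for key in ("LOCAL_RANK", "SLURM_LOCALID"):
        value = env.get(key)
        if value is not None:
            return int(value)
    # No launcher variables found: treat this as a single-process run.
    return 0


def prepare_data_once_per_node(prepare_fn, environ=None):
    """Call prepare_fn only in the process with local rank 0.

    Returns True if prepare_fn was called in this process, False otherwise.
    """
    if get_local_rank(environ) == 0:
        prepare_fn()
        return True
    return False
```

In a real run, the other processes on the node would also need to wait (for example on a barrier) until local rank 0 has finished preparing the data; the sketch leaves that synchronization out.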