
added MultiEpochsDataLoader #140

Merged: 1 commit into huggingface:master on May 5, 2020

Conversation

@yoniaflalo commented May 5, 2020

Hi.

I have added a feature called MultiEpochsDataLoader. With the standard PyTorch data loader, there is a long wait at the beginning of every epoch and training is very slow for the first iterations, because the data loader is reinitialized from scratch (its worker processes are shut down and respawned).

With this feature we do not waste that time: only the first initialization of the dataloader, at the first epoch, is slow; for every subsequent epoch, the first iteration is as fast as the iterations in the middle of an epoch.
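For reference, here is a minimal sketch of how such a loader can be built (a reconstruction of the technique, not necessarily the exact code in this commit): the batch sampler is wrapped in a sampler that repeats forever, and the worker iterator is created once and reused, so later epochs keep the already-running workers. The `_RepeatSampler` name and the toggling of the name-mangled `_DataLoader__initialized` flag are choices of this sketch.

```python
import torch


class _RepeatSampler:
    """Wraps a batch sampler so that it yields batches forever.

    Each pass calls iter() on the wrapped sampler again, so a shuffling
    sampler still produces a fresh permutation every epoch.
    """

    def __init__(self, sampler):
        self.sampler = sampler

    def __iter__(self):
        while True:
            yield from iter(self.sampler)


class MultiEpochsDataLoader(torch.utils.data.DataLoader):
    """DataLoader whose worker processes survive across epochs."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Lift DataLoader's "no mutation after init" guard (a name-mangled
        # private flag) to swap in the infinite batch sampler.
        self._DataLoader__initialized = False
        self.batch_sampler = _RepeatSampler(self.batch_sampler)
        self._DataLoader__initialized = True
        # Create the worker iterator once; every epoch reuses it.
        self.iterator = super().__iter__()

    def __len__(self):
        # Length of one epoch: the number of batches in the wrapped sampler.
        return len(self.batch_sampler.sampler)

    def __iter__(self):
        # Slice one epoch's worth of batches out of the endless stream.
        for _ in range(len(self)):
            yield next(self.iterator)
```

It is a drop-in replacement for `torch.utils.data.DataLoader`; because the wrapped sampler is re-iterated on every pass, shuffling still gives a fresh order each epoch.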

Example when using the MultiEpochsDataLoader:

First epoch:

[Screenshot: per-iteration timing during the first epoch]

Next epochs:

[Screenshot: per-iteration timing during subsequent epochs]

This can save more than 10 seconds per epoch, which over a 300-epoch training run adds up to almost an hour. (Training on 8 V100s, for example, is quite expensive, so saving an hour on every run is quite nice.)

I have tested the feature on a training run of ecaresnetlight.

@chris-ha458 (Contributor)

This code looks like it would be valuable in the PyTorch code base itself!
Have you considered sending a PR or opening an issue there too?

@yoniaflalo (Author)

No, I have not sent a PR to the PyTorch code base. I do not know that code base well enough, and PRs to PyTorch take several months to get merged. Maybe I could consider doing it, but for now I think it is good to have this in this repository, since it is the best repository that exists for image classification training.

@rwightman (Collaborator)

thanks

@rwightman merged commit 3b72ebf into huggingface:master on May 5, 2020
@mrT23 (Contributor) commented May 8, 2020

@vrandme this has already been proposed and discussed in a PyTorch pull request: pytorch/pytorch#15849 (comment)

@bryant1410 (Contributor)

(quoting @mrT23) @vrandme this has already been proposed and discussed in a PyTorch pull request: pytorch/pytorch#15849 (comment)

IIUC, this is different. Re-using the workers can be done by just keeping them alive. This implementation goes a bit beyond that: the workers also pre-fetch data for the next epoch instead of restarting the data loading pipeline, because the sampler is infinite.
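For illustration, a hypothetical timing loop (the toy dataset, batch size, and worker count are made up) showing the effect: a plain DataLoader spawns fresh worker processes on every pass over the loader, so the first batch of each epoch is slow, while the MultiEpochsDataLoader sketched above pays that cost only once.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":  # guard needed when using multiprocessing workers
    # Toy stand-in for a real image dataset.
    dataset = TensorDataset(torch.randn(10_000, 3, 32, 32))

    for loader_cls in (DataLoader, MultiEpochsDataLoader):
        loader = loader_cls(dataset, batch_size=256, shuffle=True, num_workers=4)
        print(loader_cls.__name__)
        for epoch in range(3):
            t0 = time.perf_counter()
            for step, (images,) in enumerate(loader):
                if step == 0:
                    # A plain DataLoader pays worker startup here every epoch;
                    # MultiEpochsDataLoader only on epoch 0.
                    print(f"  epoch {epoch}: first batch after "
                          f"{time.perf_counter() - t0:.2f}s")
```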

@AhmedAhmedEG commented Dec 26, 2023

Does this implementation mess up the shuffling in the DataLoader? I always thought the DataLoader reinitializes itself for shuffling and a couple of other operations.

Are there any drawbacks to using this method? It's very strange that the PyTorch devs never considered doing such a crucial thing.
