Schedulers like get_linear_schedule_with_warmup need access to the length of the train dataset #1038
Comments
Well, you can pass the train dataset or loader in the constructor, so it will be available as a field. Any reason not to do so?
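That suggestion could look roughly like this (a sketch assuming a Hugging Face-style setup; the class and parameter names are illustrative, not from the thread):

```python
import pytorch_lightning as pl
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

class LitModel(pl.LightningModule):
    def __init__(self, model, train_loader, lr=3e-5, warmup_steps=0, epochs=3):
        super().__init__()
        self.model = model
        # Stored as a field, so its length is available from any hook,
        # regardless of when configure_optimizers() is called.
        self.train_loader = train_loader
        self.lr = lr
        self.warmup_steps = warmup_steps
        self.epochs = epochs

    def train_dataloader(self):
        return self.train_loader

    def configure_optimizers(self):
        total_steps = len(self.train_loader) * self.epochs
        optimizer = AdamW(self.model.parameters(), lr=self.lr)
        scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps=self.warmup_steps,
            num_training_steps=total_steps,
        )
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]
```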
I would support this change - as far as I know it should just change the order in which the dataset / optimizer methods are called without impacting anything else. Another note on these schedulers is that I believe you have to override …
Ah ok, great!
Fixed with #941, closing.
I think that this issue should not be closed yet. From what I can see in the PR for #941, it started to support granular LR stepping, but it does not cover usage for something like the …
@SkafteNicki pls ^^
Agree that PR #941 only covers the granular LR stepping.
where the trainer is then initialized with …
@SkafteNicki thanks for the response. That's almost exactly what I did:

```python
from functools import lru_cache

from transformers import AdamW, get_linear_schedule_with_warmup

@lru_cache()
def total_steps(self):
    # Optimizer steps per run: batches per epoch, reduced by the
    # gradient-accumulation factor, times the number of epochs.
    return len(self.train_dataloader()) // self.hparams.accumulate_grad_batches * self.hparams.epochs

def configure_optimizers(self):
    optimizer = AdamW(self.model.parameters(), lr=self.hparams.lr)
    lr_scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=self.hparams.warmup_steps,
        num_training_steps=self.total_steps(),
    )
    return [optimizer], [{"scheduler": lr_scheduler, "interval": "step"}]
```

If that's the "recommended" way of doing it then I'm fine with that :)
Any chance we can revive this issue? In this case …
And I agree with @SkafteNicki that it's bad to pass the trainer into …
I am not sure how deeply this should be integrated into Lightning; it is, after all, a feature for specific types of schedulers (those that rely on knowing the total number of steps) in the specific case where the user does not know (in advance) how many distributed processes get allocated. I will leave this up to the core team. That said, the …
Then we define a callback that will pass in the remaining variables that are available through the trainer in the …
We of course then initialize the trainer with …
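A rough sketch of what such a callback could look like (the hook choice, attribute names, and the assumption that the train dataloader is already attached when the hook runs are all assumptions here, not details from the thread; the exact timing depends on the Lightning version):

```python
import pytorch_lightning as pl

class TotalStepsCallback(pl.Callback):
    """Hypothetical callback: compute the total number of optimizer steps
    from information on the trainer and expose it on the module."""

    def on_fit_start(self, trainer, pl_module):
        # Assumes the train dataloader is attached and sized at this point.
        batches_per_epoch = len(trainer.train_dataloader)
        steps_per_epoch = batches_per_epoch // trainer.accumulate_grad_batches
        pl_module.total_steps = steps_per_epoch * trainer.max_epochs

# The trainer would then be initialized with the callback, e.g.:
# trainer = pl.Trainer(callbacks=[TotalStepsCallback()], max_epochs=10)
```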
@williamFalcon thoughts?
I think that we can easily fix the first mentioned issue, if just …
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
What is the progress now? Putting the scheduler, optimizer, and model in Trainer.fit seems to be a more reasonable choice.
@congchan I am not sure whether what I did is best practice. I created a DataModule class, and I get the train dataloader size after loading the data. I pass the value into my LitModel class as the init param train_dataloader_size, then do the calculation within the model class to pass it to my scheduler. However, I am not sure how dp/ddp/horovod will change the calculation.
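A minimal sketch of the flow being described (class and parameter names are illustrative, and the distributed caveat above still applies):

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class MyDataModule(pl.LightningDataModule):
    def __init__(self, batch_size: int = 32):
        super().__init__()
        self.batch_size = batch_size

    def setup(self, stage=None):
        # Dummy data; in practice this is where the real dataset is loaded.
        self.train_dataset = TensorDataset(torch.randn(1024, 16))

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size)

dm = MyDataModule()
dm.setup(stage="fit")
# Measure the dataloader after the data is loaded, then hand the size
# to the model as an init param so the scheduler math can happen there.
train_dataloader_size = len(dm.train_dataloader())
# model = LitModel(train_dataloader_size=train_dataloader_size)
```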
It doesn't work, since the optimizer will be validated in https://github.com/PyTorchLightning/pytorch-lightning/blob/b1e3dcc607522b06e88d0cb086ab655b49c88b35/pytorch_lightning/trainer/optimizers.py#L178
Can this be re-opened? Why not do what @SokolovYaroslav is suggesting? If configure_optimizers() can be called after train_dataloader(), then users can simply save the length in a local variable.
@Borda Hugging Face's trainer passes the total number of training steps when configuring the learning rate scheduler: https://github.com/huggingface/transformers/blob/b440b8d1ce404abc1fd953ec2e32e1817a4a4a77/src/transformers/trainer.py#L771. But this works only for dataloaders that are sized. I think calling configure_optimizers() after the training loader is created, as @SokolovYaroslav suggested, makes sense for our Lightning trainer.
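For reference, the distinction matters because `len()` is only defined for map-style (sized) datasets; something along these lines shows where it breaks down (a hypothetical helper, not Lightning or Transformers API):

```python
from torch.utils.data import DataLoader

def num_training_steps(train_loader: DataLoader, max_epochs: int,
                       accumulate_grad_batches: int = 1) -> int:
    """Hypothetical helper: derive the total number of optimizer steps.
    Only works when the dataloader is sized (map-style dataset)."""
    try:
        batches_per_epoch = len(train_loader)
    except TypeError:
        # Iterable-style datasets have no length, so the step count
        # cannot be computed up front.
        raise ValueError("num_training_steps requires a sized dataloader")
    return batches_per_epoch // accumulate_grad_batches * max_epochs
```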
Any update?
Any update on this?
🐛 Bug

If you're using an LR scheduler that needs access to the number of batches in the train dataset, like @huggingface's `get_linear_schedule_with_warmup`, there's currently no way to access the dataset in `configure_optimizers()` because it looks like it is called before `train_dataloader()`. It would be nice to have some way to load the datasets before the optimizers and make the dataset available to other methods with something like `self.train_dataset = train_dataset`.

Code sample:
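A minimal sketch of the pattern that fails (the module layout and hyperparameter values are assumptions, not the original sample):

```python
import torch.nn as nn
import pytorch_lightning as pl
from transformers import AdamW, get_linear_schedule_with_warmup

class Model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(16, 1)

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=3e-5)
        # Problem: configure_optimizers() runs before train_dataloader(),
        # so there is no dataset to measure at this point.
        scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps=100,
            num_training_steps=len(self.train_dataset),  # AttributeError: not set yet
        )
        return [optimizer], [scheduler]
```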