TriviaQA LR scheduler code issue #37
Very good catch. Thanks, @apoorv2904. This is a bug in pytorch-lightning==0.6.0, and it has been fixed in later releases (Lightning-AI/pytorch-lightning#832). I would suggest updating to a more recent version of PTL (say 0.7.5, not the most recent 0.7.6, which has a higher chance of bugs). If I am not mistaken, everything should work the same except loading a checkpoint, which requires …
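For readers stuck on an older PTL, a minimal sketch of the behavior the fix restores: under gradient accumulation, the scheduler should advance once per optimizer step, not once per batch. Everything below (the model, loader, accumulation factor, and decay horizon) is an illustrative assumption, not the repository's code.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

# Illustrative stand-ins, not the repository's objects.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=3e-5)
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: max(0.0, 1 - step / 100))
train_loader = [torch.randn(1, 4) for _ in range(64)]  # batches of size 1
accumulate_grad_batches = 32  # assumed accumulation factor

for batch_idx, batch in enumerate(train_loader):
    loss = model(batch).mean() / accumulate_grad_batches
    loss.backward()
    # Step the optimizer -- and only then the scheduler -- once per
    # accumulated batch; this is the pairing the PTL fix restores.
    if (batch_idx + 1) % accumulate_grad_batches == 0:
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()
```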
Hi, when I update PL from 0.6.0 to 0.7.5, I got:

When running

Any comment/suggestion? Thanks
I am assuming you are using PyTorch 1.6; that's why pytorch-lightning is using native AMP.
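If you want to confirm which stack you are on, a quick sanity check (hypothetical, not from the repository):

```python
import torch
import pytorch_lightning as pl

print(torch.__version__)           # PyTorch >= 1.6 is what triggers native AMP
print(pl.__version__)              # the suggestion above is 0.7.5
print(hasattr(torch.cuda, "amp"))  # True on PyTorch >= 1.6
```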
Thank you for your reply.
Hi,

For single-GPU training using the TriviaQA script, the learning rate goes to 0 within the first epoch itself.

Possible reason: for a batch size of 1, global_step in pytorch_lightning increases with each size-1 batch returned by the data loader, so it does not correspond to the number of optimizer steps. The LR scheduler, however, was written in terms of the accumulated-gradient batch size, and so the learning rate decays to 0 within the first epoch (see the sketch after this comment).

Thanks,
Apoorv
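A rough illustration of the mismatch described above; the dataset size, accumulation factor, learning rate, and decay schedule are all assumed for the sketch, not taken from the TriviaQA script:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

dataset_size = 1000            # size-1 batches per epoch (assumed)
accumulate_grad_batches = 32   # assumed accumulation factor
epochs = 1

# The scheduler horizon is computed in *optimizer* steps, as the script intends.
num_training_steps = dataset_size // accumulate_grad_batches * epochs  # = 31

param = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([param], lr=3e-5)
scheduler = LambdaLR(
    optimizer,
    lr_lambda=lambda step: max(0.0, 1.0 - step / num_training_steps),
)

# PTL 0.6.0 stepped the scheduler once per *batch*, so the decay meant to
# span the whole run completes after only ~31 of the 1000 batches here.
for batch_idx in range(dataset_size):
    optimizer.step()   # placeholder optimizer step so the step order is valid
    scheduler.step()
    if optimizer.param_groups[0]["lr"] == 0.0:
        print(f"LR reached 0 at batch {batch_idx + 1} of {dataset_size}")
        break
```

With these assumed numbers, the learning rate hits 0 after about 3% of the first epoch, matching the symptom reported above.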