The saved epoch number seems to be wrong? #296
Comments
The checkpoint does not in fact match the epoch (it's off by one). We fixed it in another PR, which didn't land. Mind submitting a PR?
No. But since you have already fixed it in another PR, maybe my PR isn't necessary? Besides, I'm not very familiar with the code. Could you tell me which file I should help fix? Thank you.
#128 changes the epoch numbering and should be ready to land soon.
@HappyCtest It seems this is now fixed on master after the merge of #128.
Great! Thank you for your contributions.
The saved epoch number seems to be wrong, though I'm not sure whether this is my mistake.
Specifically, I first train my model for 2 epochs with the following code:
During the first epoch, `epoch=0`. After the first epoch finishes training, it shows:

During the second epoch, `epoch=1`. After the second epoch finishes training, it shows:

At this point, I save `exp` with the code:

and it gives:

Then I want to continue training with the following code:

It starts with `epoch=1` instead of `epoch=2`. Therefore, to reach `new_trainer`'s `max_nb_epochs=3`, two more epochs are run. Clearly, the epoch number in the saved `exp` is wrong: after the first two epochs, the saved epoch number should be 2. But it saved `epoch=1`, which causes the resumed training to start from `epoch=1`.
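The off-by-one can be reproduced with a toy loop. This is a minimal sketch, assuming the checkpoint stores a single integer that the resumed run uses as its starting epoch; `train` and `save_completed_count` are hypothetical names, not the PyTorch Lightning API. Storing the 0-based index of the last finished epoch (1 after epochs 0 and 1) makes the resumed run repeat epoch 1, while storing the number of finished epochs (2) resumes at the right place.

```python
from typing import List, Tuple


def train(start_epoch: int, max_nb_epochs: int,
          save_completed_count: bool) -> Tuple[List[int], int]:
    """Hypothetical toy trainer: run epochs [start_epoch, max_nb_epochs).

    Returns the list of epoch indices that were run and the epoch
    number that would be written to the checkpoint.
    """
    epochs_run = []
    saved_epoch = start_epoch
    for epoch in range(start_epoch, max_nb_epochs):
        epochs_run.append(epoch)  # stand-in for one epoch of training
        # Buggy variant: save the 0-based index of the epoch that just finished.
        # Fixed variant: save how many epochs have finished, i.e. the next epoch to run.
        saved_epoch = epoch + 1 if save_completed_count else epoch
    return epochs_run, saved_epoch


# Buggy: after epochs 0 and 1 the checkpoint says 1, so epoch 1 is run again.
first, saved = train(0, 2, save_completed_count=False)
resumed, _ = train(saved, 3, save_completed_count=False)
print(first, saved, resumed)  # [0, 1] 1 [1, 2]

# Fixed: the checkpoint says 2, so only epoch 2 remains.
first, saved = train(0, 2, save_completed_count=True)
resumed, _ = train(saved, 3, save_completed_count=True)
print(first, saved, resumed)  # [0, 1] 2 [2]
```

Saving "epochs completed" rather than "last epoch index" lets the resumed loop use the stored value directly as its `range` start, with no `+ 1` correction on load.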
It really confused me. Looking forward to your help. Thanks.