TrainResult/EvalResult does not log properly with on_step=True and on_epoch=True #2972
Comments
I have a similar problem
yields nothing in the TensorBoard log dir except one data point at step 49. It's really weird.
Ummmm... that's weird. I'll check this out.
To be more specific, here is a screenshot of the eval step (from my code above). Both the accuracy and the loss of EvalResult have the same problem.
Compare this with the step tr_loss (the correct one):
In another case:
I also found this inconsistency: TrainResult should record the step values but logs only a few of them, whereas EvalResult logged the step values correctly (the reverse of the case I posted here). Sometimes nothing was logged at all (yet another case).
* add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add step metrics * add step metrics
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
Here is the minimal code in Colab: here
OR:
Code sample
Expected behavior
The step_val_loss graph on TensorBoard should have $n_{batch} \times epochs$ items (the number of steps), but it appears to have only as many items as there are epochs (only a few of them).
Environment
You can get the script and run it with PL version 0.9.0.rc12; I used the master version here.
Additional context
In another experiment, I found that step_tr_loss is also not logged properly (it looks like on_epoch=True with different values), because I always upgrade the PL version to master :D
I hope someone can help with this problem. Or is there a logical error in my code?
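To make the count under "Expected behavior" concrete, here is a minimal sketch in plain Python (no Lightning involved) of how many points each curve should hold. The function name `expected_log_points` and the `log_interval` parameter are hypothetical, introduced only for illustration; the sketch assumes the step-level metric is emitted once per batch and the epoch-level metric once per epoch.

```python
def expected_log_points(n_batches, epochs, on_step=True, on_epoch=True,
                        log_interval=1):
    """Count the points each curve should contain after a full run.

    n_batches:    batches per epoch (hypothetical name)
    epochs:       number of epochs
    log_interval: hypothetical thinning factor; 1 means every step is written
    """
    points = {}
    total_steps = n_batches * epochs
    if on_step:
        # One point per step, thinned if only every log_interval-th step is written.
        points["step"] = total_steps // log_interval
    if on_epoch:
        # One aggregated point per epoch.
        points["epoch"] = epochs
    return points


print(expected_log_points(n_batches=50, epochs=3))
# {'step': 150, 'epoch': 3}
print(expected_log_points(n_batches=50, epochs=3, log_interval=50))
# {'step': 3, 'epoch': 3}
```

With 50 batches and 3 epochs, the step curve should contain 150 points. The second call shows that if only every 50th step were written, the step curve would shrink to 3 points and look like an epoch-level curve; whether such thinning is what happens in this report is not confirmed.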