
Why is there no training_epoch_end? #1076

Closed
dscarmo opened this issue Mar 6, 2020 · 9 comments
Labels: `feature` (Is an improvement or enhancement), `help wanted` (Open to be worked on), `let's do it!` (approved to implement)

Comments

@dscarmo (Contributor) commented Mar 6, 2020

🚀 Feature

If I want to calculate and log average statistics for the training epoch, there seems to be no option to define a `training_epoch_end` in the LightningModule, analogous to `validation_epoch_end` and `test_epoch_end`.

Motivation

Having this function seems very intuitive. I know the `on_epoch_end` hook exists, but the `outputs` object with the training history for that epoch is not available there.

Pitch

The same behavior as `validation_epoch_end` and `test_epoch_end`, but for training.

Sorry if something like this already exists; I just started using PL (the master version).
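
For concreteness, a rough sketch of how the hook could look, mirroring `validation_epoch_end` (this assumes the 0.7.x-era API where hooks return a `'log'` dict; all names here are illustrative, not an actual implementation):

```python
import torch
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        return {'loss': loss}

    # Proposed hook: receives every training_step output for the epoch,
    # just as validation_epoch_end receives the validation_step outputs.
    def training_epoch_end(self, outputs):
        avg_loss = torch.stack([o['loss'] for o in outputs]).mean()
        return {'log': {'avg_train_loss': avg_loss}}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
```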

@dscarmo added the `feature` (Is an improvement or enhancement) and `help wanted` (Open to be worked on) labels on Mar 6, 2020
@github-actions bot commented Mar 6, 2020

Hi! Thanks for your contribution! Great first issue!

@williamFalcon (Contributor) commented

Didn't get around to it for this release, but feel free to PR it! We do need it.

@williamFalcon added the `need fix` and `let's do it!` (approved to implement) labels and removed the `need fix` label on Mar 7, 2020
@gerardrbentley (Contributor) commented

Do you think more people would want a list of every full batch output (the results of each `training_step` / `training_step_end`, if implemented), or the accumulated batch outputs?

@dscarmo (Contributor, Author) commented Mar 12, 2020

> Do you think more people would want a list of every full batch output (the results of each `training_step` / `training_step_end`, if implemented), or the accumulated batch outputs?

I think it could work the same way the others do: in my understanding, they return a list of dicts, with each dict corresponding to the return of one batch (the return of `training_step`, or `training_end` if it exists).
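
For example, something like this (illustrative values only):

```python
import torch

# Sketch of what `outputs` could look like inside training_epoch_end:
# one dict per batch, each being whatever training_step returned.
outputs = [
    {'loss': torch.tensor(0.9)},  # batch 0
    {'loss': torch.tensor(0.7)},  # batch 1
]
avg_loss = torch.stack([o['loss'] for o in outputs]).mean()
```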

@ethanwharris (Member) commented

The only gotcha we need to watch out for is that all of the collected outputs need to be detached so they don't keep the autograd graphs in memory. I would suggest writing a method that recursively traverses the output dictionary and creates a new one with the same elements but all detached; we can then apply it to each output before adding it to the outputs list. We will also need some good tests to make sure there aren't any leaks :)
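
Something along these lines, perhaps (a minimal sketch; `recursive_detach` is a hypothetical helper, not necessarily what will land):

```python
import torch

def recursive_detach(value):
    """Return a copy of `value` with every contained tensor detached,
    so collected step outputs don't hold on to the autograd graph."""
    if isinstance(value, torch.Tensor):
        return value.detach()
    if isinstance(value, dict):
        return {k: recursive_detach(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(recursive_detach(v) for v in value)
    return value

# Applied to each training_step output before appending it to the
# epoch's outputs list, e.g.:
#     outputs.append(recursive_detach(step_output))
```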

@failable commented

The message here suggests using `training_epoch_end`; however, it is not called...

@thegyro commented Mar 24, 2020

Is `training_epoch_end` available in the latest release (0.7.2-dev)? The docs seem to suggest so, but I am not able to get it to work (I tried logging by returning a `'log'` dict).

@teristam commented Mar 24, 2020

> The message here suggests using `training_epoch_end`; however, it is not called...

Yes, this is quite confusing... and `training_epoch_end` is referred to here in the documentation as well.

@awaelchli (Contributor) commented

It is now here, thanks to @jbschiratti! See #1357.

@Borda closed this as completed on Apr 17, 2020