
Why is there no training_epoch_end? #1076

Closed
dscarmo opened this issue Mar 6, 2020 · 9 comments
Labels: `feature` (Is an improvement or enhancement), `help wanted` (Open to be worked on), `let's do it!` (approved to implement)

Comments

@dscarmo (Contributor) commented Mar 6, 2020

🚀 Feature

If I want to calculate and log average statistics for the training epoch, there seems to be no option to define a `training_epoch_end` in the LightningModule, analogous to `validation_epoch_end` and `test_epoch_end`.

Motivation

Having this function seems very intuitive. I know the `on_epoch_end` hook exists, but the `outputs` object with the training history for that epoch is not available there.

Pitch

The same behavior as `validation_epoch_end` and `test_epoch_end`, but for training.

Sorry if something like this already exists; I just started using PL (the master version).
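
For concreteness, a rough sketch of how the hook could look, mirroring `validation_epoch_end` (this assumes the 0.7.x-era API where hooks return a `'log'` dict; all names here are illustrative, not an actual implementation):

```python
import torch
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        return {'loss': loss}

    # Proposed hook: receives every training_step output for the epoch,
    # just as validation_epoch_end receives the validation_step outputs.
    def training_epoch_end(self, outputs):
        avg_loss = torch.stack([o['loss'] for o in outputs]).mean()
        return {'log': {'avg_train_loss': avg_loss}}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
```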

@dscarmo added the `feature` (Is an improvement or enhancement) and `help wanted` (Open to be worked on) labels on Mar 6, 2020
@github-actions bot commented Mar 6, 2020

Hi! Thanks for your contribution! Great first issue!

@williamFalcon (Contributor) commented

Didn't get around to it for this release, but feel free to PR it! We do need it.

@williamFalcon added the `need fix` and `let's do it!` (approved to implement) labels and removed the `need fix` label on Mar 7, 2020
@gerardrbentley (Contributor) commented

Do you think more people would want a list of every full batch output (the results of each `training_step` / `training_step_end`, if implemented), or the accumulated batch outputs?

@dscarmo (Contributor, Author) commented Mar 12, 2020

> Do you think more people would want a list of every full batch output (the results of each `training_step` / `training_step_end`, if implemented), or the accumulated batch outputs?

I think it could work the same way the others do: in my understanding, they return a list of dicts, with each dict corresponding to the return of one batch (the return of `training_step`, or `training_end` if it exists).
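
For example, something like this (illustrative values only):

```python
import torch

# Sketch of what `outputs` could look like inside training_epoch_end:
# one dict per batch, each being whatever training_step returned.
outputs = [
    {'loss': torch.tensor(0.9)},  # batch 0
    {'loss': torch.tensor(0.7)},  # batch 1
]
avg_loss = torch.stack([o['loss'] for o in outputs]).mean()
```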

@ethanwharris (Member) commented

The only gotcha we need to watch out for is that all of the collected outputs need to be detached so they don't keep the autograd graphs in memory. I would suggest writing a method that recursively traverses the output dictionary and creates a new one with the same elements but all detached; we can then apply it to each output before adding it to the outputs list. We will also need some good tests to make sure there aren't any leaks :)
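
Something along these lines, perhaps (a minimal sketch; `recursive_detach` is a hypothetical helper, not necessarily what will land):

```python
import torch

def recursive_detach(value):
    """Return a copy of `value` with every contained tensor detached,
    so collected step outputs don't hold on to the autograd graph."""
    if isinstance(value, torch.Tensor):
        return value.detach()
    if isinstance(value, dict):
        return {k: recursive_detach(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(recursive_detach(v) for v in value)
    return value

# Applied to each training_step output before appending it to the
# epoch's outputs list, e.g.:
#     outputs.append(recursive_detach(step_output))
```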

@failable commented

The message here suggests using `training_epoch_end`; however, it is not called...

@thegyro commented Mar 24, 2020

Is `training_epoch_end` available in the latest release (0.7.2-dev)? The docs seem to suggest so, but I am not able to get it to work (I tried logging by returning a `'log'` dict).

@teristam commented Mar 24, 2020

> The message here suggests using `training_epoch_end`; however, it is not called...

Yes, this is quite confusing... and `training_epoch_end` is referred to here in the documentation as well.

@awaelchli (Contributor) commented

It is now here, thanks to @jbschiratti! See #1357.

@Borda closed this as completed on Apr 17, 2020