
Best Practices: logger.experiment.add_image() at end of epoch when using new simplified pl.Train/EvalResult objects #2728

Closed
romesco opened this issue Jul 27, 2020 · 9 comments
Labels: question (Further information is requested), won't fix (This will not be worked on)

Comments

@romesco (Contributor) commented Jul 27, 2020

Question:

What are we going to consider best practice for visualizing images, embeddings, etc. to tensorboard when using pl.Train/EvalResult objects?

In light of #2651 and related PRs, what's the right way to do this?

Let's say we have a dataset of images, and we want to visualize a single batch of reconstructions once per epoch. I typically do this in validation_epoch_end(), using logger.experiment.add_image().

Code:

Let's say my code now looks like this:

def validation_step(self, batch, batch_idx):
    step_dict = self._step(batch)
    result = pl.EvalResult(early_stop_on=step_dict['loss'],
                           checkpoint_on=step_dict['loss'])

    # logging
    result.log('avg_val_loss', step_dict['loss'],
               on_epoch=True, reduce_fx=torch.mean)

    return result, step_dict

which works fine and is definitely much cleaner than the original method of returning multiple logging dicts. 😁

I now want to do something at the end of the validation loop so I specify:

def validation_epoch_end(self, outputs):
    # outputs is a list of (EvalResult, step_dict) tuples, one per val step
    img_batch = outputs[-1][1]['viz']['x_hat']
    img_batch = img_batch.view(img_batch.shape[0], 1, 32, 32)
    grid = torchvision.utils.make_grid(img_batch)
    self.logger.experiment.add_image('x_hat', grid, self.current_epoch)

    avg_val_loss = torch.stack([x['avg_val_loss'] for x, y in outputs]).mean()
    return avg_val_loss

outputs in this case is a list of tuples: the first element of each is the EvalResult for that val step, and the second is step_dict, which carries the losses and the reconstructed x_hat for that step.

Is there a better way? One potential downside to this is that outputs can eat up a significant chunk of memory if you're not careful.
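For illustration, one possible alternative (a sketch under assumptions, not a confirmed best practice): cache a single batch of reconstructions on the module in validation_step and log it in validation_epoch_end, so outputs only has to carry the scalar metrics. The class name VAEModule and the attribute self._val_sample are made-up names; _step and the (1, 32, 32) image shape follow the snippets above.

import torch
import torchvision
import pytorch_lightning as pl


class VAEModule(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        step_dict = self._step(batch)
        result = pl.EvalResult(early_stop_on=step_dict['loss'],
                               checkpoint_on=step_dict['loss'])
        result.log('avg_val_loss', step_dict['loss'],
                   on_epoch=True, reduce_fx=torch.mean)

        # Keep only the most recent batch of reconstructions; detach so the
        # graph (and the rest of the step outputs) can be freed right away.
        self._val_sample = step_dict['viz']['x_hat'].detach()
        return result

    def validation_epoch_end(self, outputs):
        img_batch = self._val_sample.view(self._val_sample.shape[0], 1, 32, 32)
        grid = torchvision.utils.make_grid(img_batch)
        self.logger.experiment.add_image('x_hat', grid, self.current_epoch)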

What's your environment?

  • OS: Ubuntu
  • Packaging: pip
  • Version: master
@romesco (Contributor, Author) commented Jul 27, 2020

One other detail worth adding: it's unclear which of these now owns logging 'avg_val_loss', since that used to be the job of validation_epoch_end() and the magic 'avg_val_loss' key in its returned dict.

Using the new way outlined above, I'm still getting this error:

 RuntimeWarning: The metric you returned None must be a `torch.Tensor` instance, checkpoint not saved HINT: what is the value of loss in validation_epoch_end()?
  warnings.warn(*args, **kwargs)

@romesco (Contributor, Author) commented Jul 27, 2020

Looks like using
return {'loss': avg_val_loss}
at the end of validation_epoch_end() fixes that warning, but when combined with EvalResult, I don't really understand why it should be necessary to return that at all for checkpointing.
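For reference, a sketch of that workaround in context (assuming torch is imported and outputs has the (EvalResult, step_dict) structure from the snippet above):

def validation_epoch_end(self, outputs):
    avg_val_loss = torch.stack([x['avg_val_loss'] for x, y in outputs]).mean()
    # Returning the value under the 'loss' key is what silences the warning.
    return {'loss': avg_val_loss}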

@celsofranssa commented
It would be great to have an example, possibly in Colab, showing good practices for logging the loss and other metrics, as well as sample outputs (e.g. images from a VAE), across train/val/test.

@romesco (Contributor, Author) commented Jul 27, 2020

I have this all set up for a VAE, but I want to make sure I'm following best practices with the latest updates. Once we come to a consensus on this, I can provide a Colab link. 😁 There's also the bolts repo, which we could update!

@romesco romesco changed the title logger.experiment.add_image() at end of epoch when using new simplified pl.Train/EvalResult objects Best Practices: logger.experiment.add_image() at end of epoch when using new simplified pl.Train/EvalResult objects Jul 28, 2020
@qiuhuaqi commented
I'm trying to log some figures once per epoch in validation_epoch_end(), but I'm struggling to find a good practice.
In addition to the model outputs, I also need to return some input data in (effectively your) step_dict for visualisation.

I don't want to accumulate a list in the validation loop as I only want one set of images per epoch. Any advice?
(I work with 3D medical images so memory is definitely a concern!)
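One possible way around accumulating anything (a sketch, not an official recommendation): log the figure directly inside validation_step for a single batch, e.g. batch_idx == 0, so nothing has to be returned or stored across the epoch. make_figure below is a hypothetical plotting helper, and the _step/EvalResult pieces follow the earlier snippets.

import torch
import pytorch_lightning as pl


def validation_step(self, batch, batch_idx):
    step_dict = self._step(batch)
    result = pl.EvalResult(early_stop_on=step_dict['loss'],
                           checkpoint_on=step_dict['loss'])
    result.log('avg_val_loss', step_dict['loss'],
               on_epoch=True, reduce_fx=torch.mean)

    if batch_idx == 0:  # visualize only the first batch of each val epoch
        fig = make_figure(batch, step_dict['viz']['x_hat'])  # hypothetical helper
        self.logger.experiment.add_figure('reconstruction', fig, self.current_epoch)
    return result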

@joshclancy commented
I have this same question... what are the best practices for logging images? My usual wandb.log no longer seems to work now that the logger is WandbLogger. I read the Train/EvalResult page, but the documentation seems sparse here.

@borisdayma (Contributor) commented
@joshclancy You should still be able to import wandb and log images separately.
Otherwise you also have access to wandb at self.logger.experiment within your trainer.
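For example, a hedged sketch of the second option, assuming a WandbLogger is attached to the trainer so that self.logger.experiment is the underlying wandb run:

import wandb
import torchvision


def validation_epoch_end(self, outputs):
    # self._val_sample cached during validation_step (e.g. as in the sketch
    # earlier in this thread); wandb.Image also accepts numpy arrays and PIL images.
    grid = torchvision.utils.make_grid(self._val_sample)
    self.logger.experiment.log({'x_hat': [wandb.Image(grid, caption='reconstructions')]})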

@stale (bot) commented Nov 14, 2020

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Nov 14, 2020
@stale stale bot closed this as completed Nov 21, 2020