
Best Practices: logger.experiment.add_image() at end of epoch when using new simplified pl.Train/EvalResult objects #2728

Closed
romesco opened this issue Jul 27, 2020 · 9 comments
Labels: question (Further information is requested), won't fix (This will not be worked on)

Comments

@romesco (Contributor) commented Jul 27, 2020

Question:

What are we going to consider best practice for visualizing images, embeddings, etc. to tensorboard when using pl.Train/EvalResult objects?

In light of #2651 and related PRs, what's the right way to do this?

Let's say we have a dataset of images, and we want to visualize a single batch of reconstructions once per epoch. I typically do this in validation_epoch_end(), using logger.experiment.add_image().

Code:

Let's say my code now looks like this:

def validation_step(self, batch, batch_idx):
    step_dict = self._step(batch)
    result = pl.EvalResult(early_stop_on=step_dict['loss'],
                           checkpoint_on=step_dict['loss'])

    # logging
    result.log('avg_val_loss', step_dict['loss'],
               on_epoch=True, reduce_fx=torch.mean)

    return result, step_dict

which works fine and is definitely much cleaner than the original method of returning multiple logging dicts. 😁

I now want to do something at the end of the validation loop so I specify:

def validation_epoch_end(self, outputs):
    # outputs is a list of (EvalResult, step_dict) tuples, one per val step
    img_batch = outputs[-1][1]['viz']['x_hat']
    img_batch = img_batch.view(img_batch.shape[0], 1, 32, 32)
    grid = torchvision.utils.make_grid(img_batch)
    self.logger.experiment.add_image('x_hat', grid, self.current_epoch)

    avg_val_loss = torch.stack([x['avg_val_loss'] for x, y in outputs]).mean()
    return avg_val_loss

outputs in this case is a list of tuples: the first element of each is the EvalResult for that val step, and the second is step_dict, which carries the losses and the reconstructed x_hat for that step.

Is there a better way? One potential downside to this is that outputs can eat up a significant chunk of memory if you're not careful.
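For illustration, one possible alternative (a sketch under assumptions, not a confirmed best practice): cache a single batch of reconstructions on the module in validation_step and log it in validation_epoch_end, so outputs only has to carry the scalar metrics. The class name VAEModule and the attribute self._val_sample are made-up names; _step and the (1, 32, 32) image shape follow the snippets above.

import torch
import torchvision
import pytorch_lightning as pl


class VAEModule(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        step_dict = self._step(batch)
        result = pl.EvalResult(early_stop_on=step_dict['loss'],
                               checkpoint_on=step_dict['loss'])
        result.log('avg_val_loss', step_dict['loss'],
                   on_epoch=True, reduce_fx=torch.mean)

        # Keep only the most recent batch of reconstructions; detach so the
        # graph (and the rest of the step outputs) can be freed right away.
        self._val_sample = step_dict['viz']['x_hat'].detach()
        return result

    def validation_epoch_end(self, outputs):
        img_batch = self._val_sample.view(self._val_sample.shape[0], 1, 32, 32)
        grid = torchvision.utils.make_grid(img_batch)
        self.logger.experiment.add_image('x_hat', grid, self.current_epoch)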

What's your environment?

  • OS: Ubuntu
  • Packaging: pip
  • Version: master
@romesco (Contributor, Author) commented Jul 27, 2020

One other detail worth adding: it's unclear which of these now owns logging 'avg_val_loss', since that used to be the job of validation_epoch_end() and the magic 'avg_val_loss' key in its returned dict.

Using the new way outlined above, I'm still getting this error:

 RuntimeWarning: The metric you returned None must be a `torch.Tensor` instance, checkpoint not saved HINT: what is the value of loss in validation_epoch_end()?
  warnings.warn(*args, **kwargs)

@romesco (Contributor, Author) commented Jul 27, 2020

Looks like using
return {'loss': avg_val_loss}
at the end of validation_epoch_end() fixes that warning, but when combined with EvalResult, I don't really understand why it should be necessary to return that at all for checkpointing.
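For reference, a sketch of that workaround in context (assuming torch is imported and outputs has the (EvalResult, step_dict) structure from the snippet above):

def validation_epoch_end(self, outputs):
    avg_val_loss = torch.stack([x['avg_val_loss'] for x, y in outputs]).mean()
    # Returning the value under the 'loss' key is what silences the warning.
    return {'loss': avg_val_loss}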

@celsofranssa commented
It would be great to have an example, possibly in Colab, showing good practices for logging the loss and other metrics, as well as sample outputs (e.g. images from a VAE), across train/val/test.

@romesco (Contributor, Author) commented Jul 27, 2020

I have this all set up for a VAE, but I want to make sure I'm following best practices with the latest updates. Once we come to a consensus on this, I can provide a Colab link. 😁 There's also the bolts repo, which we could update!

@romesco romesco changed the title logger.experiment.add_image() at end of epoch when using new simplified pl.Train/EvalResult objects Best Practices: logger.experiment.add_image() at end of epoch when using new simplified pl.Train/EvalResult objects Jul 28, 2020
@qiuhuaqi commented
I'm trying to log some figures once per epoch in validation_epoch_end(), but I'm struggling to find a good practice.
In addition to the model outputs, I also need to return some input data in (effectively your) step_dict for visualisation.

I don't want to accumulate a list in the validation loop as I only want one set of images per epoch. Any advice?
(I work with 3D medical images so memory is definitely a concern!)
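One possible way around accumulating anything (a sketch, not an official recommendation): log the figure directly inside validation_step for a single batch, e.g. batch_idx == 0, so nothing has to be returned or stored across the epoch. make_figure below is a hypothetical plotting helper, and the _step/EvalResult pieces follow the earlier snippets.

import torch
import pytorch_lightning as pl


def validation_step(self, batch, batch_idx):
    step_dict = self._step(batch)
    result = pl.EvalResult(early_stop_on=step_dict['loss'],
                           checkpoint_on=step_dict['loss'])
    result.log('avg_val_loss', step_dict['loss'],
               on_epoch=True, reduce_fx=torch.mean)

    if batch_idx == 0:  # visualize only the first batch of each val epoch
        fig = make_figure(batch, step_dict['viz']['x_hat'])  # hypothetical helper
        self.logger.experiment.add_figure('reconstruction', fig, self.current_epoch)
    return result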

@joshclancy commented
I have this same question... what are the best practices for logging images? My usual wandb.log no longer seems to work now that the logger is WandbLogger. I read the Train/EvalResult page, but the documentation seems sparse here.

@borisdayma (Contributor) commented
@joshclancy You should still be able to import wandb and log images separately.
Otherwise you also have access to wandb at self.logger.experiment within your trainer.
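For example, a hedged sketch of the second option, assuming a WandbLogger is attached to the trainer so that self.logger.experiment is the underlying wandb run:

import wandb
import torchvision


def validation_epoch_end(self, outputs):
    # self._val_sample cached during validation_step (e.g. as in the sketch
    # earlier in this thread); wandb.Image also accepts numpy arrays and PIL images.
    grid = torchvision.utils.make_grid(self._val_sample)
    self.logger.experiment.log({'x_hat': [wandb.Image(grid, caption='reconstructions')]})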

@stale (bot) commented Nov 14, 2020

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Nov 14, 2020
@stale stale bot closed this as completed Nov 21, 2020