Tensorboard log_hyperparams(params, metrics) seems not to have effect #1778
Comments
Hi! Thanks for your contribution, great first issue!
Yes, hparams seems not to be working for some reason. Looking into it.
I played around with the issue and I think I may know the problem. When calling log_hyperparams, the function appends events to the log file instead of overwriting the existing one (the one created in the pre-train routine). It follows that TensorBoard sees one file with two different sets of metrics: {} from the first log, and the set added later by log_hyperparams. This leads to hparams not showing properly, since TensorBoard wants the set of metrics to be the same across experiments. To check whether it works, you can just put del self.hparams at the beginning of your pl.LightningModule. This effectively skips the pre-train logging and correctly shows hparams in TensorBoard. Downside? You cannot load from checkpoint anymore ;)
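The clash described above can be simulated in a few lines of plain Python. This is a toy model of the dashboard behaviour as described in this thread, not TensorBoard's actual code; the function name and run names are made up for illustration:

```python
def visible_runs(hparam_sessions):
    """Toy model: a run only shows up in the HPARAMS tab if every
    hparams event written for it reports the same set of metric names."""
    metric_sets = {}
    for run, metrics in hparam_sessions:
        metric_sets.setdefault(run, []).append(frozenset(metrics))
    # Runs whose event file mixes different metric sets are dropped.
    return [run for run, sets in metric_sets.items()
            if len(set(sets)) == 1]

# The pre-train routine logs hparams with no metrics ({}); the user then
# logs hparams with {"val_loss": ...}. The sets differ, so the run
# disappears from the tab.
sessions = [
    ("version_0", {}),                 # pre-train log: empty metric set
    ("version_0", {"val_loss": 0.3}),  # user's log_hyperparams call
    ("version_1", {"val_loss": 0.2}),  # a run logged only once
]
print(visible_runs(sessions))  # → ['version_1']
```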
@justusschock is this related to the changes with initializing TB?
Confirmed this is fixed on master, https://colab.research.google.com/drive/1K6Gxo99O6dEbzzj_lW8jAL74OvjojV9T |
The issue is not fixed yet, unfortunately. |
As you can see, no metrics are shown on TensorBoard.
Ummmm, this might be a design problem in TB.
So, it looks like we have to get hacky to get this to work correctly?
I think they may fix this in the future, but I don't think it will be any time soon :/
@karapostK any luck with finding a fix/workaround?
@singhay I found one, but it's not very nice. Line 840 (in the run_pretrain_routine function) of trainer.py contains the code that logs the hparams too soon; changing that line does the job. However, remember to call log_hyperparams(hparams, metrics) somewhere in your code in order to properly log the metrics, either in the on_train_end() callback or in test_epoch_end.
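A sketch of that hook-based workaround, assuming the pre-train logging has been disabled as described. The mixin class, the exact metric lookup, and the val_loss key are illustrative assumptions, not code from this thread:

```python
class LogHParamsAtEndMixin:
    """Mix into a LightningModule: emit hparams together with the final
    metrics exactly once, when training ends, so the single hparams
    event in the run's file has a consistent metric set."""

    def on_train_end(self):
        # Illustrative metric; pull whatever value you track from the
        # trainer's accumulated callback metrics.
        metrics = {"val_loss": float(self.trainer.callback_metrics["val_loss"])}
        self.logger.log_hyperparams(self.hparams, metrics)
```

In a real module you would declare something like `class MyModel(LogHParamsAtEndMixin, pl.LightningModule)` so the hook is picked up at the end of fit().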
Have the same problem, and I think PL should not log hparams blindly in the trainer. What's the point of logging hparams without a metric? They are saved to disk anyway, so one could look there. If someone can only look at TensorBoard (without access to the machine), then it is better to log hparams as text. But now the question becomes: where should the logging actually happen? In the model checkpoint?
Most of the time we interrupt training before it completes... a lot of code won't "complete", but you still need checkpoints and to know what you ran. The problem is not Lightning, but TensorBoard, for assuming that training always ends.
+1 Thanks everyone for looking into this. I've also been searching for a way to use the hparams tab in TensorBoard with Lightning. I think @karapostK's solution needs an update: now one should just comment out the pre-train hparams logging. Maybe one solution is to have a Trainer option, so the check can be changed to: if self.logger is not None and self.pre_record_hyperparams: and then the user will manually call log_hyperparams. For anybody trying to solve the same issue for their code, here is how I solved it with a hack:
checkpointer = ModelCheckpoint(filepath='best'), added as a callback to the trainer.
Then the model will appear in your hparams tab with val_loss as a metric.
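One way to spell out that hack after trainer.fit() returns. The attribute name best_model_score and the val_loss key are assumptions for illustration; check the ModelCheckpoint API of your pytorch-lightning version:

```python
def publish_best_to_hparams(logger, hparams, checkpointer, metric_name="val_loss"):
    """After fit() finishes, log hparams together with the best
    checkpointed score in a single call, so the run gets one complete,
    consistent HPARAMS row instead of a clashing pair of events."""
    best = float(checkpointer.best_model_score)
    logger.log_hyperparams(hparams, {metric_name: best})
    return best
```

Called as `publish_best_to_hparams(trainer.logger, hparams, checkpointer)` once training ends (or is interrupted).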
Closing as this is a TB issue. @versatran01 feel free to reopen if you have any other issues!
Found an answer to this question here: |
Using a hack following Lightning-AI/pytorch-lightning#1778 (comment)
And you, my friend, are a real hero!
🐛 Bug
Calling self.logger.log_hyperparams(hparams_dicts, metrics_dicts) in test_epoch_end doesn't have the desired effect. It should show entries in the Hparams section with the specified hparams and metrics, but it shows nothing instead.
Looking at the code, this seems to be caused by self.hparams being pre-logged at the start of training. Because of that first log, later calls to log_hyperparams cannot log the hyperparameters AND the metrics properly: they clash with the previous log and hence show nothing.
To Reproduce
Try to log metrics with self.logger.log_hyperparams.
Code sample
Expected behavior
TensorBoard should show me the Hparams section, with each entry composed of hyperparameters and metrics.
Environment
- GPU:
- available: False
- version: 10.2
- numpy: 1.18.1
- pyTorch_debug: False
- pyTorch_version: 1.5.0
- pytorch-lightning: 0.7.5
- tensorboard: 2.2.1
- tqdm: 4.46.0
- OS: Linux
- architecture:
- 64bit
- ELF
- processor: x86_64
- python: 3.8.2
- version: #34~18.04.1-Ubuntu SMP Fri Feb 28 13:42:26 UTC 2020
Additional context