Add support to log hparams and metrics to tensorboard? #1228
Comments
Hi! Thanks for your contribution, great first issue!
It seems to be a duplicate, please continue in #1225.
Actually, #1225 is not related. That issue is about providing a Namespace as hparams. Here, it's about logging a metric such as validation accuracy together with a set of hparams.
At which point would you like to log that? a) at each training step, b) at the end of training, c) something else?
Did you try this? In training_step, for example:
Yes, I tried it. It seems that updating the hyperparams (writing a second time) doesn't work. That's why I overwrite the original.
I want to achieve b).
Maybe I am missing something about your use case... you are running some hyperparameter search with just one logger for all results? Not sure if storing everything into a single logger run is a good idea 🐰
Usually I am training and validating a model with different sets of hyperparameters in a loop. After each round the final output is often something like "val-loss", i.e. the validation loss of the best epoch achieved in that particular round. After the last training round I look at the various sets of hyperparameters and compare them with their associated best validation losses. TensorBoard already provides tools for that: visualize the results in TensorBoard's HParams plugin. In your
I wonder how you compare different hyperparameter sets. Maybe there is some functionality that I didn't find...
I am also seeing this same issue. No matter what I write to with
So it appears like
Tip: I was only able to get metric values to display when they matched the tag name of a scalar I had logged previously in the log.
Umm, so weird. My TB logs hparams automatically when they are attached to the module (self.hparams = hparams), and metrics are logged normally through the log key. Did you try that? What exactly are you trying to do?
Maybe a full, simple example in Colab would make this discussion easier?
@SpontaneousDuck I tried your fix with
Can you please give a simple example?
These are the reasons we do this at the beginning of training... but I agree this is not optimal. So, this seems to be a fundamental design problem with TensorBoard unless I'm missing something obvious? FYI:
@williamFalcon indeed, both cases won't work with my solution. It's not really nice because I have all the epoch-wise metrics nicely organized in TensorBoard, but the hparams overview is in a separate .csv without all the TensorBoard features.
Just a question: if you log them multiple times, would they be overwritten in TensorBoard? Because then we could probably also log them each epoch.
From some comments I get the feeling we might be talking about two things here. Say I have a classifier with the hyperparameter learning rate, I want to train it for 100 epochs, and I want to try out two different learning rates (say 1e-4 and 1e-5). On the one hand, I want to describe the training progress. I will do that with a training/validation loss and accuracy, which I write down for every epoch. Let me call these metrics epoch-wise metrics. On the other hand, I want to have an overview of which learning rate worked better, i.e. one value per run; let me call these run metrics. What I was talking about the whole time is the second case, the run metrics.
Okay, so just to be sure: for your run metrics, you would like to log metrics and hparams at each step? I think that's a bit of an overkill to add as default behaviour. If it's just about the learning rate, maybe #1498 could help you there. Otherwise I'm afraid I can't think of a generic implementation here that won't add much overhead for most users :/
Hi, unfortunately it doesn't work yet.
The parameter appears as a column in the hparams tab but it doesn't get a value.
Btw, if I use the high-level API from TensorBoard, everything works fine:
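(The snippet originally posted here was not preserved; presumably it used `SummaryWriter.add_hparams`, which writes hparams and metrics together. A minimal sketch with purely illustrative names:)

```python
from torch.utils.tensorboard import SummaryWriter

# Plain TensorBoard API: hyperparameters and metrics are written in a single call.
writer = SummaryWriter("lightning_logs/manual_hparams")
writer.add_hparams({'lr': 1e-4, 'batch_size': 32}, {'hparam/val_loss': 0.25})
writer.close()
```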
Is there a reason why you don't use the high-level API in the TensorBoard logger?
Sorry, that was my mistake. I thought this was all handled by PyTorch itself. Can you maybe try #1647?
So theoretically it works.
But how do you do this if you kill training prematurely, or the cluster kills your job prematurely? Your solution assumes the user ALWAYS gets to training_end.
The current
It also looks like the Keras way of doing this is writing the hyperparameters at the beginning of training, then writing the metrics and a status message at the end. Not sure if that is possible in PyTorch right now though... The best solution to this, I believe, is just allowing the user more control. The code @mRcSchwering wrote above does this mostly. Having metrics be an optional parameter would solve this. If you call our modified
I found that if those many files written by
Would it be possible to extend your pattern to collect
@reactivetype Actually, now I just got what @SpontaneousDuck meant. Here is an example: in the beginning I write all the hyperparameter metrics that I want. In my case I use this logger. I do this again with a module base class which writes a hyperparameter metric.
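(The example code originally posted here was lost. Below is a hedged sketch of the described pattern; the hooks and metric names are illustrative, and it assumes a logger whose `log_hyperparams` accepts a metrics dict, such as the extended TensorBoardLogger sketched under the issue description further down.)

```python
import torch
from pytorch_lightning import LightningModule


class HparamMetricsBaseModule(LightningModule):
    """Writes a hyperparameter metric with a placeholder value at the start,
    then writes it again later with the real value."""

    def on_train_start(self):
        # write the hparams once with a placeholder metric value
        self.logger.log_hyperparams(self.hparams, metrics={'best_val_loss': 0})

    def validation_epoch_end(self, outputs):
        val_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        # write the hparams again; the HPARAMS table shows the last value logged for this tag
        self.logger.log_hyperparams(self.hparams, metrics={'best_val_loss': val_loss.item()})
        return {'val_loss': val_loss}
```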
Then, it doesn't matter where you do this. You could also do this in the
Does this mean we can further simplify the callback pattern in your examples?
Yes, I'm not using callbacks anymore. Everything is inherited (a base module class). I usually prefer callbacks, but in this case the
Are there any plans to include this in Lightning? It would be really nice to be able to use metrics inside of TensorBoard hparams without any hacking.
Yes, let's officially make this a fix! @mRcSchwering want to submit a PR?
@mRcSchwering so it is solved, right? 🦝
@williamFalcon I just took a look at it. Wouldn't the pre-train routine be the place where the initial metrics have to be logged? One could add another attribute to the LightningModule which will be added as metrics to the call. Is there a reason why
@Borda yes, with #1228 (comment) it is currently possible to do it.
Thanks all for your contributions in solving this issue, which I have also been struggling with. Would anyone be able to summarize what the current recommended approach is and maybe edit the documentation? https://pytorch-lightning.readthedocs.io/en/latest/loggers.html My current strategy is still quite hacky.
For those trying to solve this, here's another proposed solution: #1778 (comment)
Following the idea from @SpontaneousDuck I found the following way to bypass the problem without modifying the framework code: add a dummy call to `log_hyperparams` with a metrics placeholder before the fit routine starts:

```python
import torch
from pytorch_lightning import LightningModule


class MyModule(LightningModule):

    # set up the 'test_loss' metric before the fit routine starts
    def on_fit_start(self):
        metric_placeholder = {'test_loss': 0}
        self.logger.log_hyperparams(self.hparams, metrics=metric_placeholder)

    # at some later method, log the real value under the same tag
    def test_epoch_end(self, outputs):
        metrics_log = {'test_loss': torch.stack([x['test_loss'] for x in outputs]).mean()}
        return {'log': metrics_log}
```

The last metric will show in the TensorBoard HPARAMS table as desired, albeit the metric graph will include the first dummy point.
Thanks @gwiener for the quick workaround.
Likewise, still looking for a solution here. And the solutions provided above did not work for me.
We're discussing this in #2974
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!
How can I log metrics (e.g. validation loss of best epoch) together with the set of hyperparameters?
I have looked through the docs and through the code.
It seems like an obvious thing, so maybe I'm just not getting it.
Currently, the only way that I found was to extend the logger class:
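(The class originally posted here was not preserved. A hedged reconstruction of such an extended logger, assuming the `torch.utils.tensorboard.summary.hparams` helper and PyTorch Lightning's TensorBoardLogger, could look roughly like this; it is a sketch, not the author's exact code.)

```python
from pytorch_lightning.loggers import TensorBoardLogger
from torch.utils.tensorboard.summary import hparams


class HparamMetricsTBLogger(TensorBoardLogger):
    """TensorBoardLogger whose log_hyperparams also accepts a dict of metrics."""

    def log_hyperparams(self, params, metrics=None):
        if metrics is None:
            metrics = {}
        params = params if isinstance(params, dict) else vars(params)
        # build the three hparams summaries (experiment, session start, session end)
        exp, ssi, sei = hparams(params, metrics)
        writer = self.experiment._get_file_writer()
        writer.add_summary(exp)
        writer.add_summary(ssi)
        writer.add_summary(sei)
        # log the metric values themselves so the HPARAMS table has values to show
        for name, value in metrics.items():
            self.experiment.add_scalar(name, value)
```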
And then I'm writing the hparams with metrics in a callback:
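(Likewise, the callback was lost; a minimal sketch of the idea, writing the hparams together with the final metric at the end of training, might be something like the following. The `best` attribute of the checkpoint callback and the metric name are assumptions of this sketch.)

```python
from pytorch_lightning.callbacks import Callback


class LogHparamsMetricsCallback(Callback):
    """At the end of training, log the hparams together with the best monitored value."""

    def on_train_end(self, trainer, pl_module):
        best_val_loss = float(trainer.checkpoint_callback.best)  # assumed attribute
        pl_module.logger.log_hyperparams(pl_module.hparams,
                                         metrics={'best_val_loss': best_val_loss})
```

This would then be passed to the Trainer, e.g. `Trainer(callbacks=[LogHparamsMetricsCallback()], logger=HparamMetricsTBLogger('lightning_logs'))`.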
But that doesn't seem right.
Is there a better way to write some metric together with the hparams as well?
Environment