
Tensorboard log_hyperparams(params, metrics) seems not to have effect #1778

Closed
karapostK opened this issue May 11, 2020 · 18 comments
Assignees
Labels
bug (Something isn't working), help wanted (Open to be worked on), priority: 0 (High priority task)

Comments

@karapostK

🐛 Bug

Calling self.logger.log_hyperparams(hparams_dict, metrics_dict) in test_epoch_end doesn't have the desired effect. It should create entries in the HParams section with the specified hyperparameters and metrics, but the section shows nothing instead.
Looking at the code, this seems to be caused by self.hparams and the pre-logging of the hyperparameters at the start of training. Because of that earlier log, later calls to log_hyperparams can no longer log the hyperparameters AND the metrics properly: they clash with the previous log, so nothing is shown.

To Reproduce

Try to log metrics with self.logger.log_hyperparams.

Code sample

def test_epoch_end(self, outputs):
    avg_ndcg = np.concatenate([x['ndcg'] for x in outputs]).mean()
    avg_recall = np.concatenate([x['recall'] for x in outputs]).mean()
    tensorboard_logs = {'test/avg_ndcg': avg_ndcg, 'test/avg_recall': avg_recall}
    # Log metrics
    self.logger.log_hyperparams(vars(self.params), tensorboard_logs)
    return tensorboard_logs

Expected behavior

TensorBoard should show me the HParams section, with each entry composed of the hyperparameters and metrics.

Environment

  • CUDA:
    - GPU:
    - available: False
    - version: 10.2
  • Packages:
    - numpy: 1.18.1
    - pyTorch_debug: False
    - pyTorch_version: 1.5.0
    - pytorch-lightning: 0.7.5
    - tensorboard: 2.2.1
    - tqdm: 4.46.0
  • System:
    - OS: Linux
    - architecture:
    - 64bit
    - ELF
    - processor: x86_64
    - python: 3.8.2
    - version: #34~18.04.1-Ubuntu SMP Fri Feb 28 13:42:26 UTC 2020


@karapostK karapostK added bug Something isn't working help wanted Open to be worked on labels May 11, 2020
@github-actions
Contributor

Hi! Thanks for your contribution, great first issue!

@williamFalcon williamFalcon added the priority: 0 High priority task label May 11, 2020
@williamFalcon williamFalcon self-assigned this May 11, 2020
@williamFalcon
Contributor

Yes, hparams seems not to be working for some reason. Looking into it.

@karapostK
Author

I played around with the issue and I think I may know the problem. When calling log_hyperparams, the function appends events to the existing log file (the one created in the pre-train routine) instead of overwriting it. It follows that TensorBoard sees one file with two different sets of metrics: { } from the first log, and the set added later with log_hyperparams. HParams then shows nothing, because TensorBoard expects the set of metrics to be the same across experiments.

To see it working, you can just put "del self.hparams" at the beginning of your pl.LightningModule. This effectively skips the pre-train logging and HParams shows up correctly in TensorBoard. Downside? You cannot load from checkpoint anymore ;)
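The "same set of metrics across experiments" requirement can be modeled in plain Python. The sketch below is a simplified, illustrative model of how the HParams dashboard ends up blank (it only shows metric columns common to every hparams session in a run); it is not TensorBoard's actual implementation, and `visible_metric_columns` is a name invented here.

```python
def visible_metric_columns(sessions):
    """Simplified model of the HParams dashboard: only metric columns
    present in *every* hparams session written to a run are displayed.
    Each session is the set of metric names it logged."""
    if not sessions:
        return set()
    common = set(sessions[0])
    for metric_names in sessions[1:]:
        common &= set(metric_names)
    return common

# The pre-train routine logs hparams with no metrics, then the user
# logs hparams again with real metrics in test_epoch_end:
sessions = [
    set(),                                 # premature log: {} metrics
    {"test/avg_ndcg", "test/avg_recall"},  # user's log_hyperparams call
]
print(visible_metric_columns(sessions))       # -> set(): nothing shown

# Skipping the premature log (e.g. the `del self.hparams` trick) leaves
# only the second session, so both metric columns appear:
print(visible_metric_columns(sessions[1:]))
```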

@williamFalcon
Contributor

@justusschock is this related to the changes with initializing tb?

@justusschock
Member

justusschock commented May 12, 2020

Probably it is (we used the writer's internal add_hparams for this, which seems to create a separate file). Hopefully this was fixed in #1630 and #1647

@williamFalcon
Contributor

Confirmed this is fixed on master.

https://colab.research.google.com/drive/1K6Gxo99O6dEbzzj_lW8jAL74OvjojV9T


@karapostK
Author

The issue is not fixed yet, unfortunately.
Calling log_hyperparams(vars(self.hparams), {"SOME_METRIC": 2012}) in test_epoch_end won't log the metric in HParams. The problem is still the same: the initial log of the hyperparameters interferes with the later call and prevents metrics from being logged in HParams!

@karapostK
Author

karapostK commented May 12, 2020

As you can see, no metrics are shown on tensorboard

[Screenshot: TensorBoard HParams tab with no metric columns shown]

def test_epoch_end(self, outputs):
    avg_ndcg = np.concatenate([x['ndcg'] for x in outputs]).mean()
    avg_recall = np.concatenate([x['recall'] for x in outputs]).mean()

    tensorboard_logs = {'test/avg_ndcg': avg_ndcg, 'test/avg_recall': avg_recall}
    # Log metrics
    # self.logger.log_metrics(tensorboard_logs)

    self.logger.log_hyperparams(vars(self.hparams), tensorboard_logs)
    return tensorboard_logs

@karapostK karapostK changed the title Tensorboard log_hyperparams(params, metrics=None) seems not to have effect Tensorboard log_hyperparams(params, metrics) seems not to have effect May 12, 2020
@williamFalcon williamFalcon reopened this May 12, 2020
@williamFalcon
Contributor

ummmm. this might be a design problem in TB.

  1. there’s never a guarantee your model ends training (cluster interrupt, crash, etc..). In those instances you still want to know the hparams
  2. the TB design assumes your training always completes for metrics...

so, looks like we have to get hacky to get this to work correctly?

@karapostK
Author

I think they may fix this in the future, but I don't think it will be any time soon :/
tensorflow/tensorboard#3597

@singhay

singhay commented May 21, 2020

@karapostK any luck with finding a fix/workaround ?

@karapostK
Author

@singhay I found one, but it's not very nice. At line 840 of trainer.py (in the run_pretrain_routine function) there is this code:

    if self.logger is not None:
        # save exp to get started
        if hasattr(ref_model, "hparams"):
            self.logger.log_hyperparams(ref_model.hparams)

which logs the hparams too soon. Simply changing this to:

    if self.logger is not None:
        # save exp to get started
        if hasattr(ref_model, "hparams"):
            pass

does the job. However, remember to call log_hyperparams(hparams, metrics) somewhere in your code in order to properly log the metrics, either from the on_train_end() callback or in test_epoch_end.

@versatran01

versatran01 commented Jun 6, 2020

Have the same problem, and I think PL should not log hparams blindly in the trainer. What's the point of logging hparams without a metric? They are saved to disk anyway, so one could look there. If someone can only look at TensorBoard (without access to the machine), then it is better to log hparams as text.

But now the question becomes, where to actually do the logging? In model checkpoint?
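Logging hparams as text, as suggested above, is easy to do by hand: TensorBoard's Text tab renders markdown, so the dict can be turned into a small table and passed to SummaryWriter.add_text. A minimal sketch (the helper name and table layout are mine, not part of Lightning or TensorBoard):

```python
def hparams_to_markdown(hparams):
    """Render a hyperparameter dict as a markdown table for the Text tab."""
    lines = ["| key | value |", "| --- | --- |"]
    for key in sorted(hparams):
        lines.append(f"| {key} | {hparams[key]} |")
    return "\n".join(lines)

table = hparams_to_markdown({"lr": 1e-3, "batch_size": 32})
print(table)

# The resulting string can then be written alongside the run, e.g.:
#   writer.add_text("hparams", table)   # torch.utils.tensorboard.SummaryWriter
```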

@williamFalcon
Contributor

williamFalcon commented Jun 6, 2020

most of the time we interrupt training before it completes... a lot of code won’t “complete” but you still need checkpoints and to know what you ran.

the problem is not lightning, but tensorboard for assuming that training always ends.

@MilesCranmer

MilesCranmer commented Jul 4, 2020

+1 Thanks everyone for looking into this. I've also been searching for a way to use the HParams tab in TensorBoard with Lightning. I think @karapostK's solution needs an update: now one should just comment out self.logger.log_hyperparams(ref_model.hparams) in run_pretrain_routine in trainer.py.

Maybe one solution is to add a Trainer option like pre_record_hyperparams, defaulting to True; when False, it would skip this line: https://github.com/PyTorchLightning/pytorch-lightning/blob/325852c6df93f749bb843bff1a3cdba41698722c/pytorch_lightning/trainer/trainer.py#L1077

we can change it to be:

        if self.logger is not None and self.pre_record_hyperparams:

and then the user will manually call log_hyperparams whenever they see fit, and also include the desired metrics.


For anybody trying to solve the same issue for their code, here is how I solved it with a hack:

  1. Change if self.logger is not None: to if False: in run_pretrain_routine in trainer.py.
  2. Before training, I have

     checkpointer = ModelCheckpoint(filepath='best')

     (and add it as a callback to the trainer: Trainer(..., checkpoint_callback=checkpointer)).
  3. After training, run:

     logger.log_hyperparams(params=model.hparams, metrics={'val_loss': checkpointer.best_model_score.item()})
     logger.save()

Then the model will appear in your hparams tab with val_loss as a metric:


@edenlightning
Contributor

Closing as this is a TB issue. @versatran01 feel free to reopen if you have any other issues!

@vedal

vedal commented Apr 16, 2021

Found an answer to this question here:
https://pytorch-lightning.readthedocs.io/en/latest/extensions/logging.html#logging-hyperparameters
(#6904)

rolanddenis added a commit to PhaseFieldICJ/nnpf that referenced this issue Sep 17, 2022
@Isuxiz

Isuxiz commented Mar 2, 2023

Found an answer to this question here: https://pytorch-lightning.readthedocs.io/en/latest/extensions/logging.html#logging-hyperparameters (#6904)

And you, my friend, you are a real hero!


9 participants