
SummaryWriter add_hparams should support adding new hyperparameters #39250

Open
awwong1 opened this issue May 29, 2020 · 7 comments
Labels
enhancement - Not as big of a feature, but technically not a bug. Should be easy to fix
oncall: visualization - Related to visualization in PyTorch, e.g., tensorboard

Comments

awwong1 commented May 29, 2020

🐛 Bug

When calling SummaryWriter().add_hparams with new hyperparameters, keys that were not present in the first call do not appear in the HParams dashboard output.

To Reproduce

#!/usr/bin/env python3

from torch.utils.tensorboard import SummaryWriter

with SummaryWriter() as w:
    w.add_hparams({"key_A": 10}, {})
with SummaryWriter() as w:
    w.add_hparams({"key_B": 10}, {})

When viewing the TensorBoard output at http://localhost:6006/#hparams:

Trial_ID                                    key_A
May29_09-27-46_mbp13/1590766066.254924      10.000
May29_09-27-46_mbp13/1590766066.2567558

Expected behavior

I would expect key_B to also appear in the output, with a blank value for the first row.
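
For illustration, a mock-up of the table I would expect (not actual output; the column layout and blank cells are mine):

Trial_ID                                    key_A     key_B
May29_09-27-46_mbp13/1590766066.254924      10.000
May29_09-27-46_mbp13/1590766066.2567558               10.000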

Environment

PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Debian GNU/Linux 10 (buster)
GCC version: (Debian 8.3.0-6) 8.3.0
CMake version: Could not collect

Python version: 3.7
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip3] numpy==1.17.4
[pip3] torch==1.4.0
[pip3] torchvision==0.5.0
[conda] Could not collect


gchanan added the oncall: visualization and enhancement labels on Jun 1, 2020

AvivWn commented Sep 25, 2020

I am experiencing this exact bug. Any news on it?
It is quite annoying to have to start a new logger whenever a new parameter is added.

@Sushobhan04

Any updates on this one? I am facing the same issue, and it seems like it has not been fixed in a year.

@AvivWn @awwong1 any workaround that you found? I don't want to move to another logging tool, but this TensorBoard bug has been very annoying.
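
One workaround that follows from the behavior above (a sketch, not an official fix: the key registry, the padding helper, and the empty-string filler are all assumptions) is to pass the union of all keys on every call:

from torch.utils.tensorboard import SummaryWriter

# Hypothetical registry of every hyperparameter key any run may use.
ALL_HPARAM_KEYS = ["key_A", "key_B"]

def add_hparams_padded(writer, hparams, metrics, fill=""):
    # Pad missing keys so every run reports the same key set; the
    # empty-string filler is an assumption and may render oddly in TB.
    padded = {k: hparams.get(k, fill) for k in ALL_HPARAM_KEYS}
    writer.add_hparams(padded, metrics)

with SummaryWriter() as w:
    add_hparams_padded(w, {"key_A": 10}, {"loss": 1.0})
with SummaryWriter() as w:
    add_hparams_padded(w, {"key_B": 10}, {"loss": 2.0})

The idea is simply that every run then reports an identical key set, so no call introduces keys that are missing from the schema of the first-loaded run.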

@sriveravi

The bug still persists today.

tensorboard 2.5.0
pytorch 1.8.0


LarsHill commented Apr 9, 2022

The issue still persists and is quite annoying when comparing multiple runs with different hyperparameter setups. Is there any plan to fix this, or is there a known workaround?

@sriveravi

I believe the issue has been fixed. In the left pane of the HParams dashboard, you need to scroll down and check the appropriate boxes. It defaults to a smaller subset of columns when the parameter sets change.


phisad commented Apr 20, 2022

This still persists for me as well.

pytorch-lightning            1.5.9
torch                        1.10.1+cu113


ghost commented Nov 6, 2022

I am annoyed by this bug as well and wanted to understand it better.
I ran a couple of tests on tensorboard 2.10.1, pytorch 1.12.1, python 3.10.
Note: I wrote and edited this comment as I tested. Sorry about that; please jump to the end for conclusions.

Each time, I:

  • rm -rf runs
  • run the Python script
  • start TensorBoard with tensorboard --logdir runs (I stop TensorBoard before the next test).

Here are some results:

Case 1

from torch.utils.tensorboard import SummaryWriter

with SummaryWriter(log_dir="runs/ABC", filename_suffix="A") as w:
    w.add_hparams({"key_A": 10}, {"loss":1})
with SummaryWriter(log_dir="runs/ABC", filename_suffix="B") as w:
    w.add_hparams({"key_B": 20}, {"loss":2})
with SummaryWriter(log_dir="runs/ABC", filename_suffix="C") as w:
    w.add_hparams({"key_C": 30}, {"loss":3})

[screenshot: the table shows only key_B]
First surprise: I would have expected key_A, not key_B.
I don't understand why B comes up. That's weird; let's run it again.
[screenshot: a different key shows up this time]
Damn it.
Once more:
[screenshot: again a different key]
So, I'm already lost.
I confirm that the other keys are stored anyway by deleting folders and refreshing TensorBoard. From this last run:
Delete the A and B folders:
[screenshot: key_C now shows up]
Delete the A and C folders:
[screenshot: key_B now shows up]

The data is there but doesn't show up. That points to a front-end issue, perhaps?
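
For anyone who wants to check the raw event files directly, something like this should work (a sketch; I'm assuming the EventAccumulator and the hparams plugin protos shipped in the tensorboard package, and the subdirectory path is a placeholder for one of the timestamped run folders that add_hparams creates under the log dir):

from tensorboard.backend.event_processing import event_accumulator
from tensorboard.plugins.hparams import plugin_data_pb2

# Point this at one of the per-call subfolders under runs/ABC.
ea = event_accumulator.EventAccumulator("runs/ABC/<run_subdir>")
ea.Reload()

# Summaries written by add_hparams carry the "hparams" plugin name.
for tag, content in ea.PluginTagToContent("hparams").items():
    data = plugin_data_pb2.HParamsPluginData.FromString(content)
    if data.HasField("session_start_info"):
        print(tag, dict(data.session_start_info.hparams))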
Let's do some other tests.

Case 2

from torch.utils.tensorboard import SummaryWriter

with SummaryWriter(log_dir="runs/ABC", filename_suffix="A") as w:
    w.add_hparams({"key_A": 10, "key_B": 10}, {"loss":1})
with SummaryWriter(log_dir="runs/ABC", filename_suffix="B") as w:
    w.add_hparams({"key_B": 20}, {"loss":2})
with SummaryWriter(log_dir="runs/ABC", filename_suffix="C") as w:
    w.add_hparams({"key_C": 30}, {"loss":3})

[screenshot: hparams table]
Another run:
[screenshot: hparams table]

Case 3

from torch.utils.tensorboard import SummaryWriter

with SummaryWriter(log_dir="runs/ABC", filename_suffix="A") as w:
    w.add_hparams({"key_A": 10, "key_B": 10, "key_C":10}, {"loss":1})
with SummaryWriter(log_dir="runs/ABC", filename_suffix="B") as w:
    w.add_hparams({"key_B": 20}, {"loss":2})
with SummaryWriter(log_dir="runs/ABC", filename_suffix="C") as w:
    w.add_hparams({"key_C": 30}, {"loss":3})

[screenshot: hparams table]
Another run...
[screenshot: hparams table]
And another one...
[screenshot: hparams table]

Conclusion

I don't get it at all!
But I hope these few tests may give more ideas to someone maintaining TensorBoard...

EDIT: Follow-up

Looking at TB with high verbosity, I notice that TB cyclically reloads folders, but not always in the same order.
I suspect this non-deterministic order might explain why the same script can show different results. My assumption is that the first folder loaded during the very first load defines the format of the table, and thus which keys will be shown.
[screenshot: verbose TB logs showing folders reloaded in varying order]

EDIT 2: I think I get it

This last one was easy to test; I should have started there.
First, start TensorBoard.
Second, run one experiment:

from torch.utils.tensorboard import SummaryWriter

with SummaryWriter(log_dir="runs/ABC", filename_suffix="A") as w:
    w.add_hparams({"key_A": 10, "key_B": 10}, {"loss":1})

This is necessarily the first file loaded by TB, and thus defines the table schema.
If we then run the following, without turning TB off:

with SummaryWriter(log_dir="runs/ABC", filename_suffix="B") as w:
    w.add_hparams({"key_B": 20}, {"loss":2})
with SummaryWriter(log_dir="runs/ABC", filename_suffix="C") as w:
    w.add_hparams({"key_C": 30}, {"loss":3})

We see:
[screenshot: the table still shows only key_A and key_B]

So it would appear that:

  1. When you start TB before generating experiments, the first experiment defines the schema of the table, probably for as long as TB is running.
  2. When you start TB and open a folder of existing experiments, the schema is defined by the first experiment loaded, which seems to be an unpredictable result of the Rust data loader reading files in parallel.
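
If conclusion 1 holds, one mitigation (my assumption, and per conclusion 2 it would not survive a TB restart) is to write a dummy "schema" run containing every key before the real experiments, while TB is already running:

from torch.utils.tensorboard import SummaryWriter

# Dummy run written first so that, if conclusion 1 is right, it defines
# the table's columns. Key names and the zero fillers are placeholders.
with SummaryWriter(log_dir="runs/ABC", filename_suffix="schema") as w:
    w.add_hparams({"key_A": 0, "key_B": 0, "key_C": 0}, {"loss": 0.0})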
