
tensorboard hyperparameters don't update #1217

Closed
gunthergl opened this issue Mar 23, 2020 · 8 comments · Fixed by #2342
Labels: bug (Something isn't working), help wanted (Open to be worked on)

Comments

gunthergl commented Mar 23, 2020

🐛 Bug

Given two sets of hyperparameters (HPARAMS), h_1 and h_2, where h_1 is a strict subset of h_2:

  1. If you run PyTorch Lightning with parameters h_1, then h_2, the additional parameters from h_2 are not shown in TensorBoard.
  2. If you run PyTorch Lightning with parameters h_2, then h_1, the parameters missing from h_1 are shown as empty in TensorBoard.

Case 2 is fine; Case 1 is not.

I already reported this to TensorBoard but was directed back here.

To Reproduce

  1. Run the code below
  2. Start tensorboard --logdir=lightning_logs in the same directory
  3. Go to the HPARAMS tab in the web UI
  4. See only layer_1_dim

Code sample

import pytorch_lightning as pl
from argparse import ArgumentParser
import torch

class LitMNIST(pl.LightningModule):
    def __init__(self, hparams):
        super(LitMNIST, self).__init__()
        self.hparams = hparams
        self.layer_1 = torch.nn.Linear(28 * 28, self.hparams.layer_1_dim)

    def forward(self, *args, **kwargs):
        pass



if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--layer_1_dim', type=int, default=10)
    args = parser.parse_args()

    # print(args)
    ## > Namespace(layer_1_dim=10)
    model = LitMNIST(hparams=args)
    trainer = pl.Trainer()
    try:
        # fit fails here because no dataloaders are defined; the logger
        # still writes the hyperparameters, which is all this repro needs
        trainer.fit(model)
    except Exception:
        pass

    parser = ArgumentParser()
    parser.add_argument('--layer_1_dim', type=int, default=10)
    parser.add_argument('--another_hyperparameter', type=int, default=10)
    args = parser.parse_args()

    # print(args)
    ## > Namespace(another_hyperparameter=10, layer_1_dim=10)
    model = LitMNIST(hparams=args)
    trainer = pl.Trainer()
    try:
        trainer.fit(model)  # fails again; hparams for the second run are logged
    except Exception:
        pass

Change that "solves" the problem: first run the model with both parameters.

import pytorch_lightning as pl
from argparse import ArgumentParser
import torch

class LitMNIST(pl.LightningModule):
    def __init__(self, hparams):
        super(LitMNIST, self).__init__()
        self.hparams = hparams
        self.layer_1 = torch.nn.Linear(28 * 28, self.hparams.layer_1_dim)

    def forward(self, *args, **kwargs):
        pass



if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--layer_1_dim', type=int, default=10)
    parser.add_argument('--another_hyperparameter', type=int, default=10)
    args = parser.parse_args()

    # print(args)
    ## > Namespace(another_hyperparameter=10, layer_1_dim=10)
    model = LitMNIST(hparams=args)
    trainer = pl.Trainer()
    try:
        # as above: fit fails without dataloaders, but the hparams are logged
        trainer.fit(model)
    except Exception:
        pass


    parser = ArgumentParser()
    parser.add_argument('--layer_1_dim', type=int, default=10)
    args = parser.parse_args()

    # print(args)
    ## > Namespace(layer_1_dim=10)
    model = LitMNIST(hparams=args)
    trainer = pl.Trainer()
    try:
        trainer.fit(model)  # fails again; hparams for the second run are logged
    except Exception:
        pass

Expected behavior

  1. Run the code
  2. Start tensorboard --logdir=lightning_logs in the same directory
  3. Go to the HPARAMS tab in the web UI
  4. See layer_1_dim and another_hyperparameter
    • but with another_hyperparameter shown as empty for version_0

Hackaround:

  • Run the second code sample: the trick is to run the model with all hyperparameters first, and TensorBoard then picks up another_hyperparameter. A generalized sketch follows below.
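
A small generalization of this hackaround (my own sketch, not from the original report): keep a project-level list of every hyperparameter name and pad each run's Namespace with placeholders before handing it to the model, so TensorBoard always sees the full key set regardless of run order. ALL_HPARAM_KEYS and pad_hparams are hypothetical names:

from argparse import Namespace

# Hypothetical registry of every hyperparameter used across all runs.
ALL_HPARAM_KEYS = ["layer_1_dim", "another_hyperparameter"]

def pad_hparams(args):
    """Return a Namespace with every known key, filling missing ones with None.

    None may need to be replaced by a sentinel the hparams writer accepts
    (e.g. 0 or ""), depending on the logger version.
    """
    return Namespace(**{key: getattr(args, key, None) for key in ALL_HPARAM_KEYS})

# usage: model = LitMNIST(hparams=pad_hparams(args))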

Environment

  • PyTorch Version (e.g., 1.0): py3.7_cuda101_cudnn7_0
  • OS (e.g., Linux): Win10
  • How you installed PyTorch: conda
  • Build command you used (if compiling from source): -
  • Python version: 3.7.0
  • CUDA/cuDNN version: 10.1
  • GPU models and configuration: GTX1650
  • Any other relevant information:
    • conda list for pytorch: pytorch-lightning 0.7.1 pypi_0 pypi
gunthergl added the bug (Something isn't working) and help wanted (Open to be worked on) labels on Mar 23, 2020
@github-actions (Contributor)

Hi! Thanks for your contribution, great first issue!

Borda (Member) commented Mar 30, 2020

@gunthergl would you consider drafting a PR with a fix?

@gunthergl (Author)

Sadly, I do not know where exactly this happens, and I have very little free capacity at the moment. I have to get my own method running first, so fixing this is not my top priority right now. As long as I keep track internally of what I did, it does not matter; it is much more a convenience than something crucial right now.

Borda (Member) commented Mar 30, 2020

@jeffling @MattPainter01 could you have a look?

@MattPainter01 (Contributor)

I have also seen this issue and, like @gunthergl, assumed it was a TensorBoard issue.

I'm working from home at the moment and it's difficult to do any Lightning work from here. @Borda, I can look into this properly in a couple of weeks (when we can travel again) if it is not fixed by then.

@Jie-Qiao

Same issue here; I have to delete all the previous logs that have different parameters and restart TensorBoard.

awwong1 commented May 29, 2020

I ran into this issue as well; it is unlikely that this is a PyTorch Lightning issue.
The following code replicates the problem using only SummaryWriter:

#!/usr/bin/env python3

from torch.utils.tensorboard import SummaryWriter

# the first run registers only key_A with the HPARAMS plugin
with SummaryWriter() as w:
    w.add_hparams({"key_A": 10}, {})
# the second run logs key_B, but the plugin never adds it as a column
with SummaryWriter() as w:
    w.add_hparams({"key_B": 10}, {})

When viewing the TensorBoard output at http://localhost:6006/#hparams, only key_A appears as a column:

Trial_ID                                   key_A
May29_09-27-46_mbp13/1590766066.254924     10.000
May29_09-27-46_mbp13/1590766066.2567558    (empty)
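
For completeness, the hackaround from earlier in the thread presumably applies at this level too: write the union of all keys in the first run so the HPARAMS plugin registers every column up front. A minimal sketch under that assumption (key_B's placeholder value of 0 is arbitrary):

from torch.utils.tensorboard import SummaryWriter

# first run writes the union of keys, registering both columns
with SummaryWriter() as w:
    w.add_hparams({"key_A": 10, "key_B": 0}, {})
# subsequent subset runs should then display correctly
with SummaryWriter() as w:
    w.add_hparams({"key_B": 10}, {})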

@edenlightning (Contributor)

Closing this, as it is an issue in TensorBoard.
