
Test results not logged to tensorboard, since 0.7.3, this worked in 0.7.1 #1447

Closed
WSzP opened this issue Apr 10, 2020 · 11 comments · Fixed by #1459
Labels
bug Something isn't working help wanted Open to be worked on
Milestone

Comments


WSzP commented Apr 10, 2020

🐛 Bug

Test results are not logged to TensorBoard. With the exact same code, version 0.7.1 logged them flawlessly. Also with the same code, validation and train results are logged, so I assume the issue is specific to the test stage.

To Reproduce

Run the test() step with a model that uses TensorBoard logging:
logger = TensorBoardLogger(LOG_DIR, name=NAME)

Code sample

def validation_step(self, val_batch, batch_idx):
    [...]
    return {'val_loss': loss}

def validation_epoch_end(self, outputs):
    avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
    tensorboard_logs = {'val_loss': avg_loss}
    return {'avg_val_loss': avg_loss, 'log': tensorboard_logs} #this works!

def test_step(self, test_batch, batch_idx):
    [...]           
    return {'test_loss': loss}

def test_epoch_end(self, outputs):
    avg_loss = torch.stack([x['test_loss'] for x in outputs]).mean()
    tensorboard_logs = {'MSE': avg_loss}
    print(f"Test Mean Squared Error (MSE): {avg_loss}")  #this works!                         
    return {'avg_test_loss': avg_loss, 'log': tensorboard_logs} #the issue might be here

Expected behavior

The expected behavior is for tensorboard_logs to contain the MSE, but when I open the logs in TensorBoard they contain only val_loss and train_loss, not MSE. The exact same code worked in 0.7.1, so I believe a change in 0.7.3 introduced this bug.
The print statement works and the correct value is printed, so I assume something goes wrong when the returned 'log': tensorboard_logs is handled.
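To make the expectation concrete, here is a minimal, purely illustrative sketch (invented names, not Lightning's actual internals) of the contract being relied on: everything under the 'log' key returned from *_epoch_end should be forwarded to the logger's metric sink.

```python
# Hypothetical sketch of the dict-return logging contract; the function name
# dispatch_epoch_end_result is invented for illustration.

def dispatch_epoch_end_result(result, logged_metrics):
    """Forward the 'log' sub-dict of an epoch-end result to a metric store."""
    for name, value in result.get('log', {}).items():
        logged_metrics[name] = value
    return logged_metrics

logged = {}
dispatch_epoch_end_result({'avg_test_loss': 0.25, 'log': {'MSE': 0.25}}, logged)
# After dispatch, 'MSE' should be in the logged metrics; in the bug described
# here, the forwarded test metrics never reach TensorBoard.
```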

Environment

@WSzP WSzP added bug Something isn't working help wanted Open to be worked on labels Apr 10, 2020
@github-actions (Contributor)

Hi! Thanks for your contribution, great first issue!

@WSzP WSzP changed the title Test results not logged to tensorboard, since 1.7.3 (validation and train results are). Test results not logged to tensorboard, since 0.7.3 (validation and train results are). Apr 10, 2020
@WSzP WSzP changed the title Test results not logged to tensorboard, since 0.7.3 (validation and train results are). Test results not logged to tensorboard, since 0.7.3, this worked in 0.7.1 Apr 10, 2020
@williamFalcon (Contributor)

ummm. i thought we fixed this in 0.7.3. can you post a colab to reproduce?


WSzP commented Apr 11, 2020

ummm. i thought we fixed this in 0.7.3. can you post a colab to reproduce?

Thank you so much for the quick reply.
https://colab.research.google.com/drive/1bexbN61LpWVZ106glFhAVF7Vz1jXQr1L
Hopefully, this works. (I'm using Google Colab for the first time, I'm more of a localhost first -> deploy to AWS/Azure kind of guy.)


Borda commented Apr 11, 2020

@WSzP we probably also need the dataset...

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-9-0af11722af78> in <module>()
     20                      callbacks=[TestingCallbacks()]
     21                      )                
---> 22 trainer.fit(model)

3 frames
/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
    426         own_fid = False
    427     else:
--> 428         fid = open(os_fspath(file), "rb")
    429         own_fid = True
    430 

FileNotFoundError: [Errno 2] No such file or directory: '/content/uxm_train.npz'

@williamFalcon (Contributor)

you can just use fake data generators with the right dimensions. this is just about logging anyhow


WSzP commented Apr 11, 2020

you can just use fake data generators with the right dimensions. this is just about logging anyhow

Ok, I just changed the code to generate a random sparse matrix. Thanks for the idea.
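Since the bug is purely about logging, the real .npz dataset only matters for its shapes. A minimal dependency-free sketch of the random-sparse-matrix replacement (the function name and dict-of-coordinates representation are my own choices, not from the Colab):

```python
import random

def random_sparse_matrix(n_rows, n_cols, density=0.1, seed=0):
    """Generate a random sparse matrix as a {(row, col): value} dict,
    standing in for the real training data when only shapes matter."""
    rng = random.Random(seed)
    n_nonzero = int(n_rows * n_cols * density)
    entries = {}
    while len(entries) < n_nonzero:
        pos = (rng.randrange(n_rows), rng.randrange(n_cols))
        entries[pos] = rng.random()  # duplicates are overwritten, keeping count exact
    return entries

m = random_sparse_matrix(100, 50, density=0.05)  # 250 nonzero entries
```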


WSzP commented Apr 11, 2020

When I run it, I see the test score on the board...

I only see train_loss and val_loss, but not the test score.
(screenshot: TensorBoard scalars showing only train_loss and val_loss)


Borda commented Apr 11, 2020

I think I see the problem; it comes from the introduction of agg_and_log_metrics for the logger...
https://github.com/PyTorchLightning/pytorch-lightning/blob/3f1e4b953f84ecdac7dada0c6b57d908efc9c3d3/pytorch_lightning/trainer/logging.py#L74
In this case, it is called and the metrics are saved to an accumulator until another step arrives or the logger is finalized, which triggers flushing the results...
https://github.com/PyTorchLightning/pytorch-lightning/blob/3f1e4b953f84ecdac7dada0c6b57d908efc9c3d3/pytorch_lightning/loggers/base.py#L232-L237
The solution is to replicate the same action in logger.save().
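The accumulate-then-flush pattern described above can be sketched in a few lines of plain Python. This is a minimal illustration with invented names (AggregatingLogger is not Lightning's real class); it shows why the last test-stage metrics get stuck in the accumulator and how flushing in save() fixes it:

```python
class AggregatingLogger:
    """Minimal sketch of the accumulate-then-flush pattern: metrics for a
    step are buffered and only written out once a later step arrives."""

    def __init__(self):
        self._agg_step = None
        self._agg_metrics = {}
        self.written = []  # what actually reached the backend (e.g. TensorBoard)

    def agg_and_log_metrics(self, metrics, step):
        if self._agg_step is not None and step != self._agg_step:
            self._flush()  # a new step arrived: flush the previous one
        self._agg_step = step
        self._agg_metrics.update(metrics)

    def _flush(self):
        if self._agg_metrics:
            self.written.append((self._agg_step, dict(self._agg_metrics)))
            self._agg_metrics = {}

    def save(self):
        # The proposed fix: flush pending metrics here too, so the final
        # test metrics are written even though no later step ever arrives.
        self._flush()

logger = AggregatingLogger()
logger.agg_and_log_metrics({'val_loss': 0.3}, step=1)
logger.agg_and_log_metrics({'MSE': 0.25}, step=2)  # flushes step 1
logger.save()  # without this flush, step 2 ('MSE') would be lost
```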

@Borda Borda mentioned this issue Apr 11, 2020
5 tasks

Borda commented Apr 11, 2020

@WSzP pls try this fix

! pip install https://github.com/PyTorchLightning/pytorch-lightning/archive/bugfix/flush-logger.zip -U


WSzP commented Apr 11, 2020

It works like a charm. Thank you so much @Borda. Cheers!

@WSzP WSzP closed this as completed Apr 11, 2020
@Borda Borda added this to the 0.7.4 milestone Apr 12, 2020

Borda commented Apr 12, 2020

Let's keep it open till the fix is merged to master...
