Val_loss not available #321

Closed
Menion93 opened this issue Oct 7, 2019 · 2 comments
Labels
bug Something isn't working

Comments

Menion93 commented Oct 7, 2019

Describe the bug
When I train my network, which has validation steps defined similarly to the doc example,

def validation_step(self, batch, batch_nb):
    x = torch.squeeze(batch['x'], dim=0).float()
    y = torch.squeeze(batch['y'], dim=0).long()

    output = self.forward(x)
    return {'batch_val_loss': self.loss(output, y),
            'batch_val_acc': accuracy(output, y)}

def validation_end(self, outputs):
    avg_loss = torch.stack([x['batch_val_loss'] for x in outputs]).mean()
    avg_acc = torch.stack([x['batch_val_acc'] for x in outputs]).mean()

    return {'val_loss': avg_loss, 'val_acc': avg_acc}

with my custom EarlyStopping callback

early_stop_callback = EarlyStopping(monitor='val_loss', patience=5)

tt_logger = TestTubeLogger(
    save_dir=log_dir,
    name="default",
    debug=False,
    create_git_tag=False
)

trainer = Trainer(logger=tt_logger,
                  row_log_interval=10,
                  checkpoint_callback=checkpoint_callback,
                  early_stop_callback=early_stop_callback,
                  gradient_clip_val=0.5,
                  gpus=gpus,
                  check_val_every_n_epoch=1,
                  max_nb_epochs=99999,
                  train_percent_check=train_frac,
                  log_save_interval=100,
                  )

the program cannot see my validation metrics:

Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,epoch,batch_nb,v_nb <class 'RuntimeWarning'>

This behaviour did not happen in a previous release, which I was running on Windows (I am now on macOS); but that earlier version did not include the TestTubeLogger.

Desktop (please complete the following information):

  • OS: macOS
  • Version: latest
Menion93 added the bug label Oct 7, 2019
williamFalcon (Contributor) commented Oct 7, 2019

The early-stop metrics come from the “progress_bar” entry.

So, add a “progress_bar” key and put val_loss in there.

I’ll update the docs.
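
For illustration, here is a minimal sketch of the callback side (the import path and the min_delta/mode arguments are assumptions, not taken from this thread). The point is that EarlyStopping can only monitor keys the model exposes to the callbacks, which here means 'val_loss' must appear inside the 'progress_bar' dict returned from validation_end:

from pytorch_lightning.callbacks import EarlyStopping

# 'val_loss' must be one of the keys returned under 'progress_bar'
# from validation_end; otherwise the callback emits the
# "not available" warning shown above.
early_stop_callback = EarlyStopping(
    monitor='val_loss',   # same key as in the 'progress_bar' dict
    min_delta=0.0,        # assumed: any improvement counts
    patience=5,
    mode='min',           # assumed: stop when val_loss stops decreasing
)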

Menion93 (Author) commented Oct 7, 2019

Thank you, this fixed my issue:

def validation_end(self, outputs):
    avg_loss = torch.stack([x['batch_val_loss'] for x in outputs]).mean()
    avg_acc = torch.stack([x['batch_val_acc'] for x in outputs]).mean()

    return {
        'val_loss': avg_loss,
        'val_acc': avg_acc,
        'progress_bar': {'val_loss': avg_loss, 'val_acc': avg_acc}
    }
