
RuntimeError: Early stopping conditioned on metric val_loss which is not available. Pass in or modify your EarlyStopping callback to use any of the following: #11534

Closed
Drow999 opened this issue Jan 18, 2022 · 4 comments
Labels
bug Something isn't working

Comments


Drow999 commented Jan 18, 2022

🐛 Bug

Hello guys,
Does anybody have any idea why this early stopping error was raised? I checked the value of val_loss and it is not equal to 0. I also found that issues #490 and #492 solved a similar problem, but those were for PL 0.5; the version I am using is 1.4.5, and I tried 1.5.8 before as well, with the same result.

To Reproduce

    def validation_step(self, batch, batch_idx):
        print('batch size:', len(batch['pose_body']))
        drec = self(batch['pose_body'].view(-1, 36))

        loss = self._compute_loss(batch, drec)
        print(loss)
        val_loss = loss['unweighted_loss']['loss_total']
        print('val_loss', val_loss)
        #if self.renderer is not None and self.global_rank == 0 and batch_idx % 500==0 and np.random.rand()>0.5:
        #    out_fname = makepath(self.work_dir, 'renders/vald_rec_E{:03d}_It{:04d}_val_loss_{:.2f}.png'.format(self.current_epoch, batch_idx, val_loss.item()), isfile=True)
        #    self.renderer([batch, drec], out_fname = out_fname)
        #    dgen = self.vp_model.sample_poses(self.vp_ps.logging.num_bodies_to_display)
        #    out_fname = makepath(self.work_dir, 'renders/vald_gen_E{:03d}_I{:04d}.png'.format(self.current_epoch, batch_idx), isfile=True)
        #    self.renderer([dgen], out_fname = out_fname)
        progress_bar = {'v2v': val_loss}
        return {'val_loss': c2c(val_loss), 'progress_bar': progress_bar, 'log': progress_bar}

    def validation_epoch_end(self, outputs):
        metrics = {'val_loss': np.nanmean(np.concatenate([v['val_loss'] for v in outputs])) }
        print('metrics:', metrics)
        print('output:' , outputs)
        if self.global_rank == 0:

            self.text_logger('Epoch {}: {}'.format(self.current_epoch, ', '.join('{}:{:.2f}'.format(k, v) for k, v in metrics.items())))
            self.text_logger('lr is {}'.format([pg['lr'] for opt in self.trainer.optimizers for pg in opt.param_groups]))
        metrics = {k: torch.as_tensor(v) for k, v in metrics.items()}
        progress_bar = {'val_loss': metrics['val_loss']}
        return {'val_loss': metrics['val_loss'], 'progress_bar': progress_bar, 'log': metrics}

  early_stopping:
    monitor: val_loss
    min_delta: 0.0
    patience: 100
    verbose: True
    mode: min
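
For context, a minimal sketch of how this YAML block maps onto Lightning's built-in callback (the Trainer wiring here is illustrative, not the exact code from this project):

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import EarlyStopping

    # Mirrors the early_stopping block above; EarlyStopping looks up the
    # monitored key among the metrics logged via self.log(...).
    early_stopping = EarlyStopping(
        monitor='val_loss',
        min_delta=0.0,
        patience=100,
        verbose=True,
        mode='min',
    )
    trainer = Trainer(callbacks=[early_stopping])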


### Environment

- PyTorch Lightning Version (e.g., 1.4.5):
- PyTorch Version (e.g., 1.7.1):
- Python version (e.g., 3.7):
- OS (e.g., Linux):
- CUDA/cuDNN version: 10.1
- How you installed PyTorch (`conda`, `pip`, source):

Drow999 added the bug label Jan 18, 2022

Drow999 commented Jan 18, 2022

Console screenshots attached: Screenshot from 2022-01-18 18-19-33, Screenshot from 2022-01-18 18-19-46.


Drow999 commented Jan 19, 2022

I solved it by adding
self.log('val_loss', val_loss)
in validation_step() to mark the loss I want to monitor...
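
For reference, a minimal sketch of the corrected validation_step (reusing the _compute_loss helper and tensor shapes from the snippet above):

    def validation_step(self, batch, batch_idx):
        drec = self(batch['pose_body'].view(-1, 36))
        loss = self._compute_loss(batch, drec)
        val_loss = loss['unweighted_loss']['loss_total']
        # Log under exactly the name the callback monitors; in validation_step,
        # self.log aggregates the value across the epoch by default.
        self.log('val_loss', val_loss, prog_bar=True)
        return val_loss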

Drow999 closed this as completed Jan 19, 2022

oomq commented Aug 19, 2022

I solved it by adding self.log('val_loss', val_loss) in validation_step() to mark the loss I want to monitor...

Thank you!!!


faizan1234567 commented Aug 9, 2023

I did the same, but mine is giving an error. I am using Lightning 1.6.0; I have logged "valid/loss", but the EarlyStopping callback does not recognize it. @Drow999
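
One thing worth checking (a sketch of a possible cause, not confirmed for this setup): the monitor string must match the logged key character for character, and the callback can be told explicitly to evaluate after validation rather than after the training epoch:

    from pytorch_lightning.callbacks import EarlyStopping

    early_stop = EarlyStopping(
        monitor='valid/loss',  # must match self.log('valid/loss', ...) exactly
        mode='min',
        # Check after validation, where the metric is actually logged
        # (parameter available in recent Lightning versions, including 1.6).
        check_on_train_epoch_end=False,
    )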
