Unrecognized `val_loss` metric #923
Hey, thanks for your contribution! Great first issue!

The EarlyStopping callback looks for the `val_loss` key in the metrics returned by your validation loop.

@williamFalcon This is a recurring issue that users have with the provided examples. They should probably be updated so that the default EarlyStopping just works from the beginning without producing warnings.

@bsridatta Actually, I just noticed that you are linking to an old version of the documentation in your post. The examples have already been updated. See here: the fix 2 I suggested is already done.

It got fixed here #524
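To make the lookup described above concrete, here is a simplified, hypothetical sketch of the key check an early-stopping callback performs; the function name `check_monitor` is invented for illustration, and this is not Lightning's actual `pt_callbacks.py` code:

```python
import warnings


def check_monitor(metrics, monitor="val_loss"):
    """Return the monitored value, warning (like the report below) when absent."""
    value = metrics.get(monitor)
    if value is None:
        warnings.warn(
            f"Early stopping conditioned on metric {monitor} which is not"
            f" available. Available metrics are: {','.join(metrics)}",
            RuntimeWarning,
        )
    return value
```

With the metrics dict from the issue (`{"loss": ..., "avg_val_loss": ...}`) the lookup fails and the RuntimeWarning fires; once a `val_loss` key is present, the value is returned silently.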
Thanks for the quick response and clarification. I understood this wrong: I used the example I mentioned back in November and didn't suspect that it might have changed since. It's clear now, thanks!
RuntimeWarnings due to unrecognized `val_loss` metric

pytorch_lightning callbacks are unable to recognize `val_loss` returned from `validation_step()`.
To Reproduce
Run CoolModel from
Steps to reproduce the behavior:
```
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/callbacks/pt_callbacks.py:314: RuntimeWarning: Can save best model only with val_loss available, skipping.
  ' skipping.', RuntimeWarning)
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/callbacks/pt_callbacks.py:144: RuntimeWarning: Early stopping conditioned on metric val_loss which is not available. Available metrics are: loss,avg_val_loss
  RuntimeWarning)
```
Code sample
Just the minimal example.
Output
Excluding the output from pip install and the training tqdm progress bars:
Expected behavior
The `val_loss` metric should be recognized when a dict with the key `val_loss` is returned by `validation_step()`.
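The expected shape of the validation hooks can be sketched with plain-Python stand-ins (floats instead of tensors); the names mirror the 0.6-era `LightningModule` methods discussed in this issue, but this is an illustrative sketch, not the library's implementation:

```python
# Plain-Python stand-ins for the Lightning validation hooks.

def validation_step(batch_loss):
    # Returning a dict keyed 'val_loss' is what the callbacks
    # are expected to pick up.
    return {"val_loss": batch_loss}


def validation_end(outputs):
    # Aggregate the per-batch dicts into one epoch-level metrics dict.
    avg = sum(o["val_loss"] for o in outputs) / len(outputs)
    return {"val_loss": avg}
```

Under this shape, the epoch-level dict carries `val_loss`, so an early-stopping or checkpoint callback monitoring that key should find it.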
Environment
```
Collecting environment information...
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: None
OS: Debian GNU/Linux 9 (stretch)
GCC version: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
CMake version: version 3.7.2
Python version: 3.6
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip] msgpack-numpy==0.4.4.3
[pip] numpy==1.18.1
[pip] numpydoc==0.9.2
[pip] pytorch-ignite==0.3.0
[pip] pytorch-lightning==0.6.0
[pip] pytorch-pretrained-bert==0.6.2
[pip] pytorch-transformers==1.1.0
[pip] torch==1.4.0
[pip] torchaudio==0.4.0a0+719bcc7
[pip] torchtext==0.5.0
[pip] torchvision==0.4.2
[conda] blas                      1.0          mkl
[conda] cpuonly                   1.0          0        pytorch
[conda] mkl                       2019.3       199
[conda] mkl-service               2.0.2        py36h7b6447c_0
[conda] mkl_fft                   1.0.12       py36ha843d7b_0
[conda] mkl_random                1.0.2        py36hd81dba3_0
[conda] pytorch                   1.4.0        py3.6_cpu_0  [cpuonly]  pytorch
[conda] pytorch-ignite            0.3.0        pypi_0       pypi
[conda] pytorch-lightning         0.6.0        pypi_0       pypi
[conda] pytorch-pretrained-bert   0.6.2        pypi_0       pypi
[conda] pytorch-transformers      1.1.0        pypi_0       pypi
[conda] torchaudio                0.4.0        py36         pytorch
[conda] torchtext                 0.5.0        pypi_0       pypi
[conda] torchvision               0.4.2        pypi_0       pypi
```
Additional context
Something interesting: you cannot reproduce this when you run CoolSystem. You won't get any warning, because `validation_end` returns tensorboard logs with `val_loss` as a key. If you remove this log, you will be able to catch the bug.

For what it's worth, I was trying to build a PyTorch Lightning based kernel on Kaggle with WandB as the logger. After trying "a lot" to find a mistake in my own code, I realized that this warning is not shown every time. Once you run and get the warning and then re-run the block (I was using a Kaggle Kernel notebook), it disappears. I assume it is reading from some cache, since the `val_loss` is something from the first epoch (or maybe something from the last epoch or a log) and stays 0 for the rest. I am not familiar with the internal workings of PL, but I suspect there is some mix-up between the metrics that are logged and the metrics returned by the Lightning methods.
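The masking effect described above (CoolSystem silencing the warning because its `log` dict already contains `val_loss`) can be sketched as follows. The function `metrics_seen_by_callbacks` and the flat-merge behavior are assumptions for illustration, not Lightning's actual merge logic:

```python
def metrics_seen_by_callbacks(validation_end_output):
    # Assumed behavior: metrics returned directly and metrics nested
    # under the 'log' key both end up in one flat dict that the
    # callbacks consult when looking up the monitored key.
    metrics = {k: v for k, v in validation_end_output.items() if k != "log"}
    metrics.update(validation_end_output.get("log", {}))
    return metrics
```

With a CoolSystem-style output such as `{"avg_val_loss": 0.4, "log": {"val_loss": 0.4}}` the flat dict contains `val_loss` and no warning fires, whereas the CoolModel-style output without the log entry leaves the key missing.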