
Support evaluation on validation and test sets and update the MNIST example. #770

Closed
wants to merge 70 commits into from

Conversation

ghost

@ghost ghost commented Jan 30, 2020

Before submitting

  • ✅ Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • ✅ Did you read the contributor guideline?
  • ❌ Did you make sure to update the docs?
  • ❌ Did you write any new necessary tests?

What does this PR do?

Fixes #763.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Kinda 🙃

@ghost ghost requested review from Borda, williamFalcon and jeffling January 30, 2020 08:33
Member

@Borda Borda left a comment

it would need some clarification...

@kuynzereb
Contributor

I think it is a bad idea to change the test() function by adding a validation parameter. test() should be independent of training and validation. And I don't really understand the purpose of this PR. If you want to run evaluation on the validation set, you can simply make test_dataloader(), test_step() and test_end() equal to val_dataloader(), validation_step() and validation_end(), respectively. Or is there some other intention?

This may also be a documentation problem. When I first started using PL, it was quite unobvious to me how to evaluate a trained model. I will check it later.

@ghost
Author

ghost commented Jan 31, 2020

@kuynzereb Let me fix those. AFAIK, trainer.test(model) can evaluate the model on the test set with test_step(), but there is no way to evaluate a model directly on the validation set without running training. If such a function already exists, then this pull request isn't needed.

@ghost
Author

ghost commented Jan 31, 2020

@Borda I've refactored the code. Can you take a look?

@kuynzereb
Contributor

@xingzhaolee If you want to evaluate the model on the validation set, you just need to define all the test functions to be equal to the val functions. That is, inside your LightningModule you can do something like:

def test_dataloader(self):
    return self.val_dataloader()

def test_step(self, *args, **kwargs):
    return self.validation_step(*args, **kwargs)

def test_end(self, *args, **kwargs):
    return self.validation_end(*args, **kwargs)

The point is that with test() you can evaluate on whatever dataset you want, in particular the validation set. You just need to define the test functions appropriately.

@ghost
Author

ghost commented Jan 31, 2020

@kuynzereb Wouldn't it be better to have an option to run evaluation on the validation set rather than forcing users to copy and paste their validation code into the test-related functions?

Also, the ImageNet example uses trainer.run_evaluation(), which is wrong. trainer.test(model) should be used instead when no training is carried out.
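
For illustration, the intended pattern looks roughly like this (a minimal sketch; Model and hparams stand in for the example's own classes and are not defined here):

import pytorch_lightning as pl

model = Model(hparams)
trainer = pl.Trainer()
trainer.fit(model)   # training; validation runs as part of fit()
trainer.test(model)  # standalone evaluation on the test set, no training needed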

@ghost
Author

ghost commented Jan 31, 2020

@Borda any comments? If it's better to follow the approach @kuynzereb suggested, then I'll close this pull request.

@kuynzereb
Contributor

Also, the ImageNet example uses trainer.run_evaluation(), which is wrong. trainer.test(model) should be used instead when no training is carried out.

Yeah, you are totally right!

@kuynzereb Wouldn't it be better to have an option to run evaluation on the validation set rather than forcing users to copy and paste their validation code into the test-related functions?

Well, it may indeed be a nice option. I actually kind of like your idea of introducing trainer.validate(). But the current implementation looks quite clumsy to me. What do you think about the following refactoring (a rough sketch follows the list)?

  1. Introduce something like trainer.mode, which can be equal to 'training', 'validating' or 'testing', and remove the old self.testing.
  2. When we start training, we assign trainer.mode = 'training'.
  3. Inside trainer.validate() we assign trainer.mode = 'validating'.
  4. Inside trainer.test() we assign trainer.mode = 'testing'.
  5. Change run_evaluation(self, test) to run_evaluation(self). Inside run_evaluation() we do different things depending on trainer.mode. (Moreover, we can rename run_evaluation to _run_evaluation so the user is aware that it is a private method.)
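
A minimal sketch of that flow (the method bodies here are placeholders, not the existing Trainer implementation):

class Trainer:
    def fit(self, model):
        self.mode = 'training'
        ...  # run the training loop; periodic val checks go through _run_evaluation()

    def validate(self, model):
        self.mode = 'validating'
        self._run_evaluation(model)

    def test(self, model):
        self.mode = 'testing'
        self._run_evaluation(model)

    def _run_evaluation(self, model):
        # dispatch on self.mode: use val_dataloader()/validation_step() when
        # validating, test_dataloader()/test_step() when testing
        ...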

@ghost
Author

ghost commented Jan 31, 2020

@kuynzereb That seems like a good way to encourage users to use the new .validate() or the existing .test() instead of run_evaluation(), which may cause confusion for a lot of users (I guess that's why the ImageNet example got it wrong). I'll update it over the weekend! 😃

@Borda
Member

Borda commented Jan 31, 2020

I would not add too much complexity. I like the idea of a validate method, but with a dataloader as a parameter so you can use it more generally... Testing and validation are the same thing in principle, you just draw from a different data basket... @williamFalcon ^^

@ghost
Author

ghost commented Jan 31, 2020

@Borda sorry, I misunderstood; I've edited the comments. Did you mean something like this?

import pytorch_lightning as pl

model = Model(hparams)
trainer = pl.Trainer()
# evaluate with the test loop, but feed it the validation dataloader
trainer.test(model, model.val_dataloader)

Hmm, both ways seem fine to me.

@Borda Borda added feature Is an improvement or enhancement information needed labels Jan 31, 2020
@Borda Borda added this to the 0.6.1 milestone Jan 31, 2020
@ghost
Author

ghost commented Feb 3, 2020

@Borda Can you take a look at this? I'm separating validation and testing in case they have different evaluation methods. I'll add the option to pass a new dataloader if that's alright.

Member

@Borda Borda left a comment

Are there more options than testing and validation?
Consider using an enum for the two cases, see https://docs.python.org/3/library/enum.html
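
For instance, the modes could be captured in an Enum along these lines (a minimal sketch; the member names are only illustrative):

from enum import Enum

class TrainerMode(Enum):
    TRAINING = 'training'
    VALIDATING = 'validating'
    TESTING = 'testing'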

@ghost
Author

ghost commented Feb 4, 2020

@Borda For now I think only validation and testing. If there are more, they can be added in the future. Updated to use an enum.

Member

@Borda Borda left a comment

Cool! I like this trainer mode very much, and with the enum it's much cleaner to read...

@Borda Borda added the ready PRs ready to be merged label Feb 14, 2020
@kuynzereb
Contributor

kuynzereb commented Feb 14, 2020

If I understand correctly, there is an error. We set TrainerMode.TRAINING only during trainer initialization and not inside fit(). This means that if we first call trainer.test() and then trainer.fit(), it will run in testing mode instead of training mode.

And we cannot set TrainerMode.TRAINING in fit() because it is an internal function used by test() and validate(). I can think of the following solution: rename fit() to _fit() and add a new fit() which first sets the mode to TrainerMode.TRAINING and then calls _fit(). Accordingly, test() and validate() will have to call _fit().
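
A minimal sketch of that proposal (the method bodies are placeholders, not the PR's actual implementation):

from pytorch_lightning.trainer.state import TrainerMode  # the enum added by this PR

class Trainer:
    def fit(self, model):
        self.mode = TrainerMode.TRAINING
        return self._fit(model)

    def validate(self, model):
        self.mode = TrainerMode.VALIDATING
        return self._fit(model)

    def test(self, model):
        self.mode = TrainerMode.TESTING
        return self._fit(model)

    def _fit(self, model):
        ...  # the existing fit logic, shared by fit(), validate() and test()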

@Borda
Member

Borda commented Feb 14, 2020

@kuynzereb could you do a review and point out the bug in the code... Thx

@Borda Borda removed the ready PRs ready to be merged label Feb 14, 2020
@ghost
Author

ghost commented Feb 15, 2020

If I understand correctly, there is an error. We set TrainerMode.TRAINING only during trainer initialization and not inside fit(). This means that if we first call trainer.test() and then trainer.fit(), it will run in testing mode instead of training mode.

My bad, I didn't consider that scenario. Fixed.

@pep8speaks

pep8speaks commented Feb 16, 2020

Hello @xingzhaolee! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-23 03:07:44 UTC

@awaelchli
Contributor

@xingzhaolee The tests fail because the profiler overhead increased beyond the tolerance set in the tests, perhaps because of the additional validation logic.
@jeremyjordan do we need to increase the tolerance in the tests? How was it chosen?

@awaelchli
Contributor

Maybe see what the timings are on master and compare them with this branch to determine whether the extra overhead is significant.

@Borda
Member

Borda commented Mar 20, 2020

@xingzhaolee @awaelchli I just reran the CI and everything is fine now...

@Borda Borda added the ready PRs ready to be merged label Mar 20, 2020
Contributor

@tullie tullie left a comment

I'm approving but please consider my comments! :)

@awaelchli
Contributor

Hi @xingzhaolee, when you rebase/merge master you will probably get docs build errors. Let me know if you need help resolving these :)

@@ -26,8 +26,8 @@
from logging import getLogger
_logger = getLogger("lightning")

from pytorch_lightning.trainer import Trainer  # Initialized first due to state
Contributor

Can you explain this? It feels like requiring imports in a certain order would let bugs slip into the codebase more easily.

Contributor

@awaelchli awaelchli Mar 21, 2020

I did a quick test and I think it is because pytorch_lightning.trainer.state.TrainerMode is part of a cyclic import.
For example, if I move TrainerMode to pytorch_lightning.overrides.data_parallel, then the import order highlighted here doesn't matter (tests don't break in either case).
I'm not saying it should go there, but I would try to move it out of the import loop so that the import order does not matter.

Author

It's due to where state.py is located. Any suggestions on whether I should move it out as @awaelchli suggested, or keep it as it is for now?

Contributor

Hmm, yeah, it'd be preferable if we didn't have to rely on import order for things to work properly. Do you know where the import cycle is occurring?

# pytorch_lightning/trainer/state.py
(TrainerMode class is defined)

# pytorch_lightning/__init__.py
from pytorch_lightning.trainer import Trainer 
from pytorch_lightning.core import LightningModule

# pytorch_lightning/overrides/data_parallel.py
from pytorch_lightning.trainer.state import TrainerMode

# pytorch_lightning/trainer/evaluation_loop.py
from pytorch_lightning.trainer.state import TrainerMode

# pytorch_lightning/trainer/trainer.py
from pytorch_lightning.trainer.state import TrainerMode

Just looking at the imports from this PR, I'm not seeing it.

Contributor

@awaelchli awaelchli Mar 22, 2020

I noticed this:

  • Flip the imports @jeremyjordan highlighted (LightningModule first, then Trainer).
  • Open an interactive Python shell and import TrainerMode:
    from pytorch_lightning.trainer.state import TrainerMode
  • It tries to import LightningDistributedDataParallel, TrainerDataLoadingMixin and LightningModule but fails.

It must be a problem with the imports in pytorch_lightning/__init__.py or pytorch_lightning/trainer/__init__.py.

Contributor

I think I found the cycle:
When we do from pytorch_lightning.trainer.state import TrainerMode, it runs the __init__ of pytorch_lightning, which imports Trainer. Trainer then tries to import from pytorch_lightning.trainer.state import TrainerMode again, and so on...

Author

I'll take a look at it tomorrow and fix it so that import order is not relied on. :)

Author

I think either TrainerMode has to be moved out of the trainer package, or TrainerMode needs to be imported first, like:

from pytorch_lightning.trainer.state import TrainerMode
from pytorch_lightning.core import LightningModule
from pytorch_lightning.trainer import Trainer
from pytorch_lightning.callbacks import Callback
from pytorch_lightning.core import data_loader

However, order still matters even in this case. The main issue lies with the import in overrides/data_parallel.py. Any suggestions? @jeremyjordan @awaelchli @Borda

Member

I would keep it outside the trainer package and call it states, since the trainer status will be added later.
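
As a hedged sketch of that suggestion, the enum could live in a module that imports nothing else from the package, so importing it from trainer.py, data_parallel.py or evaluation_loop.py cannot create a cycle (the file name pytorch_lightning/states.py follows the naming above and is only an assumption):

# pytorch_lightning/states.py
# No other pytorch_lightning imports here, so any module in the package can
# import TrainerMode without pulling Trainer or LightningModule back in.
from enum import Enum

class TrainerMode(Enum):
    TRAINING = 'training'
    VALIDATING = 'validating'
    TESTING = 'testing'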

- To ensure you don't accidentally use test data to guide training decisions Lightning
-   makes running the test set deliberate.
+ To ensure you don't accidentally use validation or test data to guide training decisions Lightning
+   makes running the validation or test set deliberate.
Contributor

This doesn't make sense... validation should run within training.
This change here is wrong. @Borda @jeremyjordan

@williamFalcon
Contributor

I don't really understand this PR and I think the functionality doesn't make sense.

Validation by definition is tied to training... it's a way of stopping training. It shouldn't be run separately.

It is NOT like .test(). Test is required to be run separately as a best practice.

This PR shouldn't be accepted as I'm not sure this is needed unless I'm missing something
@PyTorchLightning/core-contributors

@Borda Borda removed the ready PRs ready to be merged label Mar 23, 2020
@ghost
Author

ghost commented Mar 23, 2020

If validation should not be allowed to run without training, then this PR won't be needed.

@ghost ghost closed this Mar 23, 2020
@williamFalcon
Contributor

In what instance would you want to do that? Maybe it is for some particular research use case?

@ghost
Author

ghost commented Mar 23, 2020

It’s more of a general use case. Let’s say:

  1. During training, the model output goes through a sigmoid and a 0.5 threshold is used as the prediction for validation. After training I might want to test it out with a different threshold; that would be easier if validation could be run again without training.
  2. I lost my training log but still have my model, and I would like to rerun it on the validation set without needing to change the test functions for validation purposes.

But of course it’s possible to handle both of those outside the LightningModule if that should be the case.

@daniilhayrapetyan

I'll also add one use case.
When fine-tuning, you might want to compare the validation metrics against the same metrics before training started.
I imagine it looking something like this:

# set global_step = -1 so the logs are not rewritten by trainer.fit
trainer.global_step = -1
# log validation metrics on the original model
trainer.validate()
# restore global_step, not sure if it is needed
trainer.global_step = 0

trainer.fit(model)

I am new to PyTorch Lightning, so my ideas might be wrong.
But I still need some way to compare the validation metrics before and during training to evaluate progress, and this solution seems like an easy way to do that.

@williamFalcon
Contributor

This already happens without needing this PR.

Set the number of sanity validation batches to -1 and it will log the full validation set before training.
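
For example (a minimal sketch; num_sanity_val_steps is the Trainer argument controlling the sanity check, and whether -1 means all batches may depend on the Lightning version):

import pytorch_lightning as pl

# model is your LightningModule instance
trainer = pl.Trainer(num_sanity_val_steps=-1)  # run the full validation set before training starts
trainer.fit(model)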

@daniilhayrapetyan

OMG. That was the quickest response I have ever gotten on Github! Thanks.

@daniilhayrapetyan

I am not sure if this is intended behaviour or if I am doing something wrong.
I can't manage to get self.log in my LightningModule to record logs during the sanity validation phase.

@Borda Borda modified the milestones: v0.7., v0.7.x Apr 18, 2021
This pull request was closed.
Labels
feature Is an improvement or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

run_evaluation() does not work.
9 participants