Support evaluation on validation and test set and updated MNIST example. #770
Conversation
It would need some clarification...

I think that it is a bad idea to change the … It may also be a documentation problem: when I just started to use PL it was quite unobvious to me how to evaluate the trained model. Will check it later.
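For context, a minimal sketch of the existing deliberate-evaluation flow being referred to here (`model` is assumed to be an already-defined LightningModule; this shows the current `.test()` behavior, not this PR's new API):

```python
from pytorch_lightning import Trainer

# model: an already-defined LightningModule (definition omitted)
trainer = Trainer()
trainer.fit(model)   # training; validation runs automatically inside the fit loop
trainer.test(model)  # evaluating on the test set must be invoked deliberately
```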
@kuynzereb Let me fix those. AFAIK, …

@Borda I've refactored the code. Can you take a look?

@xingzhaolee If you want to evaluate the model on the validation set you just need to define all …

The point is that with …

@kuynzereb Wouldn't it be better to have an option to run the test on the validation set, rather than forcing the user to copy and paste their validation code into test-related functions? Also, in the ImageNet example, …

@Borda any comments? If it's better to follow the way @kuynzereb mentioned then I'll close this pull request.
Yeah, you are totally right!

Well, it may indeed be a nice option. I actually kind of like your idea to introduce …
@kuynzereb Seems like a good way to encourage users to use the new …

I would not add too much complexity. I like the idea of a validate method, but with a dataloader as a parameter so you can use it generally... test and validation are the same thing in principle, you just draw from a different data basket... @williamFalcon ^^
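A rough sketch of the kind of API this suggests; the class, method internals, and fallback behavior here are assumptions for illustration, not the PR's actual implementation:

```python
import torch
from torch.utils.data import DataLoader

class SketchTrainer:
    """Hypothetical sketch of the suggestion above: one evaluation entry point
    that accepts any dataloader, so validation and test differ only in the
    data they draw from."""

    def evaluate(self, model, dataloader: DataLoader):
        model.eval()
        outputs = []
        with torch.no_grad():
            for batch_idx, batch in enumerate(dataloader):
                # reuse the LightningModule's own step definition
                outputs.append(model.validation_step(batch, batch_idx))
        return outputs

    def validate(self, model, dataloader: DataLoader = None):
        # fall back to the model's own validation data if none is given
        dataloader = dataloader or model.val_dataloader()
        return self.evaluate(model, dataloader)
```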
@Borda sorry I misunderstood. Edited the comments. You meant something like this?
Hmm, both ways seem fine to me.

@Borda Can you take a look at this? I'm separating validation and testing in case they have different evaluation methods. Will add the option for a new dataloader if that's alright.
Are there more options than testing and validation?
Consider an enum for the two cases, see https://docs.python.org/3/library/enum.html
@Borda For now I think only validation and testing. If there are more they can be added in the future. Updated to use enum.
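A minimal sketch of the kind of enum being suggested; `TrainerMode` is the name used later in this thread, though the member names and values here are assumed:

```python
from enum import Enum

class TrainerMode(Enum):
    VALIDATION = "validation"
    TEST = "test"

# usage: branch on the mode instead of passing bare strings around
mode = TrainerMode.TEST
if mode is TrainerMode.TEST:
    print("running the test loop")
```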
Cool! I like this trainer mode very much, and with the enum it's much cleaner to see/read...

If I understand correctly, there is an error. We set … and we cannot set …
@kuynzereb Could you make a review and point out the bug in the code... Thx

My bad. Didn't consider that scenario. Fixed.

Hello @xingzhaolee! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-03-23 03:07:44 UTC

@xingzhaolee The tests fail because the profiler overhead increased beyond the tolerance set in the tests, perhaps because of the additional validation logic.

Maybe see what the timings are on master and compare with this branch to determine if the extra overhead is significant.

@xingzhaolee @awaelchli Just reran the CI and everything is fine now...
I'm approving but please consider my comments! :)
Hi @xingzhaolee, when you rebase/merge master you will probably get docs build errors. Let me know if you need help resolving these :)
pytorch_lightning/__init__.py (Outdated)

```
@@ -26,8 +26,8 @@
from logging import getLogger
_logger = getLogger("lightning")

from pytorch_lightning.trainer import Trainer  # Initialized first due to state
```
Can you explain this? It feels like requiring imports in a certain order would lead to bugs slipping into the codebase more easily.
I did a quick test and I think it is because `pytorch_lightning.trainer.state.TrainerMode` is included in a cyclic import.
For example, if I move `TrainerMode` to `pytorch_lightning.overrides.data_parallel`, then the import order highlighted here doesn't matter (tests don't break in either case).
I'm not saying it should go there, but I would try to move it out of the import loop and make it so that the import order does not matter.
It's due to where `state.py` is located. Any suggestions on whether I should move it out like @awaelchli said, or should I keep it as it is for now?
Hmm, yeah, it'd be preferable if we didn't have to rely on import order for things to work properly. Do you know where the import cycle is occurring?
```python
# pytorch_lightning/trainer/state.py
# (TrainerMode class is defined here)

# pytorch_lightning/__init__.py
from pytorch_lightning.trainer import Trainer
from pytorch_lightning.core import LightningModule

# pytorch_lightning/overrides/data_parallel.py
from pytorch_lightning.trainer.state import TrainerMode

# pytorch_lightning/trainer/evaluation_loop.py
from pytorch_lightning.trainer.state import TrainerMode

# pytorch_lightning/trainer/trainer.py
from pytorch_lightning.trainer.state import TrainerMode
```

Just looking at the imports from this PR I'm not seeing it.
I noticed this:

- Flip the imports @jeremyjordan highlighted (LightningModule first, then Trainer)
- Open an interactive Python shell and import TrainerMode: `from pytorch_lightning.trainer.state import TrainerMode`
- It tries to import `LightningDistributedDataParallel`, `TrainerDataLoadingMixin` and `LightningModule` but fails.

It must be a problem with the imports in `pytorch_lightning/__init__.py` or `pytorch_lightning/trainer/__init__.py`.
I think I found the cycle: when we do `from pytorch_lightning.trainer.state import TrainerMode`, it will run the init from `pytorch_lightning`, which imports Trainer. Trainer tries to import `from pytorch_lightning.trainer.state import TrainerMode`, and so on ...
I'll take a look at it tomorrow and fix it so that import order is not relied on. :)
I think either `TrainerMode` is moved out of `trainer`, or `TrainerMode` needs to be imported first, like:

```python
from pytorch_lightning.trainer.state import TrainerMode
from pytorch_lightning.core import LightningModule
from pytorch_lightning.trainer import Trainer
from pytorch_lightning.callbacks import Callback
from pytorch_lightning.core import data_loader
```

However, order still matters even in this case. The main issue lies with the import in `overrides/data_parallel.py`. Any suggestions? @jeremyjordan @awaelchli @Borda
I would keep it outside and call it `states`, as the trainer status will be added.
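A sketch of what that could look like: a standalone `states` module that imports nothing from the rest of the package can never participate in an import cycle. The module path follows the suggestion above; the enum members are assumed:

```python
# pytorch_lightning/states.py
# Standalone module: it imports nothing from the rest of the package,
# so any other module can import it without creating a cycle.
from enum import Enum

class TrainerMode(Enum):
    VALIDATION = "validation"
    TEST = "test"
```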
```diff
-To ensure you don't accidentally use test data to guide training decisions Lightning
-makes running the test set deliberate.
+To ensure you don't accidentally use validation or test data to guide training decisions Lightning
+makes running the validation or test set deliberate.
```
This doesn't make sense... validation should run within training.
This change here is wrong. @Borda @jeremyjordan

I don't really understand this PR and I think the functionality doesn't make sense. Validation by definition is tied to training... it's a way of stopping training. It shouldn't be run separately. It is NOT like .test(). Test is required to be run separately as a best practice. This PR shouldn't be accepted, as I'm not sure this is needed unless I'm missing something.
If validation should not be allowed to run without training, then this PR won't be needed.

In what instance would you want to do that? Maybe it is for some particular research use case?
It's more of a general use case. Let's say: …

But of course it's possible to have both of those outside the Lightning module if that should be the case.
Will also add one use case.

```python
# set global_step = -1 so the logs are not rewritten by trainer.fit
trainer.global_step = -1

# log validation metrics on the original model
trainer.validate()

# restore global_step, not sure if it is needed
trainer.global_step = 0

trainer.fit(model)
```

I am new to PyTorchLightning, so my ideas might be wrong.
This already happens without having to do this: set the number of sanity validation batches to -1 and it will log the full val before training.
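If I understand the suggestion, that would look roughly like this (assuming `num_sanity_val_steps` is the Trainer argument meant here and that -1 means "all validation batches", as described above; `model` is an already-defined LightningModule):

```python
from pytorch_lightning import Trainer

# Run the FULL validation set as a sanity check before training begins,
# instead of the default handful of batches.
trainer = Trainer(num_sanity_val_steps=-1)
trainer.fit(model)  # full validation metrics are logged before training starts
```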
OMG. That was the quickest response I have ever gotten on GitHub! Thanks.

I am not sure if this is intended behaviour or if I am doing something wrong.
Before submitting
What does this PR do?
Fixes #763.
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Kinda 🙃