Run full validation epoch before training #1715
Comments
To evaluate the performance of an existing model in your case, it is best practice to implement the test methods in the LightningModule and then invoke Trainer.test(). So I imagine your workflow will roughly be:
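A minimal sketch of such a workflow, assuming a LightningModule (here hypothetically called MyModel) that already implements test_step and test_dataloader, and an existing checkpoint; the module name and checkpoint path are illustrative:

```python
import pytorch_lightning as pl

from my_project import MyModel  # hypothetical LightningModule with test_step / test_dataloader

# load the pre-trained weights you want to evaluate (illustrative path)
model = MyModel.load_from_checkpoint("path/to/pretrained.ckpt")

trainer = pl.Trainer()
trainer.test(model)  # runs the test loop and reports the resulting metrics
```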
There is a reason why validation is tied to training and cannot be run so easily from the outside: validation is conceptually not the same as testing and does not reflect the true performance of a model, because we do things like early stopping based on the validation loss.
Hi @awaelchli. Currently you can't directly call the validation loop before training, which is what I'm asking for here.
I agree with @simonepri. It would be good to be able to run a full validation epoch before training.
Yes, maybe we could think about adding an option for this.
I agree. It's good practice to run validation before training, and I'll surely use it!
It's pretty straightforward, I guess I'm not the first to write this :)
@awaelchli yes, perfect, let's do it that way: num_sanity_val_steps=-1. PR for 0.8.0?
Happy to make it, but it would be easier to do after #1920.
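For reference, the flag being discussed would be used like this when configuring the Trainer; this is only a sketch of the proposed behaviour, with -1 meaning "run the sanity check over the entire validation set":

```python
import pytorch_lightning as pl

# -1 asks the pre-training sanity check to run over the whole validation set
# instead of only a couple of batches
trainer = pl.Trainer(num_sanity_val_steps=-1)
trainer.fit(model)  # `model` is any LightningModule with validation hooks defined
```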
These changes still do not cover the case where we want to run full validation and also log the validation results before training, right? So the case with num_sanity_val_steps=-1 still would not get those metrics logged.
@nmerty the sanity check is there to make sure your code is functional and does not crash. The logging code also runs; it's just that the final values are not sent to the logger. We don't actually want to log anything there, because that would add an unwanted offset to all figures of the logged keys in the normal epochs that follow. If you want to validate the model in advance, you can do:

```python
trainer.validate(model)
trainer.fit(model)
```
I am using the solution suggested by @awaelchli in my training script, passing a DataModule as the data source for both validation and training. The pre-training validation score is logged, but I noticed that the validation DataLoader (and validation workers) are created twice when used in this manner. In my use case, creating the workers is relatively costly, and I wonder if there is a way to log the validation score before training, while using a DataModule, without causing the validation DataLoader to be initialized twice.
That is probably because the sanity-check validation also runs when calling fit, leading to the val loader being reset an additional time. Since you are doing this:

```python
trainer.validate(model)
trainer.fit(model)
```

I would suggest setting num_sanity_val_steps=0.
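Put together, the suggestion would look roughly like this (a sketch; `dm` stands for whichever DataModule you already pass in):

```python
import pytorch_lightning as pl

trainer = pl.Trainer(num_sanity_val_steps=0)  # skip the sanity check, validate() below covers it
trainer.validate(model, datamodule=dm)        # pre-training validation, metrics get logged
trainer.fit(model, datamodule=dm)
```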
Thanks for the suggestion, @awaelchli. I tested it, and this does not seem to be the cause; I see the same behavior with num_sanity_val_steps=0. It seems as though the Trainer instance does not persist the validation dataloader when running trainer.validate, or ignores such a persisted dataloader when running trainer.fit subsequently.
Lightning can't assume that the dataloader used during the validate call is going to be the same as in fit. In general, the trainer doesn't have any information about whether it should reuse dataloaders from a previous stage or not. Each stage is separate, and this makes sense in most cases. If you really want to load the val dataloader only once, ever, I suggest you implement that directly in the DataModule, like so:

```python
def val_dataloader(self):
    if self._loaded_val_dataloader is None:
        self._loaded_val_dataloader = ...  # here the expensive way you create the loader
        return self._loaded_val_dataloader
    else:
        # if we already created it in a previous call, reuse it.
        return self._loaded_val_dataloader
```
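For the snippet above to work, the cache attribute has to exist before the first call. A minimal sketch of where that would go (the DataModule name is illustrative):

```python
import pytorch_lightning as pl


class MyDataModule(pl.LightningDataModule):  # illustrative name
    def __init__(self):
        super().__init__()
        # cache slot for the expensive-to-build validation loader
        self._loaded_val_dataloader = None
```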
This makes a lot of sense, I will try it out. Could doing so be unsafe in the context of distributed data loading? I am currently using a single GPU, so it is of no immediate concern; I'm just wondering whether I should be aware of a possible issue in case I scale up at a later time.
@odedbd it won't be a problem. In distributed training each process calls val_dataloader and builds its own loader, so the cached loader stays local to that process.
I tested the suggested method and the validation DataLoader is still recreated and new workers are initialized. I tried debugging the PyTorch Lightning code but wasn't able to pinpoint where the dataloader is reset. I did verify that the val_dataloader method was called twice; the first time _loaded_val_dataloader was None, and the second time it held the saved dataloader, which the method returned.
If it returned the same dataloader, are you saying it is still reinitializing the workers? Can you share a reproducible script to verify this behavior?
Yes, that's what I think I am seeing. Below is a repro script based on one of the PL examples. You'll notice the worker init function of the val dataset is called a second time when the fit starts.
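A minimal sketch of such a repro, following the BoringModel pattern from the PL examples; the dataset, model, and worker-init print are illustrative. Repeated "val worker initialized" lines after fit() starts would indicate the cached loader's workers are being re-created:

```python
import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl


class RandomDataset(Dataset):
    """Small random dataset, standing in for the real data."""

    def __init__(self, size=32, length=64):
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        return self(batch).sum()

    def validation_step(self, batch, batch_idx):
        self.log("val_loss", self(batch).sum())

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


def val_worker_init(worker_id):
    # printed once per created worker; repeated output after fit() starts
    # means the cached loader's workers were re-created
    print(f"val worker {worker_id} initialized")


class CachingDataModule(pl.LightningDataModule):
    def __init__(self):
        super().__init__()
        self._loaded_val_dataloader = None

    def train_dataloader(self):
        return DataLoader(RandomDataset(), batch_size=8)

    def val_dataloader(self):
        if self._loaded_val_dataloader is None:
            self._loaded_val_dataloader = DataLoader(
                RandomDataset(),
                batch_size=8,
                num_workers=2,
                worker_init_fn=val_worker_init,
                persistent_workers=True,
            )
        return self._loaded_val_dataloader


if __name__ == "__main__":
    model = BoringModel()
    dm = CachingDataModule()
    trainer = pl.Trainer(max_epochs=1, num_sanity_val_steps=0, limit_train_batches=2)
    trainer.validate(model, datamodule=dm)  # first set of val workers is created here
    trainer.fit(model, datamodule=dm)       # watch whether val workers are created again
```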
❓ Questions and Help
What is your question?
How can I manually trigger a full validation step before training?
I want to compute and log the validation metrics before I start the training (ideally also updating the progress bar dictionary).
The reason I want to do this is that I am fine-tuning a pre-trained model, and I want to check its performance before training.