
the bug when use self.save_hyperparameters() #85

Closed
qianjinhao opened this issue Jul 6, 2021 · 2 comments

qianjinhao commented Jul 6, 2021

I followed the tutorial to train my own video classification model, but this bug appeared when I tried to use the saved model for inference.

```
Traceback (most recent call last):
  File "eval.py", line 181, in <module>
    main()
  File "eval.py", line 122, in main
    model = MyLightingModule.load_from_checkpoint(checkpoint_path)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 199, in _load_model_state
    model = cls(**_cls_kwargs)
TypeError: __init__() missing 1 required positional argument: 'args'
```

I then searched online and found a suggestion to add this code:
Lightning-AI/pytorch-lightning#2909

However, after I added this line and retrained, another bug appeared. Can someone tell me why?

```
Traceback (most recent call last):
  File "train_frame.py", line 630, in <module>
    main()
  File "train_frame.py", line 610, in main
    train(args)
  File "train_frame.py", line 617, in train
    trainer.fit(classification_module, data_module)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self._run(model)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
    self.dispatch()
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
    return self.run_train()
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
    self.train_loop.run_training_epoch()
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 584, in run_training_epoch
    self.trainer.run_evaluation(on_epoch=True)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1006, in run_evaluation
    self.evaluation_loop.on_evaluation_end()
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 102, in on_evaluation_end
    self.trainer.call_hook('on_validation_end', *args, **kwargs)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1223, in call_hook
    trainer_hook(*args, **kwargs)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/callback_hook.py", line 227, in on_validation_end
    callback.on_validation_end(self, self.lightning_module)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 249, in on_validation_end
    self.save_checkpoint(trainer)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 298, in save_checkpoint
    self._save_top_k_checkpoint(trainer, monitor_candidates)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 669, in _save_top_k_checkpoint
    self._update_best_and_save(current, trainer, monitor_candidates)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 730, in _update_best_and_save
    self._save_model(trainer, filepath)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 449, in _save_model
    self._do_save(trainer, filepath)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 460, in _do_save
    trainer.save_checkpoint(filepath, self.save_weights_only)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/properties.py", line 330, in save_checkpoint
    self.checkpoint_connector.save_checkpoint(filepath, weights_only)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 392, in save_checkpoint
    self.trainer.accelerator.save_checkpoint(_checkpoint, filepath)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 516, in save_checkpoint
    self.training_type_plugin.save_checkpoint(checkpoint, filepath)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 256, in save_checkpoint
    atomic_save(checkpoint, filepath)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/utilities/cloud_io.py", line 64, in atomic_save
    torch.save(checkpoint, bytesbuffer)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 379, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 484, in _save
    pickler.dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7ff195ab3b80>: attribute lookup <lambda> on pytorchvideo.models.resnet failed

Exception ignored in: <function tqdm.__del__ at 0x7ff1b151cdc0>
Traceback (most recent call last):
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/tqdm/std.py", line 1145, in __del__
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/tqdm/std.py", line 1299, in close
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/tqdm/std.py", line 1492, in display
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/tqdm/std.py", line 1148, in __str__
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/tqdm/std.py", line 1450, in format_dict
TypeError: cannot unpack non-iterable NoneType object
```
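For context on the `_pickle.PicklingError` above: `torch.save` serializes the checkpoint with `pickle`, and `pickle` stores functions by reference to an importable name, so a `lambda` attached to a model (here one defined in `pytorchvideo.models.resnet`) cannot be serialized. Below is a minimal stdlib-only sketch of the failure mode and the usual workaround of using a module-level named function; the class names are illustrative, not from pytorchvideo:

```python
import pickle


class WithLambda:
    """Hypothetical model-like object holding a lambda attribute."""

    def __init__(self):
        # A lambda created here has no importable qualified name,
        # so pickling any object that references it fails.
        self.act = lambda x: x * 2


def double(x):
    """Module-level function: pickle can find it by its qualified name."""
    return x * 2


class WithNamedFn:
    """Same object, but referencing a picklable named function."""

    def __init__(self):
        self.act = double


if __name__ == "__main__":
    try:
        pickle.dumps(WithLambda())
    except (pickle.PicklingError, AttributeError) as exc:
        # pickle refuses the lambda; this mirrors the torch.save failure.
        print("lambda fails to pickle:", exc)

    data = pickle.dumps(WithNamedFn())  # succeeds
    restored = pickle.loads(data)
    print("named function round-trips, restored.act(3) =", restored.act(3))
```

The same idea applies when a model config takes an activation or norm callable: passing a named function (or a `functools.partial` of one) instead of a `lambda` keeps the checkpoint picklable.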

@qianjinhao (Author)

Can you give an example of how to run inference with the trained ckpt?

@kalyanvasudev (Contributor)

Please add the line `self.save_hyperparameters(args)` to your PyTorch Lightning model here: https://github.com/facebookresearch/pytorchvideo/blob/master/tutorials/video_classification_example/train.py#L69

This should save and load args automatically when working with checkpoints.
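To see why this fixes the original `TypeError`: loading a checkpoint reconstructs the module by calling `cls(**kwargs)` with the hyperparameters stored in the checkpoint, so if `__init__` requires `args` but nothing was saved, the call fails. Here is a stdlib-only sketch of that mechanism; the `Model` class and its `save_checkpoint`/`load_from_checkpoint` methods are illustrative stand-ins, not the actual Lightning API:

```python
import json
import os
import tempfile


class Model:
    """Illustrative stand-in mimicking the save/replay of init kwargs."""

    def __init__(self, args):
        # Analogous to self.save_hyperparameters(args): remember the
        # constructor arguments so loading can replay them later.
        self.hparams = {"args": args}
        self.lr = args["lr"]

    def save_checkpoint(self, path):
        # The checkpoint stores the hyperparameters alongside the
        # (omitted) weights; this is what makes loading self-contained.
        with open(path, "w") as f:
            json.dump({"hyper_parameters": self.hparams}, f)

    @classmethod
    def load_from_checkpoint(cls, path):
        with open(path) as f:
            ckpt = json.load(f)
        # If "hyper_parameters" were missing, cls() would raise:
        # TypeError: __init__() missing 1 required positional argument: 'args'
        return cls(**ckpt.get("hyper_parameters", {}))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    Model({"lr": 0.01}).save_checkpoint(path)
    restored = Model.load_from_checkpoint(path)
    print(restored.lr)  # 0.01
```

With `self.save_hyperparameters(args)` in place, `MyLightingModule.load_from_checkpoint(checkpoint_path)` can rebuild the module without you passing `args` again.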

Please re-open the issue if the problem persists.
