Problem with loading checkpoint of a model with embeddings #2359

Closed
narain1 opened this issue Jun 25, 2020 · 3 comments
Labels: bug, help wanted

Comments

narain1 commented Jun 25, 2020

🐛 Bug

Unable to load from checkpoint for model with embeddings

Code sample

Model architecture and trainer setup (get_base(), Mish and early_stopping are helpers defined elsewhere in my notebook):

import os

import torch
import torch.nn as nn
import pytorch_lightning as pl
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint


class Model(pl.LightningModule):
    def __init__(self, emb_szs):
        super().__init__()
        m = get_base()  # backbone factory, defined elsewhere in the notebook
        # image encoder: everything except the final Linear layer, then flatten
        self.enc = nn.Sequential(*list(m.children())[:-1], nn.Flatten())
        nc = list(m.children())[-1].in_features
        self.head = nn.Sequential(
            nn.Linear(2 * nc + 25, 512), Mish(),  # Mish activation, defined elsewhere
            nn.BatchNorm1d(512), nn.Dropout(0.5), nn.Linear(512, 2),
        )
        # one embedding table per categorical feature
        self.embs = nn.ModuleList([nn.Embedding(c, s) for c, s in emb_szs])

    def forward(self, xb, x_cat, x_cont):
        x1 = [e(x_cat[:, i] - 1) for i, e in enumerate(self.embs)]
        x1 = torch.cat(x1, 1)
        x_img = self.enc(xb)
        x = torch.cat([x1, x_cont.unsqueeze(1)], 1)
        x = torch.cat([x, x_img], 1)
        return self.head(x)


checkpoint_callback = ModelCheckpoint(
    filepath=os.path.join(os.getcwd(), 'model_dir'),
    # save_top_k=True,
    verbose=True,
    monitor='val_loss',
    mode='min',
    prefix='',
)

trainer = Trainer(
    max_epochs=15,
    early_stop_callback=early_stopping,  # EarlyStopping callback, defined elsewhere
    gpus=1,
    gradient_clip_val=1.0,
    weights_save_path=os.getcwd(),
    checkpoint_callback=checkpoint_callback,
    num_sanity_val_steps=0,
)
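
The snippet references get_base(), Mish and early_stopping without defining them; purely hypothetical stand-ins (my guesses at their shape, not the notebook's actual code) that make it self-contained could look like this:

# Hypothetical stand-ins for the undefined helpers above, only so the snippet
# runs on its own; the real definitions live in the linked notebook.
import torch
import torch.nn as nn
import torchvision.models as tvm
from pytorch_lightning.callbacks import EarlyStopping


class Mish(nn.Module):
    """Mish activation: x * tanh(softplus(x))."""
    def forward(self, x):
        return x * torch.tanh(nn.functional.softplus(x))


def get_base():
    """Return a CNN backbone whose last child is a Linear classifier (e.g. a ResNet)."""
    return tvm.resnet18(pretrained=False)


early_stopping = EarlyStopping(monitor='val_loss', mode='min')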

The training loop runs without problems, but when I call trainer.test() a runtime error arises:

RuntimeError: Error(s) in loading state_dict for Model:
Unexpected key(s) in state_dict: "embs.0.weight", "embs.1.weight", "embs.2.weight", "embs.3.weight".

Expected behavior

As described in the documentation, trainer.test() should use the best checkpoint, but loading the checkpoint fails.
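
One possible direction (my assumption, not a confirmed fix from this thread): if emb_szs is not stored in the checkpoint, Lightning cannot rebuild the embedding layers when it restores the best model for test(), which would leave the embs.*.weight keys unexpected. A minimal sketch, assuming save_hyperparameters() is available (it should be in 0.8.x) and that emb_szs is still in scope at load time:

class Model(pl.LightningModule):
    def __init__(self, emb_szs):
        super().__init__()
        # record constructor arguments in the checkpoint so that
        # Model.load_from_checkpoint can rebuild the embedding layers
        self.save_hyperparameters()
        ...  # rest of __init__ as above

# alternatively, pass the argument explicitly when restoring;
# the checkpoint path here is made up for illustration
model = Model.load_from_checkpoint('model_dir/epoch=3.ckpt', emb_szs=emb_szs)
trainer.test(model)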

Environment

  • CUDA:
    • GPU:
      • Tesla P100-PCIE-16GB
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.18.1
    • pyTorch_debug: False
    • pyTorch_version: 1.5.1
    • pytorch-lightning: 0.8.1
    • tensorboard: 2.2.2
    • tqdm: 4.45.0
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.6
    • version: #1 SMP Sat Jun 13 11:04:33 PDT 2020
narain1 added the bug and help wanted labels Jun 25, 2020
github-actions (Contributor) commented

Hi! Thanks for your contribution, great first issue!

Borda (Member) commented Jun 28, 2020

@narain1 Would you mind sharing a minimal running example? In your case there are several functions that cannot be traced...
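
A stripped-down repro of this shape (hypothetical, and sketched against a more recent Lightning API than the 0.8.1 release used above) only needs an embedding-based model, a checkpoint callback monitoring val_loss, and a trainer.test() call afterwards:

# Hypothetical minimal repro: embeddings + checkpoint callback + test().
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class TinyEmbModel(pl.LightningModule):
    def __init__(self, emb_szs=((10, 4), (7, 3))):
        super().__init__()
        self.save_hyperparameters()
        self.embs = nn.ModuleList([nn.Embedding(c, s) for c, s in emb_szs])
        self.head = nn.Linear(sum(s for _, s in emb_szs), 2)

    def forward(self, x_cat):
        x = torch.cat([e(x_cat[:, i]) for i, e in enumerate(self.embs)], dim=1)
        return self.head(x)

    def _loss(self, batch):
        x_cat, y = batch
        return nn.functional.cross_entropy(self(x_cat), y)

    def training_step(self, batch, batch_idx):
        return self._loss(batch)

    def validation_step(self, batch, batch_idx):
        self.log('val_loss', self._loss(batch))

    def test_step(self, batch, batch_idx):
        self.log('test_loss', self._loss(batch))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


def make_loader():
    # two categorical columns whose values fit both embedding tables
    x_cat = torch.randint(0, 7, (64, 2))
    y = torch.randint(0, 2, (64,))
    return DataLoader(TensorDataset(x_cat, y), batch_size=16)


if __name__ == '__main__':
    model = TinyEmbModel()
    trainer = pl.Trainer(
        max_epochs=2,
        callbacks=[pl.callbacks.ModelCheckpoint(monitor='val_loss', mode='min')],
    )
    trainer.fit(model, make_loader(), make_loader())
    # restores the best checkpoint before testing -- the step that failed here
    trainer.test(ckpt_path='best', dataloaders=make_loader())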

narain1 (Author) commented Jun 29, 2020

https://github.com/narain1/projects/blob/master/melanoma-lit-x2.ipynb

Above is the link to the Jupyter notebook along with the stack trace.

narain1 closed this as completed Jul 23, 2020