
How to properly fix random seed with pytorch lightning? #1565

Closed · belskikh opened this issue Apr 22, 2020 · 10 comments · Fixed by #1572
Labels: question (Further information is requested)

Comments
@belskikh

What is your question?

Hello,
I am wondering how to fix the random seed so that my experiments are reproducible.

Right now I'm calling this function before the start of training:

import os
import random

import numpy as np
import torch

def seed_everything(seed=42):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make cuDNN deterministic (can slow training down)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

But it doesn't work.
I run training in DDP mode, in case that is relevant.

Thanks in advance!

What's your environment?

  • OS: Ubuntu 18.04
  • Packaging: pip
  • Version: 0.7.1
@belskikh added the question label on Apr 22, 2020
@kumuji
Contributor

kumuji commented Apr 22, 2020

I also have the same problem without DDP mode.

What's your environment?

  • OS: Ubuntu 18.04
  • Packaging: pip
  • Version: 0.7.3

@awaelchli
Contributor

Could you set num_workers to 0 to see if it is related to the data loading? I had this problem before with plain PyTorch, and I think I solved it by also setting the seed in the data loading, because each worker subprocess gets its own seed.
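
A minimal sketch of that worker-seeding idea (the seed_worker function and the placeholder dataset are illustrative, not from this thread):

import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # PyTorch seeds each worker with base_seed + worker_id; reuse that value
    # to seed NumPy and Python's `random` inside the worker process too.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

dataset = TensorDataset(torch.randn(100, 3))  # placeholder dataset
loader = DataLoader(dataset, batch_size=8, num_workers=2,
                    worker_init_fn=seed_worker,
                    generator=torch.Generator().manual_seed(42))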

@belskikh
Author

@awaelchli I tried that, but it didn't help.

@awaelchli
Contributor

Is there a chance you could share a Colab with a minimal example? If not, I will try to reproduce it with the pl_examples this weekend when I get to it.

@awaelchli self-assigned this on May 10, 2020
@haichao592

In my case, it was caused by dropout.
Seeding everything again in the spawned process before training basically fixed the problem.
You can do this in the on_train_start hook.

@bnaman50

bnaman50 commented Jan 4, 2021

@haichao592, could you please share your solution?

Thanks,
Naman

@haichao592


Just call pl.seed_everything(args.seed) in self.on_fit_start()
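
A minimal sketch of that hook, assuming a typical LightningModule (the class name and the stored seed attribute are illustrative):

import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self, seed=42):
        super().__init__()
        self.seed = seed

    def on_fit_start(self):
        # Re-seed inside the (possibly DDP-spawned) process so dropout masks
        # and other RNG state are reproducible across runs.
        pl.seed_everything(self.seed)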

@bnaman50

bnaman50 commented Jan 5, 2021

Hey @haichao592, thanks for your response.

I tried to reproduce this issue on the MNIST dataset with a model that has dropout, but I did not observe the problem. Maybe it has been fixed in the newer versions of PL. The only thing I did differently was set deterministic=True in my Trainer.

Here is the code in case you want to see it (I just took it from the web for a quick check).

Could you please confirm? I just want to make sure I won't face such issues in my fine-tuning project, where debugging might not be as trivial as it is here.

Thanks,
Naman

@sld

sld commented Mar 16, 2021

For me, setting the generator argument in the random_split function helped:

train_set, val_set = random_split(data, (train_size, val_size), generator=torch.Generator().manual_seed(42))
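
A slightly fuller sketch of the same idea (the placeholder dataset and split sizes are illustrative); the same Generator also makes DataLoader shuffling reproducible:

import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

data = TensorDataset(torch.randn(100, 3))  # placeholder dataset
train_size, val_size = 80, 20

# A fixed Generator makes both the split and the shuffling reproducible.
g = torch.Generator().manual_seed(42)
train_set, val_set = random_split(data, (train_size, val_size), generator=g)
train_loader = DataLoader(train_set, batch_size=16, shuffle=True, generator=g)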

@magehrig

For anyone in the future:

I am on version 1.7.6 and this is not an issue anymore.
These days, the Trainer documentation shows how to achieve deterministic behaviour.
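
A minimal sketch of that setup on a recent Lightning release (exact flags may differ between versions):

import pytorch_lightning as pl

pl.seed_everything(42, workers=True)      # also seeds DataLoader workers
trainer = pl.Trainer(deterministic=True)  # request deterministic algorithms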
