GPT2-large on Colab TPU seems to time out #996
Comments
Hi! Thanks for your contribution, great first issue!
@bkkaggle try again using the latest version.
I updated the colab notebook; the error remains, but it looks like it's because pytorch/xla is loading the data in all of the processes, causing an OOM. (pytorch/xla#1280 (comment)) Closing.
It's likely the kernel OOM killer triggering this.
@srush fyi
Yup, this is what I saw as well. You need enough RAM to have the model loaded 8 times.
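For anyone hitting the same wall, here is a minimal sketch of the pattern that blows up host memory, assuming the standard torch_xla multiprocessing entry points (xmp.spawn, xm.xla_device). It is illustrative, not the exact code from the notebook:

```python
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp
from transformers import GPT2LMHeadModel

def _mp_fn(index):
    # xmp.spawn runs this function in 8 separate processes (one per TPU
    # core), so from_pretrained() materializes the ~774M-parameter model
    # in host RAM 8 times before anything is moved to the TPU; this is
    # the host-side OOM that the kernel OOM killer resolves by killing
    # the process.
    model = GPT2LMHeadModel.from_pretrained("gpt2-large")
    model = model.to(xm.xla_device())
    # ... training loop ...

if __name__ == "__main__":
    xmp.spawn(_mp_fn, args=(), nprocs=8)
```

Newer torch_xla releases expose xmp.MpModelWrapper, which keeps a single copy of the model in shared host memory and moves it to each core's device from inside _mp_fn, so only one host-RAM copy of the weights is needed.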
🐛 Bug
When training on a Colab TPU, gpt2-large fails (the process appears to time out).
To Reproduce
See the colab notebook: https://colab.research.google.com/drive/1An6D3wh_H4dbmlEUHYOXZYxkH6S7VKu9
This is the relevant part of the stack trace:
Expected behavior
The code works when training gpt2 (124M) but fails when training gpt2-large (774M).
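A rough back-of-the-envelope estimate makes that gap plausible: at 4 bytes per fp32 parameter, gpt2-large takes about 774M × 4 B ≈ 3.1 GB of host RAM per copy, so 8 processes need roughly 25 GB, well above the ~12 GB a standard Colab VM provides, while gpt2 at 124M parameters needs only about 4 GB for all 8 copies.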
Environment
Additional context