Colab TPU : process terminated with signal SIGKILL #1590
Comments
That message is almost certainly the Linux kernel OOM killer getting rid of the fattest process. Unfortunately those are not what our usual way of doing things is, and what Lightning adopts ATM.
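For readers unfamiliar with how such a kill surfaces: the OOM killer ends a process with SIGKILL, which the process cannot catch, so the parent only sees that the child died from that signal. A minimal sketch of what this looks like from the parent side on a Unix system follows. The helper names `describe_exit` and `run_and_kill` are made up for illustration, and the kill is sent by hand here rather than by the kernel.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    # On POSIX, subprocess reports death-by-signal N as return code -N.
    if returncode < 0:
        return f"terminated by signal {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_and_kill():
    """Start a sleeping child, kill it with SIGKILL, and return its return code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(signal.SIGKILL)  # stand-in for the kernel's OOM kill
    child.wait()
    return child.returncode


if __name__ == "__main__":
    print(describe_exit(run_and_kill()))
```

The return code alone does not say who sent the signal. When the kernel log is readable in the environment, it is usually the place to confirm whether the OOM killer was responsible.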
I could solve my specific problem by specifying
What is different between Colab and Kaggle? I only had this problem with Colab...
Did not work for me.
🐛 Bug
I'm trying to train BART (with the transformers library) on a Colab TPU. I followed the PyTorch Lightning TPU documentation, but before training can start, I receive the following error:
To Reproduce
I'm using the official text summarization example from the transformers library: https://github.com/huggingface/transformers/blob/master/examples/summarization/bart/finetune.py
Here is the full stack trace:
Environment
Additional context
I saw issue #996, but I don't think it's the same problem because my RAM does not appear to be full:
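One way to double-check the "RAM is not full" observation is to sample available memory at short intervals while the job starts, since a gauge that refreshes slowly could miss a brief spike (that possibility is an assumption, not something established in this thread). The sketch below reads `MemAvailable` from `/proc/meminfo` on Linux in a background thread and remembers the lowest value seen. `MemoryWatcher` and `mem_available_kib` are hypothetical helpers, not part of PyTorch Lightning or transformers.

```python
import threading


def mem_available_kib(path="/proc/meminfo"):
    """Return MemAvailable in KiB from a meminfo-style file, or None if absent."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    return None


class MemoryWatcher:
    """Context manager that tracks the lowest available-memory reading seen."""

    def __init__(self, interval=0.1, reader=mem_available_kib):
        self.interval = interval
        self.reader = reader
        self.lowest = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            value = self.reader()
            if value is not None and (self.lowest is None or value < self.lowest):
                self.lowest = value
            # Sleep for one interval, but wake immediately when asked to stop.
            self._stop.wait(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False
```

A possible use is to wrap the training call, for example `with MemoryWatcher() as w: ...`, and print `w.lowest` afterwards. If the watching process itself is killed, the reading is lost, so printing each sample as it is taken may be more useful in that case.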