I am getting a huge time difference between training a model on a specific TPU core (tpu_cores=[1]) and training a model on just one TPU core (tpu_cores=1). What is the reason for that? Aren't both conditions the same, with the only difference being that I assign a specific TPU core in the first case and the number of TPU cores I want to use in the second? Also, in the second case I am getting an error. When training with tpu_cores=[1] the epoch time is 17 seconds; with tpu_cores=1 the epoch time is just 5 seconds.
Running on Colab gives me an error, but there is no error on Kaggle kernels. The time difference issue is the same on both platforms.
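For reference, here is a minimal sketch of the two Trainer configurations being compared. It assumes a TPU runtime with torch_xla installed and the PyTorch Lightning API from around the time of this issue; max_epochs and the commented fit() calls are only illustrative:

```python
import pytorch_lightning as pl

# Case 1: a list selects which TPU core (by index) to train on
trainer_specific_core = pl.Trainer(tpu_cores=[1], max_epochs=1)

# Case 2: an int selects how many TPU cores to train on
trainer_one_core = pl.Trainer(tpu_cores=1, max_epochs=1)

# trainer_specific_core.fit(model)  # ~17 s / epoch reported above
# trainer_one_core.fit(model)       # ~5 s / epoch reported above
```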
@dlibenzi I recall that when training on a single core and using ParallelLoader, I used to receive an error; hence the self.tpu_id is None condition. However, I rechecked and it seems to work fine with ParallelLoader now. I have made a PR for this. :)
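For context, here is a minimal sketch of a single-core training loop driven through torch_xla's ParallelLoader, which is the code path that the self.tpu_id is None check guarded. The dataset, model, and optimizer are placeholders, not the Lightning internals:

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as xla_pl

device = xm.xla_device()  # the single TPU core this process trains on

# Placeholder data/model/optimizer; Lightning wires up its own versions internally.
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randn(64, 1)),
    batch_size=8,
)
model = torch.nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# ParallelLoader pre-loads batches onto the XLA device, even when only one core is used.
para_loader = xla_pl.ParallelLoader(loader, [device])
for x, y in para_loader.per_device_loader(device):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    xm.optimizer_step(optimizer, barrier=True)  # barrier needed when not under xmp.spawn
```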
Borda changed the title from "Training time on tpu is less when specifying the tpu_core" to "specifying the tpu_core speed-up TPU training" on Jun 2, 2020.
🐛 Bug
To Reproduce
Code sample
Colab Notebook
Expected behavior
As far as I know, the training time should be the same in both cases, whether training on a single core or on a specific core.
Environment
How you installed PyTorch (conda, pip, source): pip
Additional context