❓ Questions and Help
Hi,
As noted here and here, it looks like pytorch/xla's multiprocessing causes host RAM usage to scale with the number of cores being used. Increasing the RAM is fine when n_cores=8, but if you're running on a TPU Pod slice with many more cores, simply adding RAM won't work.
What's the recommended way to scale large models that take up ~10 GB of RAM per core to a TPU Pod with 32 or 64 cores? Would multithreading be the solution? Is there a performance difference between start_method='fork' and start_method='spawn'?
Thanks,
Bilal
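For context, a minimal sketch of the usual xmp.spawn pattern that produces this behavior (the model and tensor sizes here are placeholders, not from the original report): every child process builds its own host-side copy of the model before moving it to its TPU core, so host RAM grows roughly linearly with nprocs. The start_method argument is forwarded to torch.multiprocessing.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def make_model():
    # Placeholder model; with a real ~10 GB model the per-process cost is obvious.
    return nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))


def _mp_fn(index):
    device = xm.xla_device()
    # Each of the nprocs child processes runs this function and constructs its
    # own host-side copy of the model, which is why host RAM scales with cores.
    model = make_model().to(device)
    x = torch.randn(8, 1024, device=device)
    model(x)
    xm.mark_step()


if __name__ == '__main__':
    # start_method is passed through to torch.multiprocessing: 'spawn' re-imports
    # the module in each child, while 'fork' inherits the parent's memory pages.
    xmp.spawn(_mp_fn, args=(), nprocs=8, start_method='spawn')
```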
At the moment, PyTorch/XLA on a TPU pod requires a matching number of user VMs, one per TPU VM.
So each user VM drives 8 TPU v3 cores in any case.
We are migrating the architecture so that the user and TPU VMs will be consolidated, with a considerably higher number of cores and more RAM.
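Independent of that consolidation work, one commonly referenced way to keep host RAM roughly flat across the 8 local processes on each VM is to build the model once in the parent and share it, e.g. via xmp.MpModelWrapper together with start_method='fork'. This is only a sketch under the assumption that the installed torch_xla version provides MpModelWrapper; it is not necessarily the approach recommended above, and the model is again a placeholder.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def make_model():
    return nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))


# Build the model once in the parent process; MpModelWrapper keeps a single
# shared host copy and hands out per-device copies inside the children.
WRAPPED_MODEL = xmp.MpModelWrapper(make_model())


def _mp_fn(index):
    device = xm.xla_device()
    # Moves the shared host weights to this process's TPU core.
    model = WRAPPED_MODEL.to(device)
    x = torch.randn(8, 1024, device=device)
    model(x)
    xm.mark_step()


if __name__ == '__main__':
    # 'fork' lets the children share the parent's memory pages, so the 8 local
    # processes do not each pay the full host-RAM cost of the model.
    xmp.spawn(_mp_fn, args=(), nprocs=8, start_method='fork')
```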