
Multiprocessing RAM usage #1742

Closed
bilal2vec opened this issue Mar 11, 2020 · 2 comments

@bilal2vec

❓ Questions and Help

Hi,

As noted here and here, it looks like pytorch/xla's multiprocessing causes RAM usage to scale with the number of cores being used. Increasing the RAM is fine when n_cores=8, but if you're running on a TPU Pod slice with many more cores, just adding RAM won't work.

What's the recommended way to scale large models that take up ~10GB of RAM per core to a TPU pod with 32 or 64 cores? Would multithreading be the solution? Is there a performance difference between using start_method=fork and start_method=spawn? A rough sketch of what I mean is below.
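
For reference, here's a minimal sketch of selecting the start method with xmp.spawn. The model here is just a placeholder, and the copy-on-write benefit of fork is my assumption about why it might keep per-core RAM lower:

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

# Placeholder for a large (~10GB) model. With start_method='fork' a model
# built once in the parent process is inherited copy-on-write by the child
# processes, which (I assume) avoids rebuilding it in every process the way
# 'spawn' would.
BIG_MODEL = torch.nn.Linear(1024, 1024)

def _mp_fn(index):
    # Each process gets one TPU core.
    device = xm.xla_device()
    model = BIG_MODEL.to(device)
    # ... per-core training loop would go here ...

if __name__ == '__main__':
    # nprocs=8 drives the 8 cores of one TPU v3 host.
    # start_method defaults to 'spawn'; 'fork' is the alternative.
    xmp.spawn(_mp_fn, args=(), nprocs=8, start_method='fork')
```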

Thanks,

Bilal

@dlibenzi
Collaborator

At the moment, PyTorch/XLA on a TPU pod requires a matching number of user VMs, one per TPU VM.
So each user VM will drive 8 TPU v3 cores in any case.
We are migrating the architecture so that the user and TPU VMs will be consolidated and have a considerably higher number of cores and RAM.

@bilal2vec
Author

Thanks, closing
