Using ignite with Megatron-style model-parallel PyTorch modules #1709

❓ Questions/Help/Support

This is a somewhat general question, but I'd love a detailed response. When going beyond standard data-parallel training towards hybrid data+model-parallel training (as in Megatron-LM), which ignite abstractions should be used, and which avoided?

@vfdev-5

Comments
@g-karthik thanks for an interesting question! I haven't yet explored this kind of hybrid data+model-parallel training and would love to test it. @sdesrozis any thoughts?
Hi @vfdev-5, MONAI has a model-parallel tutorial: https://github.com/Project-MONAI/research-contributions/tree/master/lamp-automated-model-parallelism Thanks.
I haven't yet experimented with model-parallel training. I would be very pleased to explore this topic.
My first thoughts, if we just consider model parallelism on 2 GPUs:

We should first test this before trying hybrid data+model parallelism. @g-karthik could you explain how you would distribute your model and data in that case? Thanks in advance.
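For context, here is a minimal sketch of what model parallelism on 2 GPUs looks like in plain PyTorch (the module and sizes are hypothetical; this is just the standard pattern of placing submodules on different devices and moving activations between them):

```python
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    """Toy model split across two GPUs: the first half of the
    network lives on cuda:0, the second half on cuda:1."""

    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(1024, 1024).to("cuda:0")
        self.part2 = nn.Linear(1024, 10).to("cuda:1")

    def forward(self, x):
        # Activations must be moved between devices explicitly.
        x = torch.relu(self.part1(x.to("cuda:0")))
        return self.part2(x.to("cuda:1"))

model = TwoGPUModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 1024)
y = torch.randint(0, 10, (32,), device="cuda:1")  # labels on the output device
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()  # autograd handles the cross-device backward pass
optimizer.step()
```

In an ignite Engine this train step would look the same; only the device placement of inputs and labels changes.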
@sdesrozis take a look at https://www.deepspeed.ai/tutorials/pipeline/ and https://www.deepspeed.ai/tutorials/megatron/ and the example there. I think, in addition to what @sdesrozis said,
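For reference, the DeepSpeed pipeline tutorial linked above boils down to roughly the pattern below (a sketch, not ignite API; `args`, `num_steps`, and `train_iter` are placeholders, and the script is assumed to be launched with the `deepspeed` launcher so the process group is set up):

```python
import deepspeed
import torch.nn as nn
from deepspeed.pipe import PipelineModule

# Express the model as a flat list of layers so DeepSpeed can partition
# it into pipeline stages across GPUs.
layers = [nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)]
net = PipelineModule(layers=layers, loss_fn=nn.CrossEntropyLoss(), num_stages=2)

# `args` holds parsed CLI arguments, including --deepspeed_config.
engine, _, _, _ = deepspeed.initialize(
    args=args, model=net, model_parameters=net.parameters()
)

for _ in range(num_steps):
    # train_batch pulls micro-batches from the iterator and runs the full
    # pipeline schedule (forward, backward, optimizer step).
    loss = engine.train_batch(data_iter=train_iter)
```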
@vfdev-5 That's exactly what I was thinking regarding the collective ops in metrics.
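To make that concern concrete: in a hybrid setup, a metric's reduction must run over the data-parallel group only, not the whole world, since every rank within a model-parallel group sees the same batches. A minimal sketch, assuming a hypothetical 4-process layout (2-way model parallel x 2-way data parallel):

```python
import torch
import torch.distributed as dist

# Assumes init_process_group was already called with world_size=4.
# Hypothetical layout: ranks {0, 1} and {2, 3} each hold one model replica
# (2-way model parallel), so the data-parallel groups (same model shard,
# different data) are {0, 2} and {1, 3}.
dp_groups = [dist.new_group([0, 2]), dist.new_group([1, 3])]
my_dp_group = dp_groups[dist.get_rank() % 2]

# Placeholder metric accumulator, e.g. a running loss sum computed locally.
local_loss_sum = 0.0
acc = torch.tensor([local_loss_sum], device="cuda")

# Reducing over the default (world) group would double-count, because all
# ranks of a model-parallel group saw the same batches; reduce over the
# data-parallel group instead.
dist.all_reduce(acc, op=dist.ReduceOp.SUM, group=my_dp_group)
```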
@g-karthik @sdesrozis I'm working on making ignite's distributed module aware of a particular data-parallel configuration. I'll soon push a draft PR with a new API and an example using DeepSpeed.
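For reference, the existing data-parallel helpers in ignite.distributed look like the sketch below; the draft PR would make this machinery aware of model-parallel groups as well (its exact new API is not shown here):

```python
import torch
import torch.nn as nn
import ignite.distributed as idist

def training(local_rank, config):
    # auto_model / auto_optim wrap the model and optimizer for the
    # detected data-parallel backend (DDP, Horovod, XLA, ...).
    model = idist.auto_model(nn.Linear(1024, 10))
    optimizer = idist.auto_optim(
        torch.optim.SGD(model.parameters(), lr=config["lr"])
    )
    # ... build an ignite Engine with a train step and run it here ...

# Spawns one process per GPU and sets up the process group.
with idist.Parallel(backend="nccl", nproc_per_node=2) as parallel:
    parallel.run(training, {"lr": 0.01})
```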