DeepSpeed support for ignite.distributed #2008
Comments
@Kashu7100 Thank you for this suggestion! I confirm that it would be very nice to support DeepSpeed. Currently we have a docker environment configured with MS DeepSpeed: https://github.com/pytorch/ignite/tree/master/docker/msdp Would you like to contribute on this? It seems you already know how to do it 😉
@sdesrozis Do you think it is possible to reuse …?
It depends on what you want to do. The feature list of msdp is quite long, and the impacts are more or less deep. For instance, I think that pipeline parallelism would be a very nice feature to have, but it is not trivial to adapt. Maybe a first step could be distributed data parallelism using the simplified API, as you mentioned. Thus, it may be a new backend to develop and integrate. You can have a look here. Btw, it's not an easy task and maybe I'm wrong about what to do. @vfdev-5 was looking further into this; maybe he could help in the discussion.
@Kashu7100 Finally, introducing a new backend does not seem to be the right option. Have a look here, and you will see that native PyTorch distributed is used when the distributed environment variables are set. That is good news for simple use cases.
I would say yes.
@Kashu7100 thanks for the feature request! Yes, we plan to improve our support of the DeepSpeed framework. Our idea was to provide basic integration examples of how to use ignite and deepspeed together. I looked at it multiple times, and due to a certain overlap between the frameworks it was not obvious where to put the split. @sdesrozis I'm not sure whether we should add it as a new backend or not. Let's first create a basic integration example and see which parts of the DeepSpeed code could be simplified.
I think this could be integrated in our native backend, besides slurm.
IMO it is not necessary.
That is a good option. As discussed a few weeks ago, the specific engine should be the tricky part. Otherwise, the auto helpers could do the job, I suppose.
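The "native backend, not a new key" idea above can be sketched as a small dispatch. This is an illustration only, not ignite's actual internals (which live in `ignite.distributed.comp_models`); the function name is hypothetical:

```python
# Illustrative sketch: because DeepSpeed sets up a regular
# torch.distributed process group when the usual env vars
# (RANK, WORLD_SIZE, MASTER_ADDR, ...) are present, DeepSpeed-launched
# jobs can ride on the existing native backends instead of a new key.
NATIVE_BACKENDS = ("nccl", "gloo", "mpi")
OTHER_BACKENDS = ("xla-tpu", "horovod")

def resolve_comp_model(backend: str) -> str:
    """Return which computation model an idist-like helper would pick."""
    if backend in NATIVE_BACKENDS:
        # DeepSpeed-launched jobs land here too.
        return "native"
    if backend in OTHER_BACKENDS:
        return backend
    raise ValueError(f"unknown backend: {backend!r}")
```

Under this view, no "deepspeed" key is needed at all, which matches the comment that a new backend "is not necessary".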
Hi, is there any update on this?
@saifullah3396 well, this feature is not really a priority right now. If you would like to help with it, we can guide your development from the ignite side.
🚀 Feature
PyTorch Lightning recently added native support for MS DeepSpeed.
I believe it would also be helpful for users if ignite incorporated the DeepSpeed pipeline for memory-efficient distributed training.
1. for idist.auto_model?
To initialize the DeepSpeed engine:
And for the distributed environment setup, we need to replace
torch.distributed.init_process_group(...)
with deepspeed.init_distributed().
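The substitution described in point 1 could be captured in a tiny selector. A minimal sketch, assuming DeepSpeed is an optional dependency; the helper name is hypothetical, while deepspeed.init_distributed and torch.distributed.init_process_group are the real entry points (both read MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE from the environment):

```python
def pick_init_fn(deepspeed_available: bool) -> str:
    """Return the qualified name of the process-group initializer to call.

    deepspeed.init_distributed additionally detects MPI-style launchers and
    then sets up a torch.distributed process group itself, which is why the
    swap is mostly transparent for simple use cases.
    """
    if deepspeed_available:
        return "deepspeed.init_distributed"
    return "torch.distributed.init_process_group"
```

An idist-style setup helper would then call the chosen function once per process, before building models and optimizers.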
2. checkpoint handler
Checkpointing works slightly differently under DeepSpeed, so the handler needs adapting.
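The difference in point 2 is structural: plain torch checkpointing has rank 0 write a single file via torch.save, while DeepSpeed's engine.save_checkpoint(save_dir, tag) must be called on every rank and writes a directory of per-rank shards under save_dir/tag/. A small sketch of the resulting target layout (the function name is hypothetical, for illustration only):

```python
import os

def checkpoint_target(save_dir: str, tag: str, use_deepspeed: bool) -> str:
    """Where a saved checkpoint ends up in each scheme.

    - torch.save: one file, written by rank 0 only.
    - DeepSpeed engine.save_checkpoint: a sharded directory, written
      collectively by all ranks, so a handler built around a single
      torch.save file needs adapting.
    """
    if use_deepspeed:
        return os.path.join(save_dir, tag)        # sharded directory
    return os.path.join(save_dir, tag + ".pt")    # single file
```

This is why an ignite Checkpoint handler cannot simply swap the save call: it must also change what it tracks (a directory per tag rather than one file) and ensure every rank participates in the save.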