Setup CI as running on TPU #963

vfdev-5 · 2020-04-22T22:17:13Z

🚀 Feature

Ignite will support distributed training on TPU (e.g. #960). Currently, metric computation is affected in the same way as for DDP on GPUs; this should be addressed in a separate issue/PR.
The idea of this issue is to set up CI to emulate running on TPU, as is done in pytorch/xla.

Set up another workflow on our CircleCI, as is done for xla

Add a simple test marked as TPU (@pytest.mark.tpu)
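
A minimal sketch of what such a marked test could look like. The test name and the tensor check are illustrative assumptions, not the test that was eventually added; the `tpu` marker would also need to be registered in the pytest configuration to avoid unknown-marker warnings.

```python
import pytest

try:
    # torch_xla is only importable inside an XLA-enabled environment
    import torch_xla.core.xla_model as xm

    has_xla = True
except ImportError:
    has_xla = False


@pytest.mark.tpu
@pytest.mark.skipif(not has_xla, reason="torch_xla is not installed")
def test_tensor_on_xla_device():
    # Hypothetical smoke test: create a tensor on the XLA device and reduce it
    import torch

    device = xm.xla_device()
    x = torch.ones(2, 3, device=device)
    assert x.sum().item() == 6.0
```

With the marker in place, `pytest -m tpu` would select only these tests, and they are skipped on machines without `torch_xla`.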


erip · 2020-04-25T19:33:39Z

I'd be interested in helping with this, but it seems like it'll require some administration on the CI side (setting env vars at least). Are TPU instances available freely for CI through CircleCI or is TPU virtualized through docker? Not sure how that works...

vfdev-5 · 2020-04-25T19:40:07Z

@erip thanks! I think what is done on the xla CircleCI is CPU emulation. If you could take a look at how they suggest contributors work on xla and run tests, we can understand how to set up our own tests. In our case, we won't need to rebuild xla etc.; we can just use their docker image and set up the CPU emulation.

> but it seems like it'll require some administration on the CI side (setting env vars at least)

Let me check if I can activate CircleCI-specific workflows on PRs. Anyway, I think it is just a matter of adding another .circleci/another_config.yml file. Otherwise, we can opt for GitHub Actions.

EDIT: it seems we can only have a single .circleci/config.yml. So, let's create this XLA CI workflow with GitHub Actions.

erip · 2020-04-25T20:53:52Z

It seems like if we want to use the XLA docker images that pytorch/pytorch and pytorch/xla use in GitHub Actions, we'll need to develop an action that wraps the container. What's not immediately clear is how the tests actually get run from there. 😄 I'll need to do some reading, but just commenting here to document for myself later.

vfdev-5 · 2020-04-25T21:00:16Z

@erip I think it is simpler than that:

run the tests inside the docker container like this: https://github.com/pytorch/ignite/blob/master/.circleci/config.yml#L83
set up docker just by pulling their image: docker pull gcr.io/tpu-pytorch/xla:r1.5
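
The two steps above could be scripted roughly as follows. This is only a sketch: `build_commands` is a hypothetical helper, the mount path and pytest invocation are assumptions, and the `XRT_*` values are the CPU-emulation settings that pytorch/xla documented at the time as far as I recall, so they should be checked against the image before use.

```python
import shlex

IMAGE = "gcr.io/tpu-pytorch/xla:r1.5"  # image named in the comment above

# Assumed XRT settings for CPU emulation; verify against the pytorch/xla docs.
CPU_EMULATION_ENV = {
    "XRT_DEVICE_MAP": "CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0",
    "XRT_WORKERS": "localservice:0;grpc://localhost:40934",
}


def build_commands(repo_dir, image=IMAGE, env=CPU_EMULATION_ENV):
    """Return the `docker pull` and `docker run` argument lists (hypothetical helper)."""
    pull = ["docker", "pull", image]
    # Mount the repository into the container and run from there
    run = ["docker", "run", "--rm", "-v", f"{repo_dir}:/workspace", "-w", "/workspace"]
    for key, value in env.items():
        run += ["-e", f"{key}={value}"]
    # Run only the tests carrying the `tpu` marker
    run += [image, "python", "-m", "pytest", "-m", "tpu", "tests"]
    return pull, run


if __name__ == "__main__":
    for cmd in build_commands("/path/to/ignite"):
        print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in a CI step; the image may also need pytest and ignite's own dependencies installed before the tests can run.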

vfdev-5 added enhancement help wanted labels Apr 22, 2020

vfdev-5 mentioned this issue Apr 22, 2020

Metrics reduction on distributed TPU setting #965

Closed

erip mentioned this issue Apr 25, 2020

adds TPU to CI. #981

Merged

3 tasks

vfdev-5 closed this as completed in #981 Apr 26, 2020
