
Acquiring a device before trying to spawn multiple processes causes an error #1587

Closed

mruberry opened this issue Feb 4, 2020 · 4 comments

mruberry (Contributor) commented Feb 4, 2020

🐛 Bug

import torch_xla
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

device = xm.xla_device()

def _mp_fn(rank, flags):
  return 0

FLAGS={}
xmp.spawn(_mp_fn, args=(FLAGS,), nprocs=8,
          start_method='fork')

Causes error:

Exception: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 116, in _start_fn
    _setup_replication()
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 109, in _setup_replication
    xm.set_replication(str(device), [str(device)])
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 200, in set_replication
    replication_devices = xla_replication_devices(devices)
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 187, in xla_replication_devices
    .format(len(local_devices), len(kind_devices)))
RuntimeError: Cannot replicate if number of devices (1) is different from 8

If the `device = xm.xla_device()` line is commented out, this error does not occur, but the process hangs (and eventually times out). See #1586.

@dhananjayraut

Any updates? I am having the same issue.

@dlibenzi
Collaborator

dlibenzi commented Mar 2, 2020

When working with Colab and multi-processing, just consider _mp_fn() to be your world.

@richarddwang

See #1576; just don't call xm.xla_device() before spawn.

@mruberry mruberry closed this as completed Apr 3, 2020
@mobassir94

Can anyone please tell me where I am making mistakes in this code: https://pastebin.com/eaR8PgQX ? The full code with the error message is there. Please check the code and help me; this is my first time using PyTorch XLA for an image classification task.
@ezyang @mruberry @dhananjayraut @richardyy1188 @dlibenzi

5 participants