
"TypeError: zip argument #1 must support iteration" when training map_model from scratch #18

Closed
AIasd opened this issue Aug 5, 2020 · 9 comments


AIasd commented Aug 5, 2020

Hi,
I downloaded data and tried to train the map_model from scratch by running:
python3 -m carla_project/src/map_model --dataset_dir /path/to/data.
But I encountered the following error:

191 | controller.layers.2                        | ReLU                    | 0     
192 | controller.layers.3                        | BatchNorm1d             | 64    
193 | controller.layers.4                        | Linear                  | 1 K   
194 | controller.layers.5                        | ReLU                    | 0     
195 | controller.layers.6                        | BatchNorm1d             | 64    
196 | controller.layers.7                        | Linear                  | 66    
../LBC_data/CARLA_challenge_autopilot/route_09_04_07_23_07_09
../LBC_data/CARLA_challenge_autopilot/route_19_04_08_16_31_51
../LBC_data/CARLA_challenge_autopilot/route_29_04_09_11_47_17
../LBC_data/CARLA_challenge_autopilot/route_39_04_06_09_50_43
../LBC_data/CARLA_challenge_autopilot/route_49_04_06_11_43_48
../LBC_data/CARLA_challenge_autopilot/route_59_04_06_13_26_15
../LBC_data/CARLA_challenge_autopilot/route_69_04_09_00_28_07
6593 frames.
[ 537  484 2226  752  527 1156  911]
Validation sanity check: 0it [00:00, ?it/s]
Traceback (most recent call last):

  File "carla_project/src/map_model.py", line 236, in <module>
    main(parsed)
  File "carla_project/src/map_model.py", line 207, in main
    trainer.fit(model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 759, in fit
    self.dp_train(model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 563, in dp_train
    self.run_pretrain_routine(model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 899, in run_pretrain_routine
    False)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 278, in _evaluate
    output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 421, in evaluation_forward
    output = model(*args)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/overrides/data_parallel.py", line 66, in forward
    return self.gather(outputs, self.output_device)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
    for k in out))
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.pwandb: Waiting for W&B process to finish, PID 15597
y", line 62, in <genexpr>
    for k in out))
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
TypeError: zip argument #1 must support iteration
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb: Process crashed early, not syncing files

Any help will be appreciated!


bradyz commented Aug 7, 2020

are you using the requirements.txt provided in the carla_project directory?

i haven't seen this one before, so it might be some difference in pytorch lightning versions
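a quick sanity check (just a sketch) is to print the installed versions and compare them against carla_project/requirements.txt:

```python
# Sketch only: print the installed versions to compare against
# the pins in carla_project/requirements.txt.
import torch
import pytorch_lightning

print("torch:", torch.__version__)
print("pytorch_lightning:", pytorch_lightning.__version__)
```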


AIasd commented Aug 7, 2020

Hi @bradyz, thanks for the quick response! I searched around and found that this is related to a known PyTorch Lightning issue (here). After pulling their master branch, that error was resolved. However, a new error appears:

  | Name       | Type              | Params
-------------------------------------------------
0 | to_heatmap | ToHeatmap         | 0     
1 | net        | SegmentationModel | 39 M  
2 | controller | RawController     | 1 K   
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_09_04_07_23_07_09
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_09_04_07_23_07_09
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_19_04_08_16_31_51
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_19_04_08_16_31_51
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_29_04_09_11_47_17
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_29_04_09_11_47_17
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_39_04_06_09_50_43
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_39_04_06_09_50_43
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_49_04_06_11_43_48
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_49_04_06_11_43_48
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_59_04_06_13_26_15
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_59_04_06_13_26_15
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_69_04_09_00_28_07
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_69_04_09_00_28_07
6593 frames.
[ 537  484 2226  752  527 1156  911]
6593 frames.
[ 537  484 2226  752  527 1156  911]
Validation sanity check: 0it [00:00, ?it/s]
wandb: Waiting for W&B process to finish, PID 929
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
Traceback (most recent call last):
  File "carla_project/src/map_model.py", line 236, in <module>
    main(parsed)
  File "carla_project/src/map_model.py", line 207, in main
    trainer.fit(model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1021, in fit
    self.accelerator_backend.train(model, nprocs=self.num_processes)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/accelerator_backends/ddp_spawn_backend.py", line 43, in train
    mp.spawn(self.ddp_train, nprocs=nprocs, args=(self.mp_queue, model,))
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/accelerator_backends/ddp_spawn_backend.py", line 157, in ddp_train
    results = self.trainer.run_pretrain_routine(model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1194, in run_pretrain_routine
    self._run_sanity_check(ref_model, model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1227, in _run_sanity_check
    eval_results = self._evaluate(model, self.val_dataloaders, max_batches, False)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 305, in _evaluate
    for batch_idx, batch in enumerate(dataloader):
  File "/home/zhongzzy9/Documents/self-driving-car/2020_CARLA_challenge/carla_project/src/dataset_wrapper.py", line 29, in __iter__
    yield next(self.data)
  File "/home/zhongzzy9/Documents/self-driving-car/2020_CARLA_challenge/carla_project/src/dataset_wrapper.py", line 8, in _repeater
    for data in loader:
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_dataset.<locals>.<lambda>' 

These problems all seem to be related to multi-GPU usage. I guess the code is not meant to support PyTorch multi-GPU training, right? When I use only one GPU, the code runs successfully.
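For reference, the AttributeError above comes from spawn-based worker processes (ddp_spawn, or DataLoader workers started with the spawn method) having to pickle a lambda defined inside get_dataset, which is not picklable. A minimal sketch of the usual workaround, with dummy data and hypothetical function names rather than the actual repo code, is to move that lambda to module level:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def scale_batch(batch):
    # Module-level collate function: spawn-based worker processes can pickle
    # this, unlike a lambda defined inside get_dataset.
    (images,) = zip(*batch)
    return torch.stack(images) / 255.0

def get_dataset(batch_size=4, num_workers=2):
    # Dummy data just to keep the sketch self-contained.
    data = TensorDataset(torch.arange(32, dtype=torch.float32).view(8, 4))
    # Passing scale_batch instead of e.g. `lambda b: ...` avoids
    # "Can't pickle local object 'get_dataset.<locals>.<lambda>'" when the
    # DataLoader workers (or Lightning's ddp_spawn backend) pickle their arguments.
    return DataLoader(data, batch_size=batch_size,
                      num_workers=num_workers, collate_fn=scale_batch)
```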


bradyz commented Aug 7, 2020

correct - i only used this code for single gpu training


AIasd commented Aug 8, 2020

Thank you for clarifying!

AIasd closed this as completed Aug 8, 2020
@aleallievi

@AIasd did you ever find a solution for this? My workaround was to set Trainer(accelerator='horovod') or Trainer(accelerator='ddp_spawn'); however, it would be great if we could use ddp. Thanks!
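For completeness, a minimal sketch of what I mean (flag names vary by Lightning version; older releases use distributed_backend and newer ones use strategy instead of accelerator):

```python
import pytorch_lightning as pl

# Sketch only: pick a spawn-based backend instead of plain ddp.
# Depending on the installed Lightning version this argument may be
# `distributed_backend=` (older) or `strategy=` (newer) rather than `accelerator=`.
trainer = pl.Trainer(gpus=2, accelerator='ddp_spawn', max_epochs=1)
# trainer.fit(model)  # where `model` is the LightningModule defined in map_model.py
```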


AIasd commented Nov 12, 2020

@aleallievi, in my case not using ddp is fine, so I did not explore this issue further.

@aleallievi

Ok - thanks for your feedback

@raozhongyu

Thanks for your work. I met the same error, "TypeError: zip argument #1 must support iteration". Could you tell me how to solve it? Thanks a lot.

@pratikchhapolika

@AIasd how did you solve this?
