Add IterableDataset support #323
Got it. Since it's impossible to know when to run validation, checkpoint, or stop training, the workaround is to return a really high number from your dataloader's `len`.
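In code, that workaround amounts to something like the sketch below (the dataset, tensor shapes, and names are made up for illustration; whether the trainer actually consults `__len__` here depends on the PyTorch and Lightning versions in use):

```python
import sys
import torch
from torch.utils.data import DataLoader, IterableDataset

class StreamingDataset(IterableDataset):
    """Hypothetical stand-in for a streaming / database-backed source."""

    def __iter__(self):
        while True:  # endless stream of (features, label) samples
            yield torch.randn(10), torch.randint(0, 2, (1,)).float()

    def __len__(self):
        # The workaround from the comment above: report an effectively
        # unbounded length so the training loop never ends the "epoch" early.
        return sys.maxsize

loader = DataLoader(StreamingDataset(), batch_size=32)
```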
To support this we should modify the training loop so it does the validation check, etc. every k batches. We might also need to disable the tqdm limit, because we won't know the length. The use case is streaming data or database-type reads.
There is also a somewhat simpler case, where the length is actually known but random access by index is not available. That is true in my case: my dataset generates samples on the fly, but always a fixed amount per epoch. Say, every epoch 10k samples are generated and fed into the model in batches of 100 samples.
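A minimal sketch of that scenario (hypothetical class and dummy tensors): the per-epoch length is known up front, but samples can only be produced sequentially, so `__getitem__` cannot be implemented:

```python
import torch
from torch.utils.data import IterableDataset

class GeneratedDataset(IterableDataset):
    """Hypothetical on-the-fly generator: a fixed number of samples per
    epoch (10k in the scenario above), but no random access by index."""

    def __init__(self, samples_per_epoch=10_000):
        self.samples_per_epoch = samples_per_epoch

    def __iter__(self):
        for _ in range(self.samples_per_epoch):
            yield torch.randn(10), torch.randint(0, 2, (1,)).float()

    def __len__(self):
        # The length is known even though indexing is not possible.
        return self.samples_per_epoch
```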
So actually all of this can be solved by adding a way to say how many batches you want per epoch; then everything just works out: `Trainer(max_epoch_batches=10000)`
@neggert any thoughts?
In the past, I've handled this by storing a `num_batches` attribute on the dataset and defining something like:

```python
def __len__(self):
    return self.num_batches

def _get_batch(self):
    ...

def __iter__(self):
    return iter(self._get_batch() for _ in range(self.num_batches))
```

This is probably not a good general solution, as it would have been a lot of work if I hadn't been planning on using a custom batch sampler anyway. For a general solution, I think a [...]. We do need to be a little bit careful, as there are some pitfalls around using an `IterableDataset`.
Added in #405. To use: `Trainer(val_check_interval=100)` (checks validation every 100 training batches). @falceeffect please verify it works as requested; otherwise we can reopen.
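For reference, the usage would be roughly the following (`model` is assumed to be a `LightningModule` whose training dataloader wraps an `IterableDataset`; this is a sketch, not a full script):

```python
import pytorch_lightning as pl

# Check validation every 100 training batches instead of relying on the
# dataset length, which an IterableDataset may not provide.
trainer = pl.Trainer(val_check_interval=100)
trainer.fit(model)
```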
Doesn't solve the problem.
Same here. Still gives an error.
@calclavia which problem? compatibility? (@MikeScarp)
@williamFalcon The original issue reported here is not fixed by #405. I am still unable to train with an instance of `IterableDataset`.
Right. Putting in
@calclavia mind submitting a PR? Where is the `len` being asked for? I thought we specifically handled that case.
I tried defining the
It appears there is a typo in the latest pip-installable version 0.5.3.2: https://github.com/williamFalcon/pytorch-lightning/blob/0.5.3.2/pytorch_lightning/trainer/data_loading_mixin.py#L27. It should be [...]. This was later fixed in #549, but that has not been released. @williamFalcon will you be able to make a new release soon, since there was no release in December? Thanks!
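For context, the guard being discussed is essentially of the shape sketched below; this is not the actual Lightning source, and the names (`ITERABLE_DATASET_EXISTS`, `is_iterable_dataloader`) are illustrative:

```python
from torch.utils.data import DataLoader

try:
    # IterableDataset only exists in PyTorch >= 1.2
    from torch.utils.data import IterableDataset
    ITERABLE_DATASET_EXISTS = True
except ImportError:
    ITERABLE_DATASET_EXISTS = False

def is_iterable_dataloader(dataloader: DataLoader) -> bool:
    """If the loader wraps an IterableDataset, the trainer should not call
    len() on it and should rely on interval-based validation instead."""
    return ITERABLE_DATASET_EXISTS and isinstance(dataloader.dataset, IterableDataset)
```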
Actually, even the latest master branch still has this problem.
@matthew-z Could you please reopen this issue or make another one?
I don't have the privilege to re-open a closed issue, so I will open a new one.
Looks like currently there is no way to use an `IterableDataset` instance for training. Trying to do so results in a crash with this exception:
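A minimal setup that exercises this path looks roughly like the following; the module and data are made up, and exact hook names and return conventions vary across Lightning versions, so treat it as a sketch rather than an exact reproduction:

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, IterableDataset

class Stream(IterableDataset):
    """Dummy iterable source: sequential access only, 1000 samples."""
    def __iter__(self):
        for _ in range(1000):
            yield torch.randn(10), torch.randint(0, 2, (1,)).float()

class Model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.binary_cross_entropy_with_logits(self.layer(x), y)
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

    def train_dataloader(self):
        return DataLoader(Stream(), batch_size=32)

pl.Trainer(max_epochs=1).fit(Model())
```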