Step-wise processing, better support for IterableDataset, and others #640

Closed
ibeltagy opened this issue Dec 19, 2019 · 5 comments · Fixed by #728
Labels
feature, help wanted

Comments

@ibeltagy
Contributor

ibeltagy commented Dec 19, 2019

I have been using PTL for a month. It is nice and saves a lot of time, and I intend to use it in future projects. That said, I have a list of feature requests and improvements that would be very helpful for supporting a wider set of use cases. I am not sure what the best format for this list is, so I will just write everything here.

  1. Better support for IterableDataset
  • In addition to val_check_interval, we also need num_val_steps and num_train_steps. num_val_steps is needed because the validation set also uses an IterableDataset. num_train_steps is needed because you usually have to pick the number of gradient updates carefully, and it interacts with the learning rate scheduler (num_train_steps=inf is not sufficient).
  • For validation, keep the same DataLoader object instead of instantiating a new one on each validation cycle, because constructing new workers every time is costly.
  • Some of the debugging features that run on a small percentage of the training/validation data don't work, because they assume a map-style Dataset rather than an IterableDataset.
  2. Step-wise processing
    Think of the "gradient update" as the unit of training instead of (or in addition to) the epoch. A typical use case is pretraining a language model, where you want to control the number of gradient updates, not epochs (e.g., see the RoBERTa/BERT papers).
  • Add an option to call scheduler.step() after every gradient update (see the sketch after this list).
  • Have self.trainer.num_train_steps available to the LR scheduler; the scheduler is usually a function of the number of steps.
  • Checkpoint the current step and resume from that step. Again, this is important to get the right scheduler state, and also for the TensorBoard logging. Resuming from the same training example would be nice, but is less important.
  3. Misc. These are smaller points, but nice to have.
  • Have the default TensorBoard logging include the LR, time per step, and the gradient norm (check fairseq).
  • Trainer(gpus=2) ignores CUDA_VISIBLE_DEVICES and always picks the first two GPUs.
  • With DDP, sync validation stats across processes. Not doing so is a common mistake, and it would be nice to guard users against it. It amounts to something like the following lines at the end of validation_end (note that all_reduce operates in place):

    torch.distributed.all_reduce(val_loss, op=torch.distributed.ReduceOp.SUM)
    val_loss = val_loss / self.trainer.world_size
  • Various logs refer to "batches", and it is not clear whether that means a "batch" or a "step". They are usually the same except with gradient accumulation. Personally, I prefer the word "step" because it eliminates that confusion.
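
To illustrate the step-wise scheduling in point 2, here is a rough sketch in plain PyTorch (the names num_train_steps and warmup_steps are my own, not an existing PTL API): the LR is a function of the global step, and the scheduler is stepped once per gradient update rather than once per epoch.

    import torch

    def build_scheduler(optimizer, num_train_steps, warmup_steps):
        # Linear warmup followed by linear decay, defined over the
        # global step rather than the epoch.
        def lr_lambda(step):
            if step < warmup_steps:
                return step / max(1, warmup_steps)
            return max(0.0, (num_train_steps - step) / max(1, num_train_steps - warmup_steps))
        return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

    # In the training loop (simplified), step the scheduler after every optimizer update:
    #   loss.backward(); optimizer.step(); scheduler.step()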

Thanks for the helpful library and sorry for the long list.

ibeltagy added the feature and help wanted labels Dec 19, 2019
@matthew-z
Contributor

> Trainer(gpus=2) ignores CUDA_VISIBLE_DEVICES and always picks the first two GPUs.

Trainer cannot ignore CUDA_VISIBLE_DEVICES, as it is handled by the underlying libraries.

The device ids in PyTorch/TensorFlow applications are not consistent with the ids shown in nvidia-smi. E.g., gpu:0 in PyTorch may be gpu:2 in nvidia-smi. I guess that is why you could not select the card you wanted.

You may set the environment variable export CUDA_DEVICE_ORDER=PCI_BUS_ID; then the order of the device ids will be consistent.
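
For example (just a sketch of the usual pattern; the variables must be set before CUDA is initialized, and the "2,3" is an arbitrary choice):

    import os

    # Make the device ids match the nvidia-smi ordering, then expose only the cards you want.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"  # arbitrary example

    import torch  # import after setting the variables so they take effect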

@ibeltagy
Contributor Author

ibeltagy commented Jan 21, 2020

@matthew-z, PTL does override my CUDA_VISIBLE_DEVICES to 0,1,.., making it difficult to run multiple jobs on the same machine. Check the code here:
https://github.com/PyTorchLightning/pytorch-lightning/blob/06242c200a318a37d1f882c786e60354ec04533f/pytorch_lightning/trainer/distrib_data_parallel.py#L251
and similarly here: https://github.com/PyTorchLightning/pytorch-lightning/blob/06242c200a318a37d1f882c786e60354ec04533f/pytorch_lightning/trainer/distrib_parts.py#L516

@Borda
Member

Borda commented Jan 24, 2020

probably linked to #323 and #698

@sshleifer
Contributor

Has num_val_steps been added, or is there some alias?

@ibeltagy
Contributor Author

As far as I can tell, no. My workaround is to add

    def __len__(self):
        return 1000000  # a large positive constant 

in the IterableDataset, then use val_percent_check to run for a smaller number of steps.
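
Roughly, the workaround looks like this (a sketch only: the class name is made up, the length is an arbitrary large constant, and the val_percent_check value simply scales that reported length down):

    from torch.utils.data import IterableDataset

    class StreamingValDataset(IterableDataset):  # hypothetical dataset
        def __iter__(self):
            ...  # yield validation examples from the stream here

        def __len__(self):
            return 1000000  # a large positive constant

    # trainer = Trainer(val_percent_check=0.001)  # caps validation at roughly 0.1% of the reported length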
