Step-wise processing, better support for IterableDataset, and others #640
I have been using PTL for a month. It is nice and saves a lot of time, and I intend to use it in future projects. That said, I have a list of feature requests and improvements that would be very helpful for supporting a wider set of use cases. I am not sure what the best format for this list is, so I will just write the items here.
1. Better support for `IterableDataset`:
   - In addition to `val_check_interval`, we also need `num_val_steps` and `num_train_steps`. `num_val_steps` is needed because the validation set is also using an `IterableDataset`. `num_train_steps` is needed because you usually need to carefully pick the number of gradient updates, which has some interaction with the learning rate scheduler (`num_train_steps=inf` is not sufficient).
   - Reuse the `DataLoader` object instead of instantiating a new one on each validation cycle, because it is costly to construct new workers each time. This applies to `Dataset`, not `IterableDataset`.
2. Thinking of the "gradient update" as the unit of training instead of (or in addition to) an epoch. A typical use case is pretraining a language model, where you want to control the number of gradient updates, not epochs (e.g. check the RoBERTa/BERT papers). This requires:
   - `scheduler.step()` after every gradient update, and
   - `self.trainer.num_train_steps` being available to the LR scheduler, since the scheduler is usually a function of the number of steps (see the first sketch after this list).
3. `allgradnorm` (check fairseq): tracking the gradient norm over all parameters (see the second sketch after this list).
4. `Trainer(gpus=2)` ignores `CUDA_VISIBLE_DEVICES` and always picks the first two GPUs.
5. `validation_end`:
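To make items 1 and 2 concrete, here is a minimal plain-PyTorch sketch of the requested step-wise loop. This is not Lightning API: `num_train_steps`, `num_val_steps`, and `val_check_interval` are the proposed knobs, used here as assumed names rather than existing `Trainer` arguments. It trains by gradient updates rather than epochs, steps the scheduler once per update, validates for a fixed number of steps, and reuses the validation `DataLoader` instead of rebuilding its workers every cycle:

```python
import itertools

import torch
from torch import nn
from torch.utils.data import DataLoader, IterableDataset


class Stream(IterableDataset):
    """An unbounded stream: there is no __len__, so an 'epoch' is undefined."""
    def __iter__(self):
        while True:
            x = torch.randn(10)
            yield x, x.sum()


model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# The proposed knobs: counts of gradient updates, not epochs.
num_train_steps, num_val_steps, val_check_interval = 1000, 50, 200

# The scheduler is a function of the step count, hence it needs num_train_steps.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: 1.0 - step / num_train_steps)

# Build the loaders once and keep them alive across validation cycles.
train_loader = DataLoader(Stream(), batch_size=32)
val_loader = DataLoader(Stream(), batch_size=32)
val_iter = iter(val_loader)  # reused: workers are not re-created per cycle

for step, (x, y) in enumerate(itertools.islice(train_loader, num_train_steps), 1):
    loss = nn.functional.mse_loss(model(x).squeeze(-1), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # one scheduler step per gradient update, not per epoch

    if step % val_check_interval == 0:
        with torch.no_grad():
            val_loss = sum(
                float(nn.functional.mse_loss(model(vx).squeeze(-1), vy))
                for vx, vy in itertools.islice(val_iter, num_val_steps)
            ) / num_val_steps
        print(f"step {step}: lr={scheduler.get_last_lr()[0]:.4f} "
              f"val_loss={val_loss:.4f}")
```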
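And for item 3, a sketch of what fairseq reports as `gnorm`: the L2 norm of the gradient taken over all parameters jointly. The name `total_grad_norm` is mine for illustration, not an existing API:

```python
import torch


def total_grad_norm(parameters) -> float:
    """Global L2 norm of all gradients, i.e. sqrt of the sum of squared
    per-tensor norms; fairseq logs this as "gnorm" after backward()."""
    norms = [torch.norm(p.grad) for p in parameters if p.grad is not None]
    return float(torch.norm(torch.stack(norms)))


# Usage, after loss.backward():
#   gnorm = total_grad_norm(model.parameters())
# Note that torch.nn.utils.clip_grad_norm_ returns the same norm as a byproduct.
```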
Thanks for the helpful library and sorry for the long list.
Comments

Trainer cannot ignore CUDA_VISIBLE_DEVICES, as it is handled by the underlying libraries. The device ids in pytorch/tensorflow applications are not consistent with the ids shown in nvidia-smi. You may set this env variable.

@matthew-z, PTL does override my CUDA_VISIBLE_DEVICES.

Is this fixed?

As far as I can tell, no. My workaround is to add … in the …
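The workaround elided in the last comment is not recoverable from this page, but a common fix for both complaints in this thread is to set the environment variables before anything initializes CUDA. This is an assumption, not necessarily the commenter's exact change:

```python
# Must run before `import torch` (or anything else that initializes CUDA),
# so put it at the very top of the entry script.
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # make device ids match nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"      # expose only physical GPUs 2 and 3

import torch

print(torch.cuda.device_count())  # 2 -- they now appear as cuda:0 and cuda:1
```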