
Cyclic learning rate finder as a part of Trainer #624

Closed
suvojit-0x55aa opened this issue Dec 13, 2019 · 29 comments · Fixed by #1347
Labels: feature, help wanted

@suvojit-0x55aa

🚀 Feature

A learning rate finder that plots the lr vs. loss relationship for the Trainer and finds a good starting learning rate.

Motivation

Cyclical Learning Rates for Training Neural Networks by Leslie N. Smith documents how to find a good learning rate for training with the CyclicLR scheduler.

Pitch

Add a method to the Trainer class:

  • find() : Runs the CLR finder and plots the graph in Logger.
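
A minimal sketch of how the proposed API might be used (the find() name comes from this pitch; the return value and the surrounding calls are illustrative assumptions, not an existing Trainer API):

    # Hypothetical usage of the proposed LR finder; names are illustrative only.
    from pytorch_lightning import Trainer

    model = MyLightningModule()            # assumed user-defined LightningModule
    trainer = Trainer(max_epochs=10)

    # Proposed: do a short sweep over learning rates, log the lr-vs-loss curve
    # to the Logger, and suggest a starting value.
    suggested_lr = trainer.find(model)     # hypothetical method from this pitch

    model.hparams.learning_rate = suggested_lr
    trainer.fit(model)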
@suvojit-0x55aa added the feature and help wanted labels on Dec 13, 2019
@williamFalcon
Contributor

This should be a learning rate scheduler, no?

@suvojit-0x55aa
Author

suvojit-0x55aa commented Dec 13, 2019

This is not a scheduler per se, but it helps find a good learning rate to start with, which can be used with other optimizers as well as for finding the [start_lr, end_lr] range of CyclicLR.

Here is another article regarding this.

@FrancescoSaverioZuppichini

Something like https://docs.fast.ai/callbacks.lr_finder.html, right?

@suvojit-0x55aa
Author

@FrancescoSaverioZuppichini yes. Since Trainer already has access to the model and training data, it would be a great feature for the Lightning community.

@FrancescoSaverioZuppichini

I totally agree. Maybe it can easily be copied directly from fastai.

@williamFalcon
Copy link
Contributor

Let’s not copy anything from fast.ai. I’d rather be able to import from fast.ai and use it.

I’d like Lightning to work well with the other libraries.

So the flow should be to allow support for this component from fast.ai, and maybe generalize a bit to enable other components to work with Lightning.

@FrancescoSaverioZuppichini

FrancescoSaverioZuppichini commented Dec 13, 2019

IMHO it doesn't make any sense to force the user to install fastai only to use a subfeature that can (and should) be present in Lightning. Lightning should replace other libraries like fastai.

For example, I don't like fastai: the code base is not great and the docs are terrible. I would like to avoid installing it again on my machine just to use one feature.

@suvojit-0x55aa
Author

suvojit-0x55aa commented Dec 13, 2019

@FrancescoSaverioZuppichini @williamFalcon Lightning works great as the lightweight wrapper it is; it provides flexibility as well as extensibility. I suggested this feature because it requires a few components to work together, like the optimizer, dataloader and the model; in Trainer we have all of those in the same place, and the technique is proven to work quite well in practice. We can take inspiration from libraries like fast.ai, the PyTorch implementation here, as well as this Keras implementation here, to implement it in Lightning.

@williamFalcon
Contributor

@tullie @neggert @jeffling @Borda
thoughts?

@Borda
Member

Borda commented Dec 13, 2019

Let’s not copy anything from fast.ai. I’d rather be able to import from fast.ai and use it.

Totally agree; if they make a correction we would get it too and would not need to dig into what is wrong again...

I’d like Lightning to work well with the other libraries.

Agree, we already do something similar with torchvision.

So the flow should be to allow support for this component from fast.ai, and maybe generalize a bit to enable other components to work with Lightning.

Maybe something like we did with the logger: have an abstract class, and then implement this LR finder on top of it.

@neggert
Contributor

neggert commented Dec 13, 2019

My only strong opinion is that we should not include fastai as a dependency, mostly because fastai has a ton of very heavy dependencies itself that would get pulled in.

@williamFalcon
Contributor

It wouldn’t be a dependency. I mean the ability to work with other libraries; take the approach we took with MLflow, as Borda suggested.

@jeffling
Contributor

jeffling commented Dec 13, 2019

I actually did a bit of research into this and implemented this at work. It's actually very easy.

fastai's implementation just does a small run while tracking learning rate and loss, and then prints out the chart. They also have an option for finding the 'optimal' learning rate, but it's different for every use-case so even in the course they look at the graph and do it intuitively.

The easiest way I can think of to implement this with Lightning:

  1. Use a learning rate scheduler that steps through the learning rate range you'd like to explore (see the sketch after this list).
  2. Do a short run (1 epoch) using that learning rate scheduler: make a model and Trainer and run fit().
  3. Use TensorBoard, W&B, or anything you want to graph loss vs. learning rate (fast.ai prints a matplotlib graph), or write some code to find the 'optimal' learning rate from the emitted logs.
  4. Choose your learning rate.
  5. Plug that number into a new Trainer/Model instance (remember to set the old one to .cpu()). If you used this technique you'll probably want to use another scheduler.
  6. Run Trainer.fit as you want.
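
A minimal sketch of steps 1–3, assuming a plain exponential sweep and TensorBoard-style logging (the class, the hparam values, and the helper names are illustrative, not the implementation that was eventually merged):

    import torch
    from torch.optim.lr_scheduler import LambdaLR

    class ExponentialLRSweep(LambdaLR):
        """Multiplies the optimizer's base lr by a constant factor each step so it
        sweeps exponentially from start_lr to end_lr over num_steps batches."""
        def __init__(self, optimizer, start_lr, end_lr, num_steps):
            gamma = (end_lr / start_lr) ** (1.0 / num_steps)
            super().__init__(optimizer, lr_lambda=lambda step: gamma ** step)

    # Sketch of how a LightningModule could use it for the short run (step 2);
    # the loss helper and logger calls are assumptions.
    #
    # def configure_optimizers(self):
    #     optimizer = torch.optim.Adam(self.parameters(), lr=1e-7)   # start_lr
    #     self.lr_sweep = ExponentialLRSweep(optimizer, 1e-7, 10.0, num_steps=500)
    #     return optimizer
    #
    # def training_step(self, batch, batch_idx):
    #     loss = self._compute_loss(batch)                           # assumed helper
    #     lr = self.lr_sweep.optimizer.param_groups[0]["lr"]
    #     self.logger.experiment.add_scalar("lr_finder/lr", lr, self.trainer.global_step)
    #     self.logger.experiment.add_scalar("lr_finder/loss", loss, self.trainer.global_step)
    #     self.lr_sweep.step()   # stepped manually every batch (the workaround below)
    #     return {"loss": loss}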

Regarding using fast.ai, I don't think it would be possible to just use it, as the user would need to have a fast.ai model as well. Maybe there is an adapter we can provide in the future.

I suggested this feature because it requires a few components to work together, like the optimizer, dataloader and the model

Regarding the optimizer, dataloader, model, I think we don't need any improvements there as you can do everything with Trainer already. BUT, we currently do not have the ability to .step() the learning rate scheduler every iteration, so that is the main blocker.

You can easily work around this by keeping a reference to the scheduler and stepping yourself, but lightning could also add this functionality.

TLDR: work needed for this:

  • Ability to step LR schedulers every iteration
  • Make sure the LR is logged every time it changes (it might already be)

@suvojit-0x55aa
Author

@williamFalcon @jeffling should we track the feature of stepping LR schedulers at every iteration in a separate issue?

@FrancescoSaverioZuppichini

@jeffling sounds great, I think it should be easy to add it to the Trainer.

@williamFalcon
Contributor

@FrancescoSaverioZuppichini @suvojit-0x55aa want to implement this for Lightning and submit a PR?
@Borda

Would be great to get this into the next release.

@williamFalcon added this to the 0.6.1 milestone on Feb 11, 2020
@DrClick
Copy link

DrClick commented Feb 21, 2020

I have tried doing this in training_step:

    # log the current learning rate against the global batch index
    current_lr = self.trainer.lr_schedulers[0].get_lr()[0]
    current_batch_nb = self.trainer.total_batch_idx
    self.logger.experiment.add_scalar("learning_rate", current_lr, current_batch_nb)

    ...

    # manually step the scheduler every batch
    self.trainer.lr_schedulers[0].step()

which does step the scheduler. However, when trying to use torch.optim.lr_scheduler.OneCycleLR, where the number of iterations spans epochs, the learning rate is reset to the base learning rate at each epoch. Does anyone know how I can stop this from happening? For OneCycle, the length of the cycle is normally the entire set of epochs you want to train, so resetting at the start of each epoch breaks it.

@suvojit-0x55aa
Author

@DrClick yes, actually this issue was pointed out by @jeffling here. We also need to implement the feature to step the scheduler every iteration.

@DrClick

DrClick commented Feb 21, 2020

@suvojit-0x55aa thank you, I am referencing this

You can easily work around this by keeping a reference to the scheduler and stepping yourself, but lightning could also add this functionality.

I have done that with my code snippet; however, at each epoch the learning rate is reset to the base learning rate for some reason. This results in the learning rate being truncated at every epoch. I suspect this has to do with the call to optimizer.step(), and possibly I am changing the learning rate of the scheduler out of context... Any thoughts? Additionally, at the end of each epoch, training_loop.py calls scheduler.step(), which adds num_gpus extra steps.

@Borda
Member

Borda commented Feb 23, 2020

@FrancescoSaverioZuppichini @suvojit-0x55aa @DrClick any thoughts on the implementation?

@DrClick

DrClick commented Feb 24, 2020

I am still looking into why this happens. I am happy to make a PR when I find a solution. I think this is a pretty critical issue; it flatly goes against the first stated design principle of "no PyTorch interference".

@suvojit-0x55aa
Author

@Borda @DrClick I looked into it but am still not able to pinpoint the issue; I'll update here if I find anything.

@williamFalcon modified the milestones: 0.7.0, 0.7.1 on Mar 3, 2020
@DrClick

DrClick commented Mar 5, 2020

I have found the issue and have a solution, but I would like to discuss the possible solutions. I am happy to submit a PR. Basically, the training loop calls lr_scheduler.step(epoch=self.current_epoch), which has the effect of resetting an iterative learning rate scheduler. Possible solution: expose a method on the Trainer to step iterative schedulers, and return a tuple or dict of optimizers, epoch schedulers, and iterative schedulers from model.configure_optimizers. @williamFalcon if this sounds good I will code this up.

This, however, leads to my second problem. I got this to work (by catching the call to my scheduler and ignoring calls where the epoch is provided), but what I cannot find is a clean way to train for a while, stop training, change the learning rate schedulers, and continue training. I have worked around it with the following, but this is clearly pretty unsatisfactory. I was wondering about your thoughts on this, and if there is interest I can start an issue and work on it.

    print("Entering Phase 1. Freezing the base resnet")
    model.freeze_to(-1)
    trainer.fit(model)

    print("Entering Phase 2. Unfreezing the base resnet and lowering learning rates")

    # load the best model and unfreeze the resnet layer
    best_model_ckpt, best_epoch = get_best_model_ckpt(save_checkpoint_dir)

    print("\treload the best model from phase 1 of the training")
    model = BaseModel.load_from_checkpoint(best_model_ckpt)

    print("\tunfreezing the base layer")
    model.unfreeze()

    print("\tsetting a new learning rate for the head")
    # We are going to manually resume this from the next epoch and keep the same global step.
    # This skips a bunch of things we want to skip mainly resetting the global step and epoch
    # to that of the checkpoint
    trainer.resume_from_checkpoint = None

    # step trainer bookkeeping manually
    trainer.current_epoch += 1
    trainer.global_step += 1
    trainer.max_epochs += hparams.phase_2_cycle_epochs

    # reset the hyperparameters and recreate the schedulers
    trainer.get_model().hparams.max_learning_rate_head = hparams.phase_2_max_learning_rate_head
    trainer.get_model().hparams.cycle_epochs = hparams.phase_2_cycle_epochs
    trainer.optimizers, trainer.lr_schedulers = trainer.init_optimizers(trainer.get_model().configure_optimizers())

    # reenable the progress bar for training
    pbar = tqdm.tqdm(leave=True, position=2 * trainer.process_position,
                    disable=not trainer.show_progress_bar, dynamic_ncols=True, unit='batch',
                    file=sys.stdout)
    trainer.main_progress_bar = pbar

    # clear cache before training
    if trainer.on_gpu:
        torch.cuda.empty_cache()

    # resume training without reinit
    trainer.train()

    print("Training completed.")

@FrancescoSaverioZuppichini

FrancescoSaverioZuppichini commented Mar 7, 2020 via email

@schwobr
Contributor

schwobr commented Mar 9, 2020

The way I implemented one-cycle is by completely overriding Lightning's base scheduler handling. Basically I add a step_on_batch attribute to every scheduler, which is set to True for schedulers that need to be updated at every batch (like OneCycleLR). I then store the scheduler as an attribute of the model and use hooks to update it, like:

    def on_batch_end(self):
        if self.sched is not None and self.sched.step_on_batch:
            self.sched.step()
    
    def on_epoch_end(self):
        if self.sched is not None and not self.sched.step_on_batch:
            self.sched.step()

Note that you can also use hooks to reset things between two phases of training.
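
For completeness, a minimal sketch of how the scheduler might be created, tagged, and stored in configure_optimizers under this scheme (the step_on_batch flag and the self.sched name come from the description above; the optimizer, max_lr, and total_steps values are illustrative):

    import torch

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=2e-4, total_steps=1000)
        sched.step_on_batch = True   # custom flag checked by the hooks above
        self.sched = sched           # stored on the model so the hooks can reach it
        # Return only the optimizer so the default scheduler handling does not
        # also step (and reset) the schedule; the hooks above do the stepping.
        return optimizer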

@SkafteNicki mentioned this issue on Apr 2, 2020
@Borda modified the milestones: 0.7.2, 0.7.3 on Apr 8, 2020
@teichert

Thanks for this feature! The Leslie Smith paper recommends using the LR sweep to choose lower and upper bounds (a.k.a. base_lr and max_lr, respectively) for the cyclic learning rate scheduler. Am I right that what is implemented here sweeps learning rates, allows users to inspect the results, and suggests a reasonable learning rate, BUT that it isn't immediately usable for setting the parameters of the CyclicLR scheduler? Furthermore, if I use the CyclicLR scheduler (i.e. I return it along with my optimizer from configure_optimizers), won't the LR sweep also be using that CLR scheduler (which I don't think I want)?

(I'm new to pytorch-lightning, so I'm guessing that I'm just missing something obvious and that this is already set up to work easily. )

@kswamy15

kswamy15 commented Aug 18, 2020

The way I implemented one-cycle is by completely overriding Lightning's base scheduler handling. Basically I add a step_on_batch attribute to every scheduler, which is set to True for schedulers that need to be updated at every batch (like OneCycleLR). I then store the scheduler as an attribute of the model and use hooks to update it, like:

    def on_batch_end(self):
        if self.sched is not None and self.sched.step_on_batch:
            self.sched.step()
    
    def on_epoch_end(self):
        if self.sched is not None and not self.sched.step_on_batch:
            self.sched.step()

Note that you can also use hooks to reset things between two phases of training.

Can you elaborate on this further?
I have my optimizer like this:

    def configure_optimizers(self):
        # REQUIRED
        # can return multiple optimizers and learning_rate schedulers
        # (LBFGS is automatically supported, no need for a closure function)
        optimizer = torch.optim.Adam([p for p in self.parameters() if p.requires_grad],
                                     lr=self.hparams.learning_rate, eps=1e-08)
        scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=2e-4, total_steps=1000)
        return [optimizer], [scheduler]

So should I declare self.sched = scheduler here in this function? How do I add a 'step_on_batch' attribute to the scheduler here?
Thanks in advance for your help.

@kswamy15

I used the code above for on_batch_end and on_epoch_end and was able to change the learning rate every batch. I used a print statement in on_batch_end to verify that it did change, so this hack works:

    for group in self.optim.param_groups:
        print('learning rate', group['lr'])

@tbenst

tbenst commented Sep 14, 2020

@teichert did you come up with a solution / "best practice" for using the lr finder with one cycle? Appreciate any tips or pointers!

Edit: my current understanding is that the auto LR finder is not currently appropriate for cyclic learning rates, as it suggests a single value in the middle of the lr range. Instead, we need to determine the base and max learning rates.

(Figure: lr vs. loss curve from the sweep.)

We would want (approximately) lr_base=5e-5 and lr_max=10e-3, where we just start to see convergence and where we see the lowest loss, respectively.
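
A minimal sketch of plugging those hand-picked bounds into a CyclicLR schedule inside configure_optimizers (the bounds are the approximate values read off the plot above; the optimizer choice and step_size_up are illustrative assumptions):

    import torch

    def configure_optimizers(self):
        base_lr, max_lr = 5e-5, 10e-3       # read off the lr-vs-loss sweep above
        optimizer = torch.optim.SGD(self.parameters(), lr=base_lr, momentum=0.9)
        scheduler = torch.optim.lr_scheduler.CyclicLR(
            optimizer,
            base_lr=base_lr,
            max_lr=max_lr,
            step_size_up=2000,              # illustrative: half a cycle, in batches
        )
        # CyclicLR is meant to be stepped every batch, so the per-iteration
        # stepping discussed earlier in this thread still applies.
        return [optimizer], [scheduler]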

@Borda modified the milestones: 0.7.4, v0.7.x on Apr 18, 2021