
Cyclic learning rate finder as a part of Trainer #624

Closed
suvojit-0x55aa opened this issue Dec 13, 2019 · 29 comments · Fixed by #1347
Labels: feature, help wanted

@suvojit-0x55aa

🚀 Feature

A learning rate finder that plots the lr vs. loss relationship for the Trainer and finds a good starting learning rate.

Motivation

Cyclical Learning Rates for Training Neural Networks by Leslie N. Smith documents how to find a good learning rate for training with the CyclicLR scheduler.

Pitch

Add a method to the Trainer class:

  • find() : Runs the CLR finder and plots the graph in Logger.
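
A minimal sketch of how the proposed API might be used (the find() name comes from this pitch; the return value and the surrounding calls are illustrative assumptions, not an existing Trainer API):

    # Hypothetical usage of the proposed LR finder; names are illustrative only.
    from pytorch_lightning import Trainer

    model = MyLightningModule()            # assumed user-defined LightningModule
    trainer = Trainer(max_epochs=10)

    # Proposed: do a short sweep over learning rates, log the lr-vs-loss curve
    # to the Logger, and suggest a starting value.
    suggested_lr = trainer.find(model)     # hypothetical method from this pitch

    model.hparams.learning_rate = suggested_lr
    trainer.fit(model)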
@suvojit-0x55aa added the feature and help wanted labels on Dec 13, 2019
@williamFalcon
Contributor

This should be a learning rate scheduler, no?

@suvojit-0x55aa
Author

suvojit-0x55aa commented Dec 13, 2019

This is not a scheduler per se, but it helps find a good learning rate to start with, which can be used with other optimizers as well as for finding the [start_lr, end_lr] range of CyclicLR.

Here is another article regarding this.

@FrancescoSaverioZuppichini

Something like https://docs.fast.ai/callbacks.lr_finder.html, right?

@suvojit-0x55aa
Author

@FrancescoSaverioZuppichini yes. Since Trainer already has access to the model and training data, it would be a great feature for the Lightning community.

@FrancescoSaverioZuppichini

I totally agree. Maybe it can easily be copied directly from fastai.

@williamFalcon
Copy link
Contributor

Let’s not copy anything from fast.ai. I’d rather be able to import from fast.ai and use it.

I’d like Lightning to work well with the other libraries.

So the flow should be to allow support for this component from fast.ai, and maybe generalize a bit to enable other components to work with Lightning.

@FrancescoSaverioZuppichini

FrancescoSaverioZuppichini commented Dec 13, 2019

IMHO it doesn't make any sense to force the user to install fastai only to use a subfeature that can (and should) be present in Lightning. Lightning should replace other libraries like fastai.

For example, I don't like fastai: the code base is not great and the docs are terrible. I would like to avoid installing it again on my machine just to use one feature.

@suvojit-0x55aa
Author

suvojit-0x55aa commented Dec 13, 2019

@FrancescoSaverioZuppichini @williamFalcon Lightning works great as the lightweight wrapper it is; it provides flexibility as well as extensibility. I suggested this feature because it requires a few components to work together, like the optimizer, dataloader and the model; in Trainer we have all of those in the same place, and the technique is proven to work quite well in practice. We can take inspiration from libraries like fast.ai, the PyTorch implementation here, as well as this Keras implementation here, to implement it in Lightning.

@williamFalcon
Contributor

@tullie @neggert @jeffling @Borda
thoughts?

@Borda
Member

Borda commented Dec 13, 2019

Let’s not copy anything from fast.ai. I’d rather be able to import from fast.ai and use it.

Totally agree; if they make a correction we would get it too and would not need to dig into what is wrong again...

I’d like Lightning to work well with the other libraries.

Agree, we already do something similar with torchvision.

So the flow should be to allow support for this component from fast.ai, and maybe generalize a bit to enable other components to work with Lightning.

Maybe something like we did with the logger: have an abstract class, and then implement this LR finder on top of it.

@neggert
Contributor

neggert commented Dec 13, 2019

My only strong opinion is that we should not include fastai as a dependency, mostly because fastai has a ton of very heavy dependencies itself that would get pulled in.

@williamFalcon
Contributor

It wouldn’t be a dependency. I mean the ability to work with other libraries; take the approach we took with MLflow, as Borda suggested.

@jeffling
Contributor

jeffling commented Dec 13, 2019

I actually did a bit of research into this and implemented this at work. It's actually very easy.

fastai's implementation just does a small run while tracking learning rate and loss, and then prints out the chart. They also have an option for finding the 'optimal' learning rate, but it's different for every use-case so even in the course they look at the graph and do it intuitively.

The easiest way I can think of to implement this with Lightning:

  1. Use a learning rate scheduler that steps through the learning rate range you'd like to explore (see the sketch after this list).
  2. Do a short run (1 epoch) using that learning rate scheduler: make a model and Trainer and run fit().
  3. Use TensorBoard, W&B, or anything you want to graph loss vs. learning rate (fast.ai prints a matplotlib graph), or write some code to find the 'optimal' learning rate from the emitted logs.
  4. Choose your learning rate.
  5. Plug that number into a new Trainer/Model instance (remember to set the old one to .cpu()). If you used this technique you'll probably want to use another scheduler.
  6. Run Trainer.fit as you want.
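
A minimal sketch of steps 1–3, assuming a plain exponential sweep and TensorBoard-style logging (the class, the hparam values, and the helper names are illustrative, not the implementation that was eventually merged):

    import torch
    from torch.optim.lr_scheduler import LambdaLR

    class ExponentialLRSweep(LambdaLR):
        """Multiplies the optimizer's base lr by a constant factor each step so it
        sweeps exponentially from start_lr to end_lr over num_steps batches."""
        def __init__(self, optimizer, start_lr, end_lr, num_steps):
            gamma = (end_lr / start_lr) ** (1.0 / num_steps)
            super().__init__(optimizer, lr_lambda=lambda step: gamma ** step)

    # Sketch of how a LightningModule could use it for the short run (step 2);
    # the loss helper and logger calls are assumptions.
    #
    # def configure_optimizers(self):
    #     optimizer = torch.optim.Adam(self.parameters(), lr=1e-7)   # start_lr
    #     self.lr_sweep = ExponentialLRSweep(optimizer, 1e-7, 10.0, num_steps=500)
    #     return optimizer
    #
    # def training_step(self, batch, batch_idx):
    #     loss = self._compute_loss(batch)                           # assumed helper
    #     lr = self.lr_sweep.optimizer.param_groups[0]["lr"]
    #     self.logger.experiment.add_scalar("lr_finder/lr", lr, self.trainer.global_step)
    #     self.logger.experiment.add_scalar("lr_finder/loss", loss, self.trainer.global_step)
    #     self.lr_sweep.step()   # stepped manually every batch (the workaround below)
    #     return {"loss": loss}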

Regarding using fast.ai, I don't think it would be possible to just use it, as the user would need to have a fast.ai model as well. Maybe there is an adapter we can provide in the future.

I suggested this feature because it requires a few components to work together, like the optimizer, dataloader and the model

Regarding the optimizer, dataloader, model, I think we don't need any improvements there as you can do everything with Trainer already. BUT, we currently do not have the ability to .step() the learning rate scheduler every iteration, so that is the main blocker.

You can easily work around this by keeping a reference to the scheduler and stepping yourself, but lightning could also add this functionality.

TLDR: work needed for this:

  • Ability to step LR schedulers every iteration
  • Make sure the LR is logged every time it changes (it might already be)

@suvojit-0x55aa
Author

@williamFalcon @jeffling should we track the feature of stepping LR schedulers at every iteration in a separate issue?

@FrancescoSaverioZuppichini

@jeffling sounds great, I think it should be easy to add it to the Trainer.

@williamFalcon
Contributor

@FrancescoSaverioZuppichini @suvojit-0x55aa want to implement this for Lightning and submit a PR?
@Borda

Would be great to get this into the next release.

@williamFalcon added this to the 0.6.1 milestone on Feb 11, 2020
@DrClick
Copy link

DrClick commented Feb 21, 2020

I have tried doing this in training_step:

    # log the current learning rate against the global batch index
    current_lr = self.trainer.lr_schedulers[0].get_lr()[0]
    current_batch_nb = self.trainer.total_batch_idx
    self.logger.experiment.add_scalar("learning_rate", current_lr, current_batch_nb)

    ...

    # manually step the scheduler every batch
    self.trainer.lr_schedulers[0].step()

which does step the scheduler. However, when trying to use torch.optim.lr_scheduler.OneCycleLR, where the number of iterations spans epochs, the learning rate is reset to the base learning rate at each epoch. Does anyone know how I can stop this from happening? For OneCycle, the length of the cycle is normally the entire set of epochs you want to train, so resetting at the start of each epoch breaks it.

@suvojit-0x55aa
Author

@DrClick yes, actually this issue was pointed out by @jeffling here. We also need to implement the feature to step the scheduler every iteration.

@DrClick

DrClick commented Feb 21, 2020

@suvojit-0x55aa thank you, I am referencing this

You can easily work around this by keeping a reference to the scheduler and stepping yourself, but lightning could also add this functionality.

I have done that with my code snippet; however, at each epoch the learning rate is reset to the base learning rate for some reason. This results in the learning rate being truncated at every epoch. I suspect this has to do with the call to optimizer.step(), and possibly I am changing the learning rate of the scheduler out of context... Any thoughts? Additionally, at the end of each epoch, training_loop.py calls scheduler.step(), which adds num_gpus extra steps.

@Borda
Member

Borda commented Feb 23, 2020

@FrancescoSaverioZuppichini @suvojit-0x55aa @DrClick any thoughts on the implementation?

@DrClick

DrClick commented Feb 24, 2020

I am still looking into why this happens. I am happy to make a PR when I find a solution. I think this is a pretty critical issue; it flatly goes against the first stated design principle of "no PyTorch interference".

@suvojit-0x55aa
Author

@Borda @DrClick I looked into it but am still not able to pinpoint the issue; I'll update here if I find anything.

@williamFalcon modified the milestones: 0.7.0, 0.7.1 on Mar 3, 2020
@DrClick

DrClick commented Mar 5, 2020

I have found the issue and have a solution, but I would like to discuss the possible solutions. I am happy to submit a PR. Basically, the training loop calls lr_scheduler.step(epoch=self.current_epoch), which has the effect of resetting an iterative learning rate scheduler. Possible solution: expose a method on the Trainer to step iterative schedulers, and return a tuple or dict of optimizers, epoch schedulers, and iterative schedulers from model.configure_optimizers. @williamFalcon if this sounds good I will code this up.

This, however, leads to my second problem. I got this to work (by catching the call to my scheduler and ignoring calls where the epoch is provided), but what I cannot find is a clean way to train for a while, stop training, change the learning rate schedulers, and continue training. I have worked around it with the following, but this is clearly pretty unsatisfactory. I was wondering about your thoughts on this, and if there is interest I can start an issue and work on it.

    print("Entering Phase 1. Freezing the base resnet")
    model.freeze_to(-1)
    trainer.fit(model)

    print("Entering Phase 2. Unfreezing the base resnet and lowering learning rates")

    # load the best model and unfreeze the resnet layer
    best_model_ckpt, best_epoch = get_best_model_ckpt(save_checkpoint_dir)

    print("\treload the best model from phase 1 of the training")
    model = BaseModel.load_from_checkpoint(best_model_ckpt)

    print("\tunfreezing the base layer")
    model.unfreeze()

    print("\tsetting a new learning rate for the head")
    # We are going to manually resume this from the next epoch and keep the same global step.
    # This skips a bunch of things we want to skip mainly resetting the global step and epoch
    # to that of the checkpoint
    trainer.resume_from_checkpoint = None

    # step trainer bookkeeping manually
    trainer.current_epoch += 1
    trainer.global_step += 1
    trainer.max_epochs += hparams.phase_2_cycle_epochs

    # reset the hyperparameters and recreate the schedulers
    trainer.get_model().hparams.max_learning_rate_head = hparams.phase_2_max_learning_rate_head
    trainer.get_model().hparams.cycle_epochs = hparams.phase_2_cycle_epochs
    trainer.optimizers, trainer.lr_schedulers = trainer.init_optimizers(trainer.get_model().configure_optimizers())

    # reenable the progress bar for training
    pbar = tqdm.tqdm(leave=True, position=2 * trainer.process_position,
                    disable=not trainer.show_progress_bar, dynamic_ncols=True, unit='batch',
                    file=sys.stdout)
    trainer.main_progress_bar = pbar

    # clear cache before training
    if trainer.on_gpu:
        torch.cuda.empty_cache()

    # resume training without reinit
    trainer.train()

    print("Training completed.")

@FrancescoSaverioZuppichini

FrancescoSaverioZuppichini commented Mar 7, 2020 via email

@schwobr
Contributor

schwobr commented Mar 9, 2020

The way I implemented one-cycle is by completely overriding Lightning's base scheduler handling. Basically I add a step_on_batch attribute to every scheduler, which is set to True for schedulers that need to be updated at every batch (like OneCycleLR). I then store the scheduler as an attribute of the model and use hooks to update it, like:

    def on_batch_end(self):
        if self.sched is not None and self.sched.step_on_batch:
            self.sched.step()
    
    def on_epoch_end(self):
        if self.sched is not None and not self.sched.step_on_batch:
            self.sched.step()

Note that you can also use hooks to reset things between two phases of training.
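
For completeness, a minimal sketch of how the scheduler might be created, tagged, and stored in configure_optimizers under this scheme (the step_on_batch flag and the self.sched name come from the description above; the optimizer, max_lr, and total_steps values are illustrative):

    import torch

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=2e-4, total_steps=1000)
        sched.step_on_batch = True   # custom flag checked by the hooks above
        self.sched = sched           # stored on the model so the hooks can reach it
        # Return only the optimizer so the default scheduler handling does not
        # also step (and reset) the schedule; the hooks above do the stepping.
        return optimizer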

@SkafteNicki mentioned this issue on Apr 2, 2020
@Borda modified the milestones: 0.7.2, 0.7.3 on Apr 8, 2020
@teichert

Thanks for this feature! The Leslie Smith paper recommends using the LR sweep to choose lower and upper bounds (a.k.a. base_lr and max_lr, respectively) for the cyclic learning rate scheduler. Am I right that what is implemented here sweeps learning rates, allows users to inspect the results, and suggests a reasonable learning rate, BUT that it isn't immediately usable for setting the parameters of the CyclicLR scheduler? Furthermore, if I use the CyclicLR scheduler (i.e. I return it along with my optimizer from configure_optimizers), won't the LR sweep also be using that CLR scheduler (which I don't think I want)?

(I'm new to pytorch-lightning, so I'm guessing that I'm just missing something obvious and that this is already set up to work easily. )

@kswamy15

kswamy15 commented Aug 18, 2020

The way I implemented one-cycle is by completely overriding Lightning's base scheduler handling. Basically I add a step_on_batch attribute to every scheduler, which is set to True for schedulers that need to be updated at every batch (like OneCycleLR). I then store the scheduler as an attribute of the model and use hooks to update it, like:

    def on_batch_end(self):
        if self.sched is not None and self.sched.step_on_batch:
            self.sched.step()
    
    def on_epoch_end(self):
        if self.sched is not None and not self.sched.step_on_batch:
            self.sched.step()

Note that you can also use hooks to reset things between two phases of training.

Can you elaborate on this further?
I have my optimizer like this:

    def configure_optimizers(self):
        # REQUIRED
        # can return multiple optimizers and learning_rate schedulers
        # (LBFGS is automatically supported, no need for a closure function)
        optimizer = torch.optim.Adam([p for p in self.parameters() if p.requires_grad],
                                     lr=self.hparams.learning_rate, eps=1e-08)
        scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=2e-4, total_steps=1000)
        return [optimizer], [scheduler]

So should I declare self.sched = scheduler here in this function? How do I add a 'step_on_batch' attribute to the scheduler here?
Thanks in advance for your help.

@kswamy15

I used the code above for on_batch_end and on_epoch_end and was able to change the learning rate every batch. I used a print statement in on_batch_end to verify that it did change, so this hack works:

    for group in self.optim.param_groups:
        print('learning rate', group['lr'])

@tbenst

tbenst commented Sep 14, 2020

@teichert did you come up with a solution / "best practice" for using the lr finder with one cycle? Appreciate any tips or pointers!

Edit: my current understanding is that the auto LR finder is not currently appropriate for cyclic learning rates, as it suggests a single value in the middle of the lr range. Instead, we need to determine the base and max learning rates.

(Figure: lr vs. loss curve from the sweep.)

We would want (approximately) lr_base=5e-5 and lr_max=10e-3, where we just start to see convergence and where we see the lowest loss, respectively.
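
A minimal sketch of plugging those hand-picked bounds into a CyclicLR schedule inside configure_optimizers (the bounds are the approximate values read off the plot above; the optimizer choice and step_size_up are illustrative assumptions):

    import torch

    def configure_optimizers(self):
        base_lr, max_lr = 5e-5, 10e-3       # read off the lr-vs-loss sweep above
        optimizer = torch.optim.SGD(self.parameters(), lr=base_lr, momentum=0.9)
        scheduler = torch.optim.lr_scheduler.CyclicLR(
            optimizer,
            base_lr=base_lr,
            max_lr=max_lr,
            step_size_up=2000,              # illustrative: half a cycle, in batches
        )
        # CyclicLR is meant to be stepped every batch, so the per-iteration
        # stepping discussed earlier in this thread still applies.
        return [optimizer], [scheduler]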

@Borda modified the milestones: 0.7.4, v0.7.x on Apr 18, 2021