Cyclic learning rate finder as a part of Trainer #624
Comments
This should be a learning rate scheduler, no?
This is not a scheduler per se, but it helps find a good learning rate to start with, which can be used with other optimizers as well. Here is another article regarding this.
Something like https://docs.fast.ai/callbacks.lr_finder.html, right?
@FrancescoSaverioZuppichini yes.
I totally agree. Maybe it can be easily copied directly from fastai.
Let’s not copy anything from fast.ai; I’d rather be able to import from fast.ai and use it. I’d like Lightning to work well with the other libraries. So the flow should be to allow support for this component from fast.ai, and maybe generalize a bit to enable other components to work with Lightning.
IMHO it doesn't make any sense to force the user to install fastai only to use a subfeature that can (and should) be present in Lightning. Lightning should replace other libraries like fastai. For example, I don't like fastai: the code base is not great and the docs are terrible. I would like to avoid installing it on my machine again just to use one feature.
@FrancescoSaverioZuppichini @williamFalcon Lightning works great as the lightweight wrapper it is; it provides flexibility as well as extensibility. I suggested this feature because it requires a few components to work together, like the optimizer, dataloader, and model; in Trainer we have all of those in the same place, and the technique is proven to work quite well in practice. So we can take inspiration from libraries like fast.ai, as well as the PyTorch implementation here and this Keras implementation here, to implement it in Lightning.
Totally agree. If they make a correction upstream we would get it too, and would not need to dig into what is wrong again...
Agree, there is already a use for it
Maybe something like what we did with the logger: have an abstract class and then implement this LR finder against it.
My only strong opinion is that we should not include fastai as a dependency, mostly because fastai has a ton of very heavy dependencies itself that would get pulled in.
It wouldn’t be a dependency; I mean the ability to work with other libraries. Take the approach we took with mlflow, as borda suggested.
I actually did a bit of research into this and implemented it at work. It's actually very easy: fastai's implementation just does a small run while tracking learning rate and loss, and then prints out the chart. They also have an option for finding the 'optimal' learning rate, but it's different for every use case, so even in the course they look at the graph and pick it intuitively. The easiest way to implement this with Lightning that I can think of is a short tracked run of exactly that kind, driven by the Trainer (see the sketch after this comment).
Regarding using fast.ai, I don't think it would be possible to just use it, as the user would need to have a fast.ai model as well. Maybe there is an adapter we can provide in the future.
Regarding the optimizer, dataloader, and model, I think we don't need any improvements there, as you can do everything with Trainer already. BUT, we currently do not have the ability to step LR schedulers at every iteration. You can easily work around this by keeping a reference to the scheduler and stepping it yourself, but Lightning could also add this functionality. TL;DR: the work needed for this is the tracked sweep run itself, plus per-iteration scheduler stepping.
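For concreteness, a minimal sketch of that range-test idea in plain PyTorch; the function name, bounds, and divergence check are illustrative assumptions rather than fastai's or Lightning's actual code:

```python
import math
import torch

def lr_range_test(model, optimizer, criterion, loader,
                  min_lr=1e-7, max_lr=10.0, num_steps=100):
    """Do a short run, multiplying the lr each step and recording (lr, loss)."""
    gamma = (max_lr / min_lr) ** (1.0 / num_steps)  # per-step lr multiplier
    for group in optimizer.param_groups:
        group["lr"] = min_lr
    history = []
    data_iter = iter(loader)
    for _ in range(num_steps):
        try:
            x, y = next(data_iter)
        except StopIteration:  # loop the loader for very small datasets
            data_iter = iter(loader)
            x, y = next(data_iter)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        lr = optimizer.param_groups[0]["lr"]
        history.append((lr, loss.item()))
        if not math.isfinite(loss.item()):  # stop once the loss blows up
            break
        for group in optimizer.param_groups:
            group["lr"] = lr * gamma
    return history  # plot lr (log scale) vs loss and read off a good lr
```

Plotting the recorded pairs with lr on a log axis reproduces the fastai-style chart; as noted above, the 'good' value is usually picked off the steepest descent by eye.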
@williamFalcon @jeffling should we track the feature of stepping LR schedulers at every iteration in a separate issue?
@jeffling Sounds great; I think it should be easy to add it to the trainer.
@FrancescoSaverioZuppichini @suvojit-0x55aa want to implement for lightning and submit a PR? Would be great to get this into the next release.
I have tried doing this in the training_step, which does step the scheduler; however, when trying to use torch.optim.lr_scheduler.OneCycleLR, where the number of iterations spans epochs, the learning rate is reset to the base learning rate at each epoch. Does anyone know how I can stop this from happening? For OneCycleLR, the length of the cycle is normally the entire set of epochs you want to train, so resetting at the start of each epoch breaks this.
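A hedged reconstruction of the kind of snippet being described, assuming the scheduler is created and stored in configure_optimizers and deliberately not handed back to the Trainer (the toy layer and loss are placeholders):

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class OneCycleModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)  # stand-in model

    def forward(self, x):
        return self.layer(x)

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1, momentum=0.9)
        # Keep a reference so training_step can step it every batch.
        self.one_cycle = torch.optim.lr_scheduler.OneCycleLR(
            optimizer, max_lr=0.1, total_steps=10_000)
        return optimizer  # scheduler intentionally not returned to the Trainer

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.one_cycle.step()  # manual per-batch step
        return loss
```

If the scheduler were instead also returned to the Trainer, Lightning's own epoch-end stepping would interfere with it, which matches the per-epoch reset behaviour described above.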
@suvojit-0x55aa thank you, I am referencing this
I have done that with my code snippet; however, at each epoch the learning rate is reset to the base learning rate for some reason. This results in the learning rate schedule being truncated at every epoch. I suspect this has to do with the call to optimizer.step(), and possibly I am changing the learning rate of the scheduler out of context... Any thoughts? Additionally, at the end of each epoch, training_loop.py calls scheduler.step, which adds num_gpus extra steps.
@FrancescoSaverioZuppichini @suvojit-0x55aa @DrClick any thoughts on the implementation?
I am still looking into why this happens. I am happy to make a PR when I find a solution. I think this is a pretty critical issue; it flatly goes against the first stated design principle of "no PyTorch interference".
I have found the issue and have a solution, but I would like to discuss the possible solutions; I am happy to submit a PR. Basically, the training loop calls scheduler.step with the epoch number at the end of every epoch, which resets schedulers like OneCycleLR. This however leads to my second problem: I got this to work (by catching the call to my scheduler and ignoring calls where the epoch is provided), but what I cannot find is a clean way to train for a while, stop training, change the learning rate schedulers, and continue training. I have worked around it (see the sketch below for the general shape), but this is clearly pretty unsatisfactory. I was wondering about your thoughts on this, and if interested I can start an issue and work on it.
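One hedged way to express that train / swap scheduler / resume flow is a phase flag, so that configure_optimizers builds a different scheduler on each fit. The flag and the two-fit driver are illustrative assumptions, not necessarily the workaround actually used (dataloaders omitted for brevity):

```python
import torch
import pytorch_lightning as pl

class PhasedModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)  # stand-in model
        self.phase = 0  # bumped between fit() calls to select a scheduler

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.01)
        if self.phase == 0:
            scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5)
        else:
            scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
        return [optimizer], [scheduler]

model = PhasedModule()
pl.Trainer(max_epochs=10).fit(model)  # phase 0 schedule
model.phase = 1
pl.Trainer(max_epochs=10).fit(model)  # a fresh fit re-runs configure_optimizers
```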
And this library should make things easier 😂
The way I implemented one-cycle is by completely overriding Lightning's base scheduler handling. Basically I add a step_on_batch flag to the scheduler and step it myself from the hooks:

```python
def on_batch_end(self):
    if self.sched is not None and self.sched.step_on_batch:
        self.sched.step()

def on_epoch_end(self):
    if self.sched is not None and not self.sched.step_on_batch:
        self.sched.step()
```

Note that you can also use hooks to reset things between two phases of training.
Thanks for this feature! The Leslie Smith paper recommends using the LR sweep to choose lower and upper bounds (a.k.a. base_lr and max_lr in CyclicLR). (I'm new to pytorch-lightning, so I'm guessing that I'm just missing something obvious and that this is already set up to work easily.)
Can you elaborate on this further? So should I declare self.sched = scheduler here in this function? How do I add a 'step_on_batch' attribute to the scheduler here?
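Following the hook pattern above, one way to wire it up inside the LightningModule (hedged: step_on_batch is just a plain Python attribute that those hooks read, not a Lightning or PyTorch API):

```python
def configure_optimizers(self):
    optimizer = torch.optim.SGD(self.parameters(), lr=0.01, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=0.01, total_steps=10_000)
    scheduler.step_on_batch = True  # custom flag read by on_batch_end above
    self.sched = scheduler          # the reference the hooks check
    # Return only the optimizer, otherwise Lightning also steps the scheduler per epoch.
    return optimizer
```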
I used the code above for on_batch_end and on_epoch_end and was able to change the learning rate every batch. I used a print statement in on_batch_end to verify that it did change. So this hack works.
@teichert Did you come up with a solution / "best practice" for using the lr finder with one-cycle? Appreciate any tips or pointers! Edit: my current understanding is that the auto_lr_finder is not currently appropriate for cyclic learning rates, as it fits a single lr in the middle of the range. Instead, we need to determine the base and max learning rates ourselves: we would want (approximately) lr_base=5e-5 and lr_max=10e-3, the points where we just start to see convergence and where we see the lowest loss, respectively.
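With those bounds read off a sweep, a CyclicLR setup along these lines might look like the following (the step size is an illustrative assumption, and this is hand-wired rather than an official lr-finder integration):

```python
def configure_optimizers(self):
    optimizer = torch.optim.SGD(self.parameters(), lr=5e-5, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CyclicLR(
        optimizer,
        base_lr=5e-5,       # sweep point where the loss just starts to improve
        max_lr=10e-3,       # sweep point around the lowest loss
        step_size_up=2000)  # batches per half-cycle; tune for your dataset
    scheduler.step_on_batch = True  # per-batch stepping via the hooks above
    self.sched = scheduler
    return optimizer
```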
🚀 Feature
Learning rate finder to plot the lr vs loss relationship for Trainer and find a good starting learning rate.

Motivation
Cyclical Learning Rates for Training Neural Networks by Leslie N. Smith documents how to find a good learning rate for training with the CyclicLR scheduler.
Pitch
Add a method to the Trainer class:

find(): runs the CLR finder and plots the graph in the Logger.
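A hypothetical usage sketch of that pitch (find() does not exist yet; the name, signature, and behavior here are just the proposal, and the model class is a placeholder):

```python
from pytorch_lightning import Trainer

model = MyLightningModule()  # placeholder LightningModule
trainer = Trainer(max_epochs=20)

trainer.find(model)  # proposed: run the short lr sweep and plot lr vs loss in the Logger
# Inspect the plot, set the starting lr (or base_lr/max_lr for CyclicLR), then:
trainer.fit(model)
```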