
ReduceLROnPlateau does not work with multiple schedulers #1037

Closed
ghost opened this issue Mar 3, 2020 · 12 comments
Labels
bug (Something isn't working), help wanted (Open to be worked on)

Comments

@ghost

ghost commented Mar 3, 2020

🐛 Bug

PL seems to pull only one ReduceLROnPlateau scheduler and store it as reduce_lr_on_plateau_scheduler.

To Reproduce

One example would be adding two ReduceLROnPlateau schedulers to the GAN example.

Expected behavior

All ReduceLROnPlateau schedulers should be recognized as such.
A natural solution might be to check whether each scheduler in the loop that runs lr_scheduler.step() is an instance of ReduceLROnPlateau, instead of keeping a separate self.reduce_lr_on_plateau_scheduler; see the sketch below.
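
A minimal sketch of that suggestion, assuming a hypothetical lr_schedulers list of scheduler dicts and a monitor_val metric (these names are illustrative, not Lightning's actual internals):

from torch.optim.lr_scheduler import ReduceLROnPlateau

# Hypothetical stepping loop: each scheduler is checked individually, so any
# number of ReduceLROnPlateau instances receive the monitored metric.
for scheduler_cfg in lr_schedulers:
    scheduler = scheduler_cfg["scheduler"]
    if isinstance(scheduler, ReduceLROnPlateau):
        scheduler.step(monitor_val)  # e.g. the monitored val_loss
    else:
        scheduler.step()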

ghost added the bug (Something isn't working) and help wanted (Open to be worked on) labels on Mar 3, 2020
@williamFalcon
Contributor

williamFalcon commented Mar 3, 2020

Good catch!
Mind submitting a PR?

@SkafteNicki
Member

If PR #941 is merged, it will also solve this problem.

@Borda
Member

Borda commented Mar 4, 2020

@darwinkim @SkafteNicki please coordinate your efforts on #941 and #1039 so that none of your work gets wasted... ;] Maybe make #1039 a follow-up of #941?

@ghost
Author

ghost commented Mar 4, 2020

#941 seems to solve it, and it adds many related features.

@SkafteNicki
Member

With #941 now merged, this can be closed.

Borda closed this as completed on Mar 5, 2020
@Laksh1997

I'm not sure this is solved. I just tried two schedulers with one optimizer and it didn't work.

When I remove my linear scheduler, ReduceLROnPlateau starts to work again.

@SkafteNicki
Member

@Laksh1997 could you provide more info: which version of Lightning are you using, and what does your configure_optimizers look like?

@Laksh1997

@SkafteNicki

Latest stable (0.8.1)

Using ReduceLROnPlateau at the epoch level and a linear warmup at the step level:

import torch


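# Multiplies the base learning rate by min(step / num_warmup_steps, 1.0):
# a linear ramp from 0 up to the base LR over the first num_warmup_steps steps.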
class LinearWarmupScheduler(torch.optim.lr_scheduler.LambdaLR):
    def __init__(
        self, optimizer: torch.optim.Optimizer, num_warmup_steps: int = 1000,
    ):
        assert num_warmup_steps > 0
        super().__init__(optimizer, lambda step: min(step / num_warmup_steps, 1.0))

@SkafteNicki
Member

Okay, but what does the configure_optimizers method of your model look like?

@Laksh1997

@SkafteNicki

It returns the following:

[torch.optim.Adam(self.parameters(), lr=0.01)], [{"scheduler": ReduceLROnPlateau(...), "interval": "epoch"}, {"scheduler": LinearWarmupScheduler(...), "interval": "step"}]
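
For context, a self-contained configure_optimizers along these lines could look like the sketch below; lr, patience, and num_warmup_steps are placeholder values, and the "monitor" key follows the dict format used in the code later in this thread:

import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau


def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=0.01)
    schedulers = [
        # epoch-level plateau scheduler, watching the validation loss
        {"scheduler": ReduceLROnPlateau(optimizer, patience=5),
         "interval": "epoch", "monitor": "val_loss"},
        # step-level linear warmup (the LinearWarmupScheduler defined above)
        {"scheduler": LinearWarmupScheduler(optimizer, num_warmup_steps=1000),
         "interval": "step"},
    ]
    return [optimizer], schedulers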

@Laksh1997

Laksh1997 commented Jun 27, 2020

Here is my code more specifically:

import copy

import torch
import torch.nn as nn
from torch.optim import Optimizer

# Note: lr_schedulers, get_available_schedulers, get_optim_class,
# build_optim_param_groups, check_valid_param_groups and PL_EXPECTED_OUTPUT
# are project-specific helpers assumed to be importable from the surrounding codebase.


def build_scheduler_params(scheduler_name, param_set, optimizer: Optimizer):
    """Parses scheduler params"""
    pl_scheduler_params = {
        "monitor": param_set.pop("monitor", "val_loss"),
        "interval": param_set.pop("interval", "epoch"),
        "frequency": param_set.pop("frequency", 1),
    }
    if hasattr(torch.optim.lr_scheduler, scheduler_name):
        scheduler_class = getattr(torch.optim.lr_scheduler, scheduler_name)
        scheduler = scheduler_class(optimizer, **param_set)
    elif hasattr(lr_schedulers, scheduler_name):
        scheduler_class = getattr(lr_schedulers, scheduler_name)
        scheduler = scheduler_class(optimizer, **param_set)
    else:
        raise ValueError(
            f"Scheduler: {scheduler_name} not available. "
            f"Schedulers available are: {get_available_schedulers()}"
        )
    pl_scheduler_params["scheduler"] = scheduler
    return pl_scheduler_params


def build_optimizer(model: nn.Module, config) -> Optimizer:
    """Makes optimizer from model and config"""
    optim_kwargs = copy.deepcopy(config.optimizer_kwargs)
    optim_class = get_optim_class(config.optimizer)
    optim_param_groups = build_optim_param_groups(model, optim_kwargs)
    check_valid_param_groups(optim_param_groups, model)
    if isinstance(optim_kwargs, list):
        optim_kwargs = optim_kwargs[0]
    optimizer = optim_class(optim_param_groups, **optim_kwargs)
    return optimizer


def build_schedulers(optimizer: Optimizer, config):
    """Makes schedulers from optimizer and config"""
    schedulers = []
    schedulers_names = config.schedulers
    schedulers_kwargs = config.schedulers_kwargs

    if schedulers_names is not None:
        assert len(schedulers_names) == len(schedulers_kwargs), (
            f"Need to have as many schedulers as scheduler param sets! "
            f"Got {len(schedulers_names)} schedulers and "
            f"{len(schedulers_kwargs)} scheduler param sets!"
        )
        for scheduler_name, param_set in zip(schedulers_names, schedulers_kwargs):
            pl_scheduler_params = build_scheduler_params(
                scheduler_name, param_set, optimizer
            )
            schedulers.append(pl_scheduler_params)
    return schedulers


def configure_optimizers(model: nn.Module, config) -> PL_EXPECTED_OUTPUT:
    """Configures PL optimizers and schedulers"""
    optimizer = build_optimizer(model, config)
    schedulers = build_schedulers(optimizer, config)
    return [optimizer], schedulers
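
As a hypothetical usage sketch, a config exercising two schedulers through these helpers might look like this (the attribute names mirror what the functions read; the scheduler names and kwargs are placeholders, and LinearWarmupScheduler is assumed to live in the project's lr_schedulers module):

from types import SimpleNamespace

config = SimpleNamespace(
    optimizer="Adam",
    optimizer_kwargs={"lr": 1e-3},
    schedulers=["ReduceLROnPlateau", "LinearWarmupScheduler"],
    schedulers_kwargs=[
        {"monitor": "val_loss", "interval": "epoch", "patience": 5},
        {"interval": "step", "num_warmup_steps": 1000},
    ],
)
# optimizers, scheduler_dicts = configure_optimizers(model, config)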

@SkafteNicki
Member

@Laksh1997 I do not really see a problem in your code. After experimenting with the ReduceLROnPlateau scheduler in combination with other schedulers, I don't think there is a problem in Lightning either, as I can get multiple schedulers to work at the same time.

My best guess is that this is specific to your model/data. When you have two schedulers and one is changing the learning rate every step, it is very possible that you never reach a plateau, so the ReduceLROnPlateau scheduler never kicks in.
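
If it is unclear whether the plateau scheduler ever triggers, one way to check is to log the optimizer's learning rate together with the scheduler's internal counters after each validation epoch (best and num_bad_epochs are attributes of torch.optim.lr_scheduler.ReduceLROnPlateau; optimizer, epoch, and plateau_scheduler are whatever objects your setup holds):

# Run after each validation epoch, e.g. from a callback or hook.
for i, group in enumerate(optimizer.param_groups):
    print(f"epoch {epoch}: param_group {i} lr={group['lr']:.2e}")
print(f"plateau: best={plateau_scheduler.best}, num_bad_epochs={plateau_scheduler.num_bad_epochs}")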
