Start accumulate gradients schedule at epoch 0 #2490
Conversation
* added tpu_id added tpu_id to mixins * train on individual tpu * parallel loader if tpu_id is None * removed progress_bar_refresh_rate * chlog * replaced num_tpu_cores with tpu_cores * set tpu_id to None if int * changed num_tpu_cores to tpu_cores in docs * updated docs * updated __init__.py removed self.tpu_id for ParallelLoader * Update pytorch_lightning/trainer/__init__.py * check if tpu_cores is a list Co-authored-by: Jirka Borovec <[email protected]> * xla device conditional * num_tpu_cores deprecation * removed duplicate warning * fixed pep8 error * Revert "removed duplicate warning" This reverts commit 8adb0a9 * deprecated api update * fixed recursion error * fixed tests * fixed flake errors * removed current_tpu_index * Update CHANGELOG.md * Update trainer.py Co-authored-by: Jirka <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> Co-authored-by: William Falcon <[email protected]>
* fix codecov * upgrade codecov * upgrade codecov
* Remove NaNs from loss in LRFinder * np.isfinite * chlog * add test * chlog Co-authored-by: Jirka <[email protected]>
* New metric classes (#1326) * Create metrics package * Create metric.py * Create utils.py * Create __init__.py * add tests for metric utils * add docstrings for metrics utils * add function to recursively apply other function to collection * add tests for this function * update test * Update pytorch_lightning/metrics/metric.py Co-Authored-By: Jirka Borovec <[email protected]> * update metric name * remove example docs * fix tests * add metric tests * fix to tensor conversion * fix apply to collection * Update CHANGELOG.md * Update pytorch_lightning/metrics/metric.py Co-Authored-By: Jirka Borovec <[email protected]> * remove tests from init * add missing type annotations * rename utils to convertors * Create metrics.rst * Update index.rst * Update index.rst * Update pytorch_lightning/metrics/convertors.py Co-Authored-By: Jirka Borovec <[email protected]> * Update pytorch_lightning/metrics/convertors.py Co-Authored-By: Jirka Borovec <[email protected]> * Update pytorch_lightning/metrics/convertors.py Co-Authored-By: Jirka Borovec <[email protected]> * Update pytorch_lightning/metrics/metric.py Co-Authored-By: Jirka Borovec <[email protected]> * Update tests/utilities/test_apply_to_collection.py Co-Authored-By: Jirka Borovec <[email protected]> * Update tests/utilities/test_apply_to_collection.py Co-Authored-By: Jirka Borovec <[email protected]> * Update tests/metrics/convertors.py Co-Authored-By: Jirka Borovec <[email protected]> * Apply suggestions from code review Co-Authored-By: Jirka Borovec <[email protected]> * add doctest example * rename file and fix imports * added parametrized test * replace lambda with inlined function * rename apply_to_collection to apply_func * Separated class description from init args * Apply suggestions from code review Co-Authored-By: Jirka Borovec <[email protected]> * adjust random values * suppress output when seeding * remove gpu from doctest * Add requested changes and add ellipsis for doctest * forgot to push these files... * add explicit check for dtype to convert to * fix ddp tests * remove explicit ddp destruction Co-authored-by: Jirka Borovec <[email protected]> * move dtype device mixin to more general place * refactor to general device dtype mixin * add initial metric package description * change default to none for mac os * pep8 * fix import * Update index.rst * Update ci-testing.yml * Apply suggestions from code review Co-authored-by: Adrian Wälchli <[email protected]> * Update CHANGELOG.md * Update pytorch_lightning/metrics/converters.py * readme * Update metric.py * Update pytorch_lightning/metrics/converters.py Co-authored-by: Jirka Borovec <[email protected]> Co-authored-by: William Falcon <[email protected]> Co-authored-by: Adrian Wälchli <[email protected]> Co-authored-by: Jirka <[email protected]>
* Allow dataloaders without sampler field present Sometimes we have a custom dataloader that doesn't have a sampler, better to check that the field is there before reading it. * chlog Co-authored-by: Jirka <[email protected]>
* suppress epub warn * fail on warn * suppress epub warn * fail on warn * disable epub * typo * remove obsolete suppress
* set min PT 1.3 * circleCI * mergify * min * chlog * skip
* fix user error produced by apex + scheduler combination * add changelog * added reinit to every configure_apex call * fix styling Co-authored-by: Nicki Skafte <[email protected]>
* readme * add governance.rst Co-authored-by: Nicki Skafte <[email protected]>
* remove the need for hparams * remove the need for hparams * remove the need for hparams * remove the need for hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * finished moco * basic * testing * todo * recurse * hparams * persist * hparams * chlog * tests * tests * tests * tests * tests * tests * review * saving * tests * tests * tests * docs * finished moco * hparams * review * Apply suggestions from code review Co-authored-by: Adrian Wälchli <[email protected]> * hparams * overwrite * transform * transform * transform * transform * cleaning * cleaning * tests * examples * examples * examples * Apply suggestions from code review Co-authored-by: Adrian Wälchli <[email protected]> * chp key * tests * Apply suggestions from code review * class * updated docs * updated docs * updated docs * updated docs * save * wip * fix * flake8 Co-authored-by: Jirka <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> Co-authored-by: Adrian Wälchli <[email protected]>
* update logger imports * pep8 fixes * pep8
* tests, fix logger bug and prepare data bug * add CHANGELOG.md Co-authored-by: Nicki Skafte <[email protected]>
* saves model every epoch * implement test for save_last * Update CHANGELOG.md * Update CHANGELOG.md * changes test description Co-authored-by: Jeremy Jordan <[email protected]> Co-authored-by: Jeremy Jordan <[email protected]>
* wip protected progress bar settings * remove callback attr from LRfinder * whitespace * changelog
* Fixes #490 `EarlyStopping` should check the metric of interest `on_validation_end` rather than `on_epoch_end`. In a normal scenario, this does not cause a problem, but in combination with `check_val_every_n_epoch>1` in the `Trainer` it results in a warning or in a `RuntimeError` depending on `strict`. * Highlighted that ES callback runs on val epochs in docstring * Updated EarlyStopping in rst doc * Update early_stopping.py * Update early_stopping.rst * Update early_stopping.rst * Update early_stopping.rst * Update early_stopping.rst * Apply suggestions from code review Co-authored-by: Adrian Wälchli <[email protected]> * Update docs/source/early_stopping.rst * fix doctest indentation warning * Train loop calls early_stop.on_validation_end * chlog Co-authored-by: William Falcon <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> Co-authored-by: Adrian Wälchli <[email protected]> Co-authored-by: Jirka <[email protected]>
* filter valid args * error on unknown manual args * added test * changelog * update docs and doctest * simplify * doctest * doctest * doctest * better test with mock check for init call * fstring * extend test * skip test on 3.6 not working Co-authored-by: William Falcon <[email protected]>
* updated docs * added mixed * added mixed
* fix default * formatting errors * update * flake8
* Removing unecessary early stopping calls * Update CHANGELOG.md Co-authored-by: Mateusz Pieniak <[email protected]> Co-authored-by: William Falcon <[email protected]>
* fixed undesired behaviour due to dict.fromkeys * a test for log length consistency * runtime-warn if no schedulers are configured * chlog * move Co-authored-by: Jirka <[email protected]>
* unify tests * Apply suggestions from code review Co-authored-by: Adrian Wälchli <[email protected]> Co-authored-by: Adrian Wälchli <[email protected]>
Hi, thanks for the PR. The test that you changed is for the LRFinder, it looks ok, but do you have an idea why the other test
tests/trainer/test_lr_finder.py (outdated)
    assert lrfinder._total_batch_idx == 100 * 2, \
        'Accumulation parameter did not work'
Suggested change:
    assert lrfinder._total_batch_idx == 100 * 2, \
        'Accumulation parameter did not work'
    assert trainer.total_batch_idx == 0
What do you think? Should we also add this assertion?
Does this essentially ensure that the trainer never starts to train the model?
Yes. Since accumulation happens in both the trainer and the lr_finder, I would include this assert just for clarity.
This test has a bug: it should be 200, please correct it :]
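For reference, a minimal sketch of the corrected expectation, assuming the 100 steps already used in the assert and accumulate_grad_batches=2 (values taken from the test, everything else illustrative):

    # Sketch only: with accumulation, each optimizer step consumes
    # accumulate_grad_batches batches, so the batch counter should end at
    # num_training * accumulate_grad_batches.
    num_training = 100            # steps taken by the LR finder in the test
    accumulate_grad_batches = 2   # accumulation factor used in the test
    expected_total_batch_idx = num_training * accumulate_grad_batches
    assert expected_total_batch_idx == 200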
Prior to this pull request, the accumulation schedule built from an integer only took effect at epoch 1, so epoch 0 trained with accumulate_grad_batches=1. After this pull request, the schedule starts at epoch 0, so the user-specified value applies from the first epoch.
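To make the intended behavior concrete, here is a minimal sketch (not the actual Lightning source) of how an integer value can be mapped to a schedule that already applies at epoch 0; the helper name build_accumulation_schedule is illustrative:

    def build_accumulation_schedule(accumulate_grad_batches):
        """Sketch: turn the user's setting into an {epoch: factor} schedule."""
        if isinstance(accumulate_grad_batches, dict):
            # the user already provided an explicit schedule
            return accumulate_grad_batches
        if isinstance(accumulate_grad_batches, int):
            # start at epoch 0; a schedule keyed at epoch 1 would leave
            # epoch 0 with the default factor of 1, which is the bug fixed here
            return {0: accumulate_grad_batches}
        raise TypeError("accumulate_grad_batches must be an int or a dict")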
@HHousen would you like to modify that test so that it fails on master and passes here? Otherwise the error could sneak back in the future without us noticing :) I can help if you need it.
Sure. I'll parametrize the test so an int can be tested as well as a dict.
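As a rough illustration of what such a parametrization could look like (this repeats the illustrative build_accumulation_schedule helper sketched above so the snippet is self-contained; it is not the repo's actual test):

    import pytest

    def build_accumulation_schedule(accumulate_grad_batches):
        # same illustrative helper as above
        if isinstance(accumulate_grad_batches, dict):
            return accumulate_grad_batches
        return {0: accumulate_grad_batches}

    @pytest.mark.parametrize(
        "accumulate_grad_batches, expected_epoch0_factor",
        [
            (2, 2),        # plain int: applies from epoch 0 after the fix
            ({0: 2}, 2),   # explicit schedule starting at epoch 0
            ({2: 2}, 1),   # schedule starting later: epoch 0 keeps the default of 1
        ],
    )
    def test_schedule_applies_at_epoch_0(accumulate_grad_batches, expected_epoch0_factor):
        schedule = build_accumulation_schedule(accumulate_grad_batches)
        assert schedule.get(0, 1) == expected_epoch0_factor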
In the
Hello @HHousen! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-07-05 00:05:47 UTC
Nevermind, it should be 4.
I updated the
tests/trainer/test_trainer.py (outdated)
    # for the test
    trainer.optimizer_step = _optimizer_step
ok, that explains a lot :) so the test was never really running the function?
Good that you saw this!
Yep.
@HHousen solid catch!!
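A minimal, self-contained sketch of the testing pattern discussed in this thread: replace the method with a recording stub before exercising the object, then assert on what was recorded. All names here (FakeTrainer, etc.) are illustrative stand-ins, not Lightning's API:

    class FakeTrainer:
        """Illustrative stand-in for a trainer that accumulates gradients."""

        def __init__(self, accumulate_grad_batches):
            self.accumulate_grad_batches = accumulate_grad_batches

        def optimizer_step(self, epoch, batch_idx):
            pass  # the real stepping logic would live here

        def fit(self, num_batches):
            for batch_idx in range(num_batches):
                # step only once enough batches have been accumulated
                if (batch_idx + 1) % self.accumulate_grad_batches == 0:
                    self.optimizer_step(epoch=0, batch_idx=batch_idx)

    recorded = []
    trainer = FakeTrainer(accumulate_grad_batches=2)
    # the stub must be assigned before fit(), otherwise it never runs
    trainer.optimizer_step = lambda epoch, batch_idx: recorded.append(batch_idx)
    trainer.fit(num_batches=8)
    assert recorded == [1, 3, 5, 7]  # an optimizer step on every second batch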
Sorry. I tried to remove the two dataset files that were downloaded when I ran the tests since I accidentally committed them. This did not work since force-pushing closed this pull request. I undid the force-push by reverting back to e0cb028, but I still can't reopen since GitHub says the branch was "force-pushed or recreated."
It looks like everything should have been reverted (the GitHub compare looks good). Any suggestions? @awaelchli
I suggest you simply submit a new PR. You can set the remote branch to a new name and push your commits there.
Can you reset your branch to master and commit the change again... Then we reopen this PR (it would be sad to lose all this valuable discussion and contribution).
When any change appears, simply reopen this PR.
I have to open a new pull request since GitHub will not let me reopen this one even if the branch is at the state right before the request was automatically closed. Moved to #2513. It is strange, though, since it's the exact same branch in both requests.
It was closed because the PR state was identical with the target branch... So if there is any difference it allows you to reopen the PR...
I rewrote history by accident, which closed the PR because there were no commits in common with master. I then reset back to the commit directly before the rewrite, so everything was in the same state as it was before the rewrite. It would still not let me reopen after this. There are currently 5 additional commits on this branch, which #2513 recognizes but this PR will not. GitHub should let me reopen a PR if the branch to be merged is in the same state that it was in directly before the closure. Since it would not let me do this, I added a commit to remove the accidentally committed files and opened #2513. Both PRs are for the same branch, which should not be possible to my understanding. The only reason it is possible is because this one was closed and is no longer tracking changes. Essentially, this PR would not recognize that the state of the branch was reverted and thus would not let me reopen. I did try resetting to master and redoing the commits, but that produced identical commits and resulted in the same state as before.
What does this PR do?
Fixes #2480. If the pl.Trainer option accumulate_grad_batches was an integer, then the first epoch (epoch 0) would use accumulate_grad_batches=1 while the remaining epochs would use accumulate_grad_batches=<user_value>. This was caused by #2289.

Before:
[image]

After:
[image]
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃