Fix amp tests #661
Conversation
With opt_level="O1" (the default), AMP patches many torch functions, which breaks any tests that run afterwards. This patch introduces a pytest extension that lets tests be marked with @pytest.mark.spawn so that they run in their own process via torch.multiprocessing.spawn, keeping the main Python interpreter un-patched. Note that tests using DDP already run AMP in its own process, so they don't need this annotation.
Since AMP defaults to O1 now, DP tests no longer throw exceptions. Since AMP patches torch functions, CPU inference no longer works. Skip the prediction step for AMP tests.
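The extension itself isn't shown in this excerpt, so the following is only a minimal sketch of how such a hook could look in conftest.py, assuming a custom pytest_pyfunc_call implementation; the actual PR's hook details may differ.

```python
# conftest.py -- minimal sketch of the spawn marker described above; the hook
# structure is an assumption, since the PR's own implementation isn't shown here.
import pytest
import torch.multiprocessing as mp


def _run_in_child(rank, test_fn, kwargs):
    # Runs in the spawned child process, so any torch patching done by AMP
    # (opt_level "O1") stays confined to this process.
    test_fn(**kwargs)


@pytest.hookimpl(tryfirst=True)
def pytest_pyfunc_call(pyfuncitem):
    # Only intercept tests explicitly marked with @pytest.mark.spawn.
    if pyfuncitem.get_closest_marker("spawn") is None:
        return None  # let pytest call the test normally

    test_fn = pyfuncitem.obj
    # Gather the fixture values the test requests (pytest-internal attribute).
    kwargs = {name: pyfuncitem.funcargs[name]
              for name in pyfuncitem._fixtureinfo.argnames}

    # Run the test body in its own process; the parent interpreter stays un-patched.
    mp.spawn(_run_in_child, args=(test_fn, kwargs), nprocs=1)
    return True  # tell pytest the call has been handled
```

With the spawn start method the test function and its fixture values must be picklable, and the `spawn` marker would normally also be registered (for example in pytest_configure) to avoid unknown-marker warnings.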
Travis failure appears to be due to this unrelated issue: pytorch/vision#1718
Awesome stuff :)
Nice job 🚀
@@ -124,26 +128,6 @@ def test_amp_gpu_ddp_slurm_managed(tmpdir):
    assert trainer.resolve_root_node_address('abc[23-24]') == 'abc23'
    assert trainer.resolve_root_node_address('abc[23-24, 45-40, 40]') == 'abc23'

# test model loading with a map_location
maybe just comment it?
tests are failing on PIL, I found somewhere that adding Pillow>=4.2 solves the issue
Supposedly torchvision will do a bug fix release early next week (0.4.2?) that will resolve the Pillow issue. The fix is already in torchvision master. I’m inclined to just wait for that and update our minimum torchvision version.
On Jan 4, 2020, at 9:40 AM, Jirka Borovec <[email protected]> wrote:
tests are failing on PIL, I found somewhere that adding Pillow>=4.2 solves the issue
* python-pillow/Pillow#4130
* https://pillow.readthedocs.io/en/stable/releasenotes/7.0.0.html#pillow-version-constant
torchvision v0.5.0 has been released with the fix:
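If the thread's plan is followed, the dependency bump might look roughly like the sketch below; the packaging layout, package name, and other pins are assumptions and are not shown in this thread.

```python
# setup.py (sketch) -- illustrates raising the minimum torchvision version once the
# Pillow-compatible release is available; everything else here is a placeholder.
from setuptools import find_packages, setup

setup(
    name="example-project",      # placeholder, not the real package name
    packages=find_packages(),
    install_requires=[
        # torchvision 0.5.0 contains the fix for the Pillow breakage,
        # so requiring at least that version avoids pinning Pillow directly.
        "torchvision>=0.5.0",
    ],
)
```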
A few changes to fix AMP-related problems in our tests:

- With opt_level="O1" (the default), AMP patches many torch functions, which breaks any tests that run afterwards. This patch introduces a pytest extension that lets tests be marked with @pytest.mark.spawn so that they are run in their own process using torch.multiprocessing.spawn, keeping the main Python interpreter un-patched.
- Since AMP defaults to O1 now, DP tests no longer throw exceptions. Remove the pytest.raises that expects the exception.
- Since AMP patches torch functions, CPU inference no longer works. Skip the prediction step for AMP tests.

Note that there are still a couple of unrelated test failures that will be addressed in another PR.
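As a usage illustration only: the test below is hypothetical and not taken from this PR, and the model/Trainer construction is elided because the exact arguments aren't shown in this thread.

```python
# tests/test_amp_example.py -- hypothetical illustration of the conventions above;
# names are placeholders and the model/Trainer construction is project-specific.
import pytest
import torch


@pytest.mark.spawn  # run in a fresh process so AMP's torch patching stays isolated
@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires a CUDA GPU")
def test_amp_single_gpu(tmpdir):
    # Build the model and an AMP-enabled Trainer here (omitted: the exact
    # Trainer arguments for this version of the library aren't shown in the PR).
    ...
    # With opt_level "O1" as the default, the DP + AMP combination no longer
    # raises, so the old `with pytest.raises(...)` wrapper around fit() is removed.
    # The prediction/inference step is skipped, because AMP's patched torch
    # functions break CPU inference.
```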