Fix amp tests #661
Conversation
With opt_level="O1" (the default), AMP patches many torch functions, which breaks any tests that run afterwards. This patch introduces a pytest extension that lets tests be marked with @pytest.mark.spawn so that they run in their own process via torch.multiprocessing.spawn, keeping the main Python interpreter un-patched. Note that tests using DDP already run AMP in its own process, so they don't need this annotation.
Since AMP defaults to O1 now, DP tests no longer throw exceptions. Since AMP patches torch functions, CPU inference no longer works. Skip the prediction step for AMP tests.
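The extension itself isn't shown in this excerpt, so the following is only a minimal sketch of how such a hook could look in conftest.py, assuming a custom pytest_pyfunc_call implementation; the actual PR's hook details may differ.

```python
# conftest.py -- minimal sketch of the spawn marker described above; the hook
# structure is an assumption, since the PR's own implementation isn't shown here.
import pytest
import torch.multiprocessing as mp


def _run_in_child(rank, test_fn, kwargs):
    # Runs in the spawned child process, so any torch patching done by AMP
    # (opt_level "O1") stays confined to this process.
    test_fn(**kwargs)


@pytest.hookimpl(tryfirst=True)
def pytest_pyfunc_call(pyfuncitem):
    # Only intercept tests explicitly marked with @pytest.mark.spawn.
    if pyfuncitem.get_closest_marker("spawn") is None:
        return None  # let pytest call the test normally

    test_fn = pyfuncitem.obj
    # Gather the fixture values the test requests (pytest-internal attribute).
    kwargs = {name: pyfuncitem.funcargs[name]
              for name in pyfuncitem._fixtureinfo.argnames}

    # Run the test body in its own process; the parent interpreter stays un-patched.
    mp.spawn(_run_in_child, args=(test_fn, kwargs), nprocs=1)
    return True  # tell pytest the call has been handled
```

With the spawn start method the test function and its fixture values must be picklable, and the `spawn` marker would normally also be registered (for example in pytest_configure) to avoid unknown-marker warnings.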
Travis failure appears to be due to this unrelated issue: pytorch/vision#1718
Awesome stuff :)
Nice job 🚀
@@ -124,26 +128,6 @@ def test_amp_gpu_ddp_slurm_managed(tmpdir):
    assert trainer.resolve_root_node_address('abc[23-24]') == 'abc23'
    assert trainer.resolve_root_node_address('abc[23-24, 45-40, 40]') == 'abc23'

# test model loading with a map_location
maybe just comment it?
tests are failing on PIL, I found somewhere that adding Pillow>=4.2 solves the issue
Supposedly torchvision will do a bug fix release early next week (0.4.2?) that will resolve the Pillow issue. The fix is already in torchvision master. I’m inclined to just wait for that and update our minimum torchvision version.
On Jan 4, 2020, at 9:40 AM, Jirka Borovec <[email protected]> wrote:
tests are failing on PIL, I found somewhere that adding Pillow>=4.2 solves the issue
* python-pillow/Pillow#4130
* https://pillow.readthedocs.io/en/stable/releasenotes/7.0.0.html#pillow-version-constant
torchvision v0.5.0 has been released with the fix:
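If the thread's plan is followed, the dependency bump might look roughly like the sketch below; the packaging layout, package name, and other pins are assumptions and are not shown in this thread.

```python
# setup.py (sketch) -- illustrates raising the minimum torchvision version once the
# Pillow-compatible release is available; everything else here is a placeholder.
from setuptools import find_packages, setup

setup(
    name="example-project",      # placeholder, not the real package name
    packages=find_packages(),
    install_requires=[
        # torchvision 0.5.0 contains the fix for the Pillow breakage,
        # so requiring at least that version avoids pinning Pillow directly.
        "torchvision>=0.5.0",
    ],
)
```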
A few changes to fix AMP-related problems in our tests:

- With opt_level="O1" (the default), AMP patches many torch functions, which breaks any tests that run afterwards. This patch introduces a pytest extension that lets tests be marked with @pytest.mark.spawn so that they are run in their own process using torch.multiprocessing.spawn, keeping the main Python interpreter un-patched.
- Since AMP defaults to O1 now, DP tests no longer throw exceptions. Remove the pytest.raises that expects the exception.
- Since AMP patches torch functions, CPU inference no longer works. Skip the prediction step for AMP tests.

Note that there are still a couple of unrelated test failures that will be addressed in another PR.
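As a usage illustration only: the test below is hypothetical and not taken from this PR, and the model/Trainer construction is elided because the exact arguments aren't shown in this thread.

```python
# tests/test_amp_example.py -- hypothetical illustration of the conventions above;
# names are placeholders and the model/Trainer construction is project-specific.
import pytest
import torch


@pytest.mark.spawn  # run in a fresh process so AMP's torch patching stays isolated
@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires a CUDA GPU")
def test_amp_single_gpu(tmpdir):
    # Build the model and an AMP-enabled Trainer here (omitted: the exact
    # Trainer arguments for this version of the library aren't shown in the PR).
    ...
    # With opt_level "O1" as the default, the DP + AMP combination no longer
    # raises, so the old `with pytest.raises(...)` wrapper around fit() is removed.
    # The prediction/inference step is skipped, because AMP's patched torch
    # functions break CPU inference.
```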