-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor GPUStatsMonitor to improve training speed #3257
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3257 +/- ##
=======================================
+ Coverage 86% 89% +3%
=======================================
Files 91 92 +1
Lines 8152 8311 +159
=======================================
+ Hits 6989 7396 +407
+ Misses 1163 915 -248 |
Questions:
|
Yes
Preferably |
@rohitgr7 have we also renamed the files and class in #3008 so we need to maintain back-compatibility if it was already released in 0.9... check #3251 (comment) |
Hello @rohitgr7! Thanks for updating this PR.
Comment last updated at 2020-09-03 23:19:42 UTC |
@rohitgr7 is it still wip? |
This pull request is now in conflict... :( |
952a39f
to
83b5b16
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main speedup comes from less logging calls, is that correct?
yep and also from single |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good that the deprecation/backward compatibility is included now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -69,7 +83,7 @@ def __init__( | |||
): | |||
super().__init__() | |||
|
|||
if shutil.which("nvidia-smi") is None: | |||
if shutil.which('nvidia-smi') is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will not work on Windows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you consider to use pynvml instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can get it to work on my windows machine. Just had to add nvidia-smi to PATH variable, and restart. On my computer it is stored in: C:\Program Files\NVIDIA Corporation\NVSMI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, but this is something that user should do manually
maybe in that case, also add smth like so?
if any([shutil.which('C:/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi') is None,
shutil.which('nvidia-smi') is None])
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work in most cases, as long as the user has install in default location.
Maybe it should just be clarified in the docs, that the user needs to be able to call nvidia-smi
from terminal and hint what windows users should do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or we can add it manually into PATH (os.environ["PATH"]), but i don't know is if a valid behavior
What does this PR do?
Calls
nvidia-smi
once with all the required stats to monitor.Follow up of #3008.
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃