Adding NVIDIA-SMI like information #2074
Comments
Hi! Thanks for your contribution, great first issue!
Adding to 1): you can use the batch size finder.
If your goal is just to optimize for batch size, then the batch finder may be what you are looking for.
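For reference, a minimal sketch of that batch size finder, assuming the 1.x-era PyTorch Lightning API (auto_scale_batch_size plus trainer.tune; newer releases moved this into a separate Tuner class). The BoringModel below is a made-up placeholder, not code from this issue:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class BoringModel(pl.LightningModule):
    """Tiny module whose `batch_size` attribute the finder can scale."""

    def __init__(self, batch_size=2):
        super().__init__()
        self.batch_size = batch_size
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        (x,) = batch
        return self.layer(x).mean()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def train_dataloader(self):
        data = TensorDataset(torch.randn(512, 32))
        return DataLoader(data, batch_size=self.batch_size)


# "power" doubles the batch size until an OOM is hit, then keeps the largest
# size that still fit.
trainer = pl.Trainer(auto_scale_batch_size="power", max_epochs=1)
model = BoringModel()
trainer.tune(model)       # runs the batch size finder before training
print(model.batch_size)   # the scaled batch size
```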
Hi @SkafteNicki @Borda,
@groadabike mind sending a PR with a PL callback?
Hi @Borda, I tried to use the gpumonitor callback, but it didn't work on my HPC.
@groadabike I think that looks like a great addition. If you want to submit a PR, feel free :)
Closing this as it was solved by PR #2932.
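For later readers, a minimal usage sketch of the GPU stats callback that landed in Lightning around that time (names as in the 1.x API, where it was called GPUStatsMonitor; later versions replaced it with DeviceStatsMonitor). It polls nvidia-smi, so the binary must be available on the node:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import GPUStatsMonitor

# GPUStatsMonitor queries nvidia-smi and sends the readings to the attached logger.
gpu_stats = GPUStatsMonitor(
    memory_utilization=True,  # memory.used / memory.free
    gpu_utilization=True,     # utilization.gpu
    temperature=True,
)

trainer = pl.Trainer(gpus=1, callbacks=[gpu_stats])
# trainer.fit(model)  # `model` is any LightningModule
```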
🚀 Feature
Motivation
Most research is done on HPC systems. Therefore, if I want to see the GPU RAM and utilization of my job, I have to open a second terminal to run "watch nvidia-smi" or "nvidia-smi dmon".
Having this information saved in the logs would help to:
Pitch
When training starts, report the GPU RAM and GPU utilization together with the loss and v_num.
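One way to approximate this pitch with a custom callback (a sketch only, not the implementation that was eventually merged): read the torch.cuda memory counters and log them with prog_bar=True so they show up next to the loss and v_num. Note this only covers memory allocated by PyTorch; utilization figures would still need nvidia-smi or NVML. The GPUMemoryLogger name is made up for illustration:

```python
import torch
import pytorch_lightning as pl


class GPUMemoryLogger(pl.Callback):
    """Log per-device GPU memory next to the loss in the progress bar."""

    def on_train_batch_end(self, trainer, pl_module, *args, **kwargs):
        if not torch.cuda.is_available():
            return
        device = pl_module.device
        allocated_mb = torch.cuda.memory_allocated(device) / 1024 ** 2
        reserved_mb = torch.cuda.memory_reserved(device) / 1024 ** 2
        # prog_bar=True puts the values alongside loss and v_num.
        pl_module.log("gpu_mem_alloc_mb", allocated_mb, prog_bar=True)
        pl_module.log("gpu_mem_reserved_mb", reserved_mb, prog_bar=True)


# trainer = pl.Trainer(gpus=1, callbacks=[GPUMemoryLogger()])
```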
Alternatives
After the first epoch is loaded onto the GPU, log the GPU RAM and GPU utilization.
Additional context