booktest: add hangcheck timer to print current file+line, and later backtrace #4504
Draft: benlorenz wants to merge 12 commits into master from bl/bookhangcheck
+92 −239
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4504      +/-   ##
==========================================
- Coverage   84.55%   84.55%   -0.01%
==========================================
  Files         672      672
  Lines       88831    88833       +2
==========================================
  Hits        75109    75109
- Misses      13722    13724       +2
Needs further work, since the hangcheck doesn't trigger in the code where it was supposed to help. To be continued...
…acktrace also add @debug to print input and output
To help with debugging #4493, cc @fingolfin:
When the same example file has been running for 10 minutes, the hangcheck tries to print the current file and line, and then repeats this every 5 minutes.
Once more than 20 minutes have passed (i.e. at the 25-minute check), it tries to send a USR1/INFO signal to the process (after a short delay) to get a backtrace of the currently running computation.
Edit: added a commit that reduces the timeouts for testing; it will be removed later.
All this only works if the running code has some yield points that allow the hangcheck task to run. If that is not enough, we need to move this to a separate thread or process.
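The schedule and signal logic described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the PR's Julia implementation. It runs the watchdog on a separate OS thread, which is the fallback mentioned above, rather than as a cooperative task. `HangCheck`, `report_schedule` and the `get_location` callback are hypothetical names, and the short delay before the signal is not modelled. On POSIX systems, Python's `faulthandler` can stand in for Julia's USR1/INFO backtrace.

```python
import os
import signal
import threading
import time


def report_schedule(first, interval, backtrace_after, limit):
    """List the (elapsed, action) checkpoints a hangcheck would hit up to `limit`.

    A location report is due at `first`, then every `interval`; the first
    checkpoint strictly later than `backtrace_after` also requests a backtrace.
    """
    events, t, sent = [], first, False
    while t <= limit:
        if t > backtrace_after and not sent:
            events.append((t, "report+backtrace"))
            sent = True
        else:
            events.append((t, "report"))
        t += interval
    return events


class HangCheck:
    """Hypothetical watchdog that logs where a long-running job currently is.

    It runs on its own OS thread, so it does not depend on the job reaching
    cooperative yield points (in CPython it still needs the GIL to be released
    periodically, which pure-Python code does).
    """

    def __init__(self, get_location, first, interval, backtrace_after,
                 sig=signal.SIGUSR1, log=print):
        self.get_location = get_location  # hypothetical: returns e.g. "file:line"
        self.first = first
        self.interval = interval
        self.backtrace_after = backtrace_after
        self.sig = sig  # POSIX only: SIGUSR1 does not exist on Windows
        self.log = log
        self.signal_sent = False
        self._done = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._t0 = time.monotonic()
        self._thread.start()

    def stop(self):
        self._done.set()
        self._thread.join()

    def _run(self):
        next_check = self.first
        while True:
            remaining = next_check - (time.monotonic() - self._t0)
            # wait() returns True as soon as stop() sets the event.
            if self._done.wait(max(0.0, remaining)):
                return
            elapsed = time.monotonic() - self._t0
            self.log(f"hangcheck: {elapsed:.1f}s, current location {self.get_location()}")
            if elapsed > self.backtrace_after and not self.signal_sent:
                self.signal_sent = True
                # Ask the process for a backtrace; a handler for `sig` must be installed.
                os.kill(os.getpid(), self.sig)
            next_check += self.interval
```

With the thresholds from the description (in seconds), `report_schedule(600, 300, 1200, 1500)` gives reports at 600, 900 and 1200 and a report plus backtrace at 1500, i.e. 25 minutes. Before calling `start()`, install a handler such as `faulthandler.register(signal.SIGUSR1)`, which dumps the Python stacks of all threads to stderr when the signal arrives. Without a handler, SIGUSR1 terminates the process by default.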
With the current timings, the hangcheck will probably trigger once on a successful run:
Edit2:
In the test run (with shorter limits), a backtrace is printed here:
https://github.com/oscar-system/Oscar.jl/actions/runs/12986964827/job/36214891463?pr=4504#step:11:6489
and a hangcheck warning here:
https://github.com/oscar-system/Oscar.jl/actions/runs/12986964827/job/36214891463?pr=4504#step:11:752
I am slightly confused about the timing of the backtrace: I think it should have been printed directly after the +3 min hangcheck and not shortly before the +4 min one, but I don't think this is important.