-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reliable flush/close on streams, and avoid zombies #17522
Conversation
LIBUV_BRANCH=julia-uv1.9.0 | ||
LIBUV_SHA1=ecbd6eddfac4940ab8db57c73166a7378563ebd3 | ||
LIBUV_BRANCH=jn/pid-zero | ||
LIBUV_SHA1=13c00a4268a16d5a12f3cd15b3a40c683e575dfc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before merging this should go on the branch, and checksums should be updated
7e9a7dc
to
1be9e76
Compare
LIBUV_BRANCH=julia-uv1.9.0 | ||
LIBUV_SHA1=ecbd6eddfac4940ab8db57c73166a7378563ebd3 | ||
LIBUV_BRANCH=jn/pid-zero | ||
LIBUV_SHA1=cb6d0f875a5b8ca30cba45c0c1ef7442c87c1e68 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before merging this should go on the branch, and checksums should be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could also mark this as wip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not WIP, it's simply waiting for review across two repos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
work remains to be done, sounds "in progress" to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's not what the WIP tag is intended to convey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's exactly what the WIP tag is intended to convey
The convention is to prefix the pull request title with "WIP:" for Work In Progress, or "RFC:" for Request for Comments when work is completed and ready for merging. This will prevent accidental merging of work that is in progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the work is done and ready for merging. this just requires slightly more manual work than the usual Github button. however, we don't consider the merging "work" in the WIP tag.
(aside: I find it strange when there is both a reply and reaction on a comment. aren't they conveying redundant information?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, this PR is not ready for merging. in what sense is "requires slightly more manual work" not precisely a form of "work" that should not be accidentally merged in its current state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the same way that #17579 was not WIP, even though it was not merged in its PR state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that was a squash and the diff as pushed to master was identical to the pr state
bump @Keno (or maybe @StefanKarpinski?). CI failure were unrelated |
ccall(:jl_close_uv, Void, (Ptr{Void},), stream.handle) | ||
stream.status = StatusClosing | ||
end | ||
if uv_handle_data(stream) != C_NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What situation is this checking for and why was it not needed previously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is because of the changed order in uvfinalize
So the problem is that libuv does not reap zombies if we close the handle before the process has exited? |
right. if we close the libuv handle, then libuv has no way to track when the process dies and reap the zombie. so we need to instead ensure the libuv handle is valid for as long as its pid is non-zero. |
Not that our relationship with libuv upstream is all that great right now, but does this fix a bug that they might care about? |
Probably not, nodejs don't expose a |
1be9e76
to
4ba3790
Compare
@@ -1,2 +1,2 @@ | |||
LIBUV_BRANCH=julia-uv1.9.0 | |||
LIBUV_SHA1=ecbd6eddfac4940ab8db57c73166a7378563ebd3 | |||
LIBUV_SHA1=cb6d0f875a5b8ca30cba45c0c1ef7442c87c1e68 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs checksums updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and delete the old
389e2eb
to
8822ccb
Compare
while investigating #14434, I noticed that the test code there is creating zombies to fix that, we need to avoid calling uv_close in the finalizer until after the process has exited (or during atexit)
8822ccb
to
67efdcf
Compare
This is causing spawn test failures https://ci.appveyor.com/project/JuliaLang/julia/build/1.0.5013/job/ey1xj2tloe8x47lt. Can reproduce locally. |
reverting on release-0.5 for rc1 |
That test is stronger than essential, so it may need another |
I'll try that. |
If you meant diff --git a/test/spawn.jl b/test/spawn.jl
index 0040630..0cc4184 100644
--- a/test/spawn.jl
+++ b/test/spawn.jl
@@ -287,7 +287,11 @@ let out = Pipe(), echo = `$exename --startup-file=no -e 'print(STDOUT, " 1\t", r
@test_throws ArgumentError write(out, "now closed error")
@test isreadable(out)
@test !iswritable(out)
- is_windows() && Base.process_events(false) # should be enough steps to fully propagate EOF now
+ if is_windows()
+ # should be enough steps to fully propagate EOF now
+ Base.process_events(false)
+ Base.process_events(false)
+ end
@test !isopen(out)
end
wait(ready) # wait for writer task to be ready before using `out` then no that didn't help |
Maybe |
This caused ZMQ.jl's tests to freeze on Windows. I'm even more glad I reverted this, and won't be backporting it until that gets addressed. |
Is this supposed be some sort of guessing game? Usually bug reports should contain a little more context and a little less snark But if I have to guess, I would point to the bug in the following test: |
Have no context yet, have identified the commit that introduced the problem but not yet identified which line was freezing. Sorry I'm extremely frustrated that we're trying to put out a release and last-minute bug fixes have been causing as many issues as they fix. |
…flush-close"" This reverts commit 58a14f3.
this makes it so that our
close()
function waits for the shutdown to finish before returning (#14434)@Keno for review of libuv changes