-
Notifications
You must be signed in to change notification settings - Fork 149
Improved detection of timeout based vs direct cancellation of cancel scopes #591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Unfortunately this patch doesn't fix anything on my system. Tested against anyio/fix-timeout-after-success and Trio/v0.22.2. Test run:
using this code:
|
The test program does not crash (or at least not reliably enough …) if I increase T to 0.01 on my test machine, which suggests that if you tried my previous version and didn't see the problem you need to decrease T somewhat; after doing that, you might have to adjust the randomization so that both values increase at roughly the same speed. If the first value stays at zero then T is too low. |
I was not able to 100% fix this on trio because it's unable to tell the difference between timeout based cancellation vs manual cancellation + deadline exceeded later, unlike the asyncio backend where the timer expiring event is explicitly recorded and checked against. Are you able to make the test program fail with asyncio though? If so, that then would indicate a problem. |
I wrote that too soon. After reading the code, it looks like your new test snippet indeed tests against the asyncio backend too. I'll see if I can sort this out before heading to EuroPython. |
With the changes I pushed, I can no longer repro the problem with your script. Can you confirm? |
With the current changes (commit 95d1f97) there is no longer a problem with the Trio backend. However, I found a different problem with asyncio. The inner test loop terminates way too early. To reproduce:
The problem: The Thus the code, changed to
ends up emitting |
Back to the drawing board I guess |
I have played around with the code without being able to repro the problem. Would you mind pasting your entire script here with which you repro'd the new issue, so I can be sure I'm doing it right? |
Sure. Python 3.11.2, standard Debian installation. The error shows up in a new venv, so it's unlikely to be related to any strange module I installed. Removing the import sys
import anyio
import asyncio
import trio
import random
TRIO=None
T = 0.00005
N = int(0.1/T)
async def eat(r):
a=0
b=0
while True:
msg = None
try:
with (trio if TRIO==2 else anyio).fail_after(T):
msg = await r.receive()
except (trio.TooSlowError,TimeoutError):
if msg is not None:
raise RuntimeError(f"Owch {a} {b}")
b += 1
else:
a += 1
if a%N == 0 or b%N == 0:
sys.stderr.write(f" {a} {b}\r"); sys.stderr.flush()
async def main():
s,r = (trio.open_memory_channel if TRIO else anyio.create_memory_object_stream)(10)
async with (trio.open_nursery if TRIO else anyio.create_task_group)() as tg:
tg.start_soon(eat,r)
for _ in range(10*N):
try:
await (trio if TRIO else asyncio if TRIO is None else anyio).sleep(T*(random.random() * 0.7 + 0.9))
except BaseException as exc:
raise RuntimeError("What?") from exc
await s.send(True)
tg.cancel_scope.cancel()
for TRIO in (None,):
#for TRIO in (False,None,True,2):
try:
if TRIO:
trio.run(main)
else:
anyio.run(main, backend="asyncio" if TRIO is None else "trio")
except Exception as exc:
if hasattr(exc,"exceptions"):
exc = exc.exceptions[0]
raise exc
else:
print(f"OK with TRIO={TRIO}")
$ PYTHONPATH=/src/anyio/src python3 /tmp/t.py
Traceback (most recent call last):
File "/tmp/t.py", line 34, in main
await (trio if TRIO else asyncio if TRIO is None else anyio).sleep(T*(random.random() * 0.7 + 0.9))
File "/usr/lib/python3.11/asyncio/tasks.py", line 639, in sleep
return await future
^^^^^^^^^^^^
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/t.py", line 51, in <module>
raise exc
File "/tmp/t.py", line 36, in main
raise RuntimeError("What?") from exc
RuntimeError: What? |
Thanks, I was now able to repro the problem locally. |
What seems to be happening is that the task is cancelled by a cancel scope, but the cancellation is only delivered to the task after a new scope has been entered. The new scope is exited with |
I checked and it seems that the stdlib |
I went to talk to Łukasz Langa here at the EP2023 sprints and agreed that I should report this as an issue. But given that this is a fundamental issue in asyncio, I don't think this should delay the 4.0 release any longer. Would you agree that this PR substantially improves the semantics even though it doesn't fully resolve all the issues? We can work on improving things in later releases. |
Ah. Frankly I already suspected that. Sure, go ahead with 4.0. This issue is not a problem for me, I'm using Trio. ;-) and even if it was it's not anyio's fault and even if you can work around it in anyio, given that it affects non-anyio code, doing so would not really help much. |
One issue I noticed on py3.11 is that when this happens, the cancel scope doesn't uncancel the host task since that is currently only triggered when a |
The tricky part is constructing a test to consistently repro this situation. |
@graingert and I both came to the conclusion that AnyIO's memory object streams are the fundamental problem here. Neither of us could reproduce the problem with asyncio queues. |
Are they? AFAICT the only async thing memory streams do is wait on (and set, and (probably most relevant here) cancel tasks waiting on) perfectly normal Thus the next step I'd consider would be to copy the relevant parts into your test program, maybe inline some of the code in question, and then boil that down to a somewhat-minimal reproducer. |
Memory object streams cheat a little bit as far as cancellation goes, see here. We suspected that this is the cause of your troubles. I'm working on an alternative solution. |
Owch. Yeah, I seem to have missed that one. I agree that you really shouldn't do that. |
That code was added to fix #146. |
After a lengthy discussion on the Gitter Trio room, I decided to back out of |
As I made some significant tweaks before merging, it might be prudent to test that |
Well, not I'm going to run my dev tests with it but I really don't expect any more surprises. This bug didn't show up in dev tests either – it raised its ugly head because my home's temperature monitors died every week with no good reason. |
Fixes #589.