Skip to content

Deadlock in AsyncConnectionPool #454

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
f0ff886f opened this issue Dec 2, 2021 · 5 comments
Closed

Deadlock in AsyncConnectionPool #454

f0ff886f opened this issue Dec 2, 2021 · 5 comments

Comments

@f0ff886f
Copy link

f0ff886f commented Dec 2, 2021

httpcore==0.14.3
httpx==0.21.1
anyio==3.4.0
python==3.9.6 (asyncio runtime, not trio)

Hello, first of all I would like to apologize for not providing a repro for this bug as it is something I have hit only once but it is a very nasty hang.

I have a tool that uses httpx and issues HTTP POSTs concurrently with something like this:

async def stop(self):
    tasks = []
    for c in self._clients:
        stop_task = asyncio.create_task(c.stop())
        tasks.append(stop_task)

    results = await asyncio.gather(*tasks)

Where c.stop() is:

async def stop(self):
    try:
        response = await self._client.post("/api/stop", timeout=60)
    except Exception as e:
        raise ClientError(
            f"Error while stopping the client: {self._url} - {e}"
        ) from e

And lastly, self._client is simply: self._client = httpx.AsyncClient(base_url=url, timeout=15)

Recently, this stop sequence just stalled forever, and the clients never received a stop POST.

Digging into the running process with pyrasite-ng and the following toy code:

import asyncio

for a in asyncio.tasks._all_tasks:
    print(f"*** CURRENT TASK: {a} ***")
    f = a._coro
    while hasattr(f, "cr_await") and f.cr_await:
        print(f.cr_frame)
        f = f.cr_await

(the actual command was: pyrasite --output localterm 32953 debug.py)

I could see something like this for the stop coroutines:

*** CURRENT TASK: <Task pending name='Task-254' coro=<Client.stop() running at /home/f0ff/tool/client.py:85> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f0cb5ba3820>()]> cb=[gather.<locals>._done_callback() at .../python-3.9.6/lib/python3.9/asyncio/tasks.py:764]> ***
<frame at 0x2041f00, file '/home/f0ff/tool/client.py', line 85, code stop>
<frame at 0x2516530, file '/home/f0ff/.venv/lib/python3.9/site-packages/httpx/_client.py', line 1827, code post>
<frame at 0x25167a0, file '/home/f0ff/.venv/lib/python3.9/site-packages/httpx/_client.py', line 1513, code request>
<frame at 0x1f0bb50, file '/home/f0ff/.venv/lib/python3.9/site-packages/httpx/_client.py', line 1600, code send>
<frame at 0x256e6c0, file '/home/f0ff/.venv/lib/python3.9/site-packages/httpx/_client.py', line 1628, code _send_handling_auth>
<frame at 0x1fb58d0, file '/home/f0ff/.venv/lib/python3.9/site-packages/httpx/_client.py', line 1665, code _send_handling_redirects>
<frame at 0x7f0cb5af4c40, file '/home/f0ff/.venv/lib/python3.9/site-packages/httpx/_client.py', line 1702, code _send_single_request>
<frame at 0x7f0cb5ba19a0, file '/home/f0ff/.venv/lib/python3.9/site-packages/httpx/_transports/default.py', line 291, code handle_async_request>
<frame at 0x253ee00, file '/home/f0ff/.venv/lib/python3.9/site-packages/httpcore/_async/connection_pool.py', line 222, code handle_async_request>
<frame at 0x7f0cb5ab01e0, file '/home/f0ff/.venv/lib/python3.9/site-packages/httpcore/_synchronization.py', line 15, code __aenter__>
<frame at 0x7f0cb5af4440, file '/home/f0ff/.venv/lib/python3.9/site-packages/anyio/_core/_synchronization.py', line 126, code acquire>
<frame at 0x7f0cb5ab0380, file '/home/f0ff/.venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py', line 1654, code wait>
<frame at 0x7f0cb5ba1d60, file '.../python-3.9.6/lib/python3.9/asyncio/locks.py', line 226, code wait>

There looks to be a deadlock somewhere in the connection pool and its use of httpcore's AsyncLock -> anyio's Lock... unfortunately I cannot repro this easily and I don't know if there is any other detail I could provide to help.

I will leave the hung context open till the weekend, but if I could share any more info please let me know.

@graffic
Copy link

graffic commented Dec 3, 2021

Hello there! Disclaimer: @f0ff886f and I work together.

TL;DR: It seems that the issue is in anyio.Lock. A fix could be to use asyncio.Lock instead.

This code snippet reproduces the deadlock.

import asyncio
import anyio

class Holder:
    def __init__(self):
        # self.lock = asyncio.Lock()
        self.lock = anyio.Lock()

    async def work(self, index):
        print(f"Trying to acquire {index}")
        async with self.lock:
            print(f"Doing index {index}")
            await asyncio.sleep(5)


async def main_anyio():
    holder = Holder()
    t1 = asyncio.create_task(holder.work(1), name="holder 1")
    t2 = asyncio.create_task(holder.work(2), name="holder 2")

    print("Tasks created")
    await asyncio.sleep(0)

    print("Cancelling t1")
    t1.cancel()

    print("Waiting for t2")
    await asyncio.gather(t2)


if __name__ == "__main__":
    asyncio.run(main_anyio())

The explanation is that the first task trying to acquire the anyio.Lock lock is cancelled when the lock is in the __aenter__ of the async context. When a context manager is cancelled in the enter part, the exit part is NOT executed.

So the lock remains acquired and every other task waiting for an unlock.

This does not happen if the lock is an asyncio.Lock(). The asyncio.Lock never yields during the acquire step. Even if its ___aenter__ is async, there is no async code there.

I will fill a bug in the anyio library.

@tomchristie
Copy link
Member

Okay, good work on this one folks, thanks.

We might be able to switch to using either trio.Lock or asyncio.Lock depending on which context we're running within.

@agronholm
Copy link
Contributor

There is a fix: agronholm/anyio#399
The only reason it has not been merged is because I have not been able to get anybody to review or test it.

@f0ff886f
Copy link
Author

Hi @agronholm thank you for the super fast turnaround! I'll poke @graffic to take a look at the PR as he seems to understand the deadlock well.

@tomchristie
Copy link
Member

tomchristie commented Jan 18, 2022

I'll close this off, since I assume it is resolved as of anyio 3.5.0 - https://pypi.org/project/anyio/#history

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants