[API server] handle logs request in coroutine #5366

Open

aylei wants to merge 9 commits into master
Conversation

@aylei aylei commented Apr 25, 2025

Closes #4767

This PR includes the minimal changes that move /logs handling to a coroutine:

  1. introduce a coroutine context, which handles cancellation, log redirection and env var overrides;
  2. run /logs in uvicorn's event loop;

Though the task is now executed directly in the uvicorn process, we still maintain a request record for log requests to keep the behavior consistent: the user can still cancel a log request with sky api cancel and retrieve the logs again with sky api logs.

Follow-ups:

Benchmark

  1. Command: python tests/load_tests/test_load_on_server.py -n 100 --apis tail_logs -c kubernetes, under low server concurrency on a 1c2g machine (1 long worker + 2 short workers):
```
# This PR
All requests completed in 16.20 seconds

----------------------------------------------------------------------------------------------------
Kind                 Count    Total(s)   Avg(s)     Min(s)     Max(s)     P95(s)     P99(s)
----------------------------------------------------------------------------------------------------
API /tail_logs       100      1642.00    16.42      16.20      18.26      17.25      18.26

# Master
All requests completed in 229.30 seconds

Latency Statistics:
----------------------------------------------------------------------------------------------------
Kind                 Count    Total(s)   Avg(s)     Min(s)     Max(s)     P95(s)     P99(s)
----------------------------------------------------------------------------------------------------
API /tail_logs       100      11675.38   116.75     3.31       229.30     218.19     227.03
```

There is a 7x improvement on average. The bottleneck in this PR is that each log task runs in a dedicated thread and there is only 1 uvicorn worker process; GIL contention keeps the 100 log threads from being fully concurrent.

  2. Command: python tests/load_tests/test_load_on_server.py -n 100 --apis tail_logs -c aws, under unlimited concurrency in local mode (burstable workers) on a 4c16g machine:
```
# This PR
All requests completed in 56.22 seconds

Latency Statistics:
----------------------------------------------------------------------------------------------------
Kind                 Count    Total(s)   Avg(s)     Min(s)     Max(s)     P95(s)     P99(s)
----------------------------------------------------------------------------------------------------
API /tail_logs       100      5367.31    53.67      53.45      56.06      53.76      56.04

# Master
All requests completed in 90.30 seconds

Latency Statistics:
----------------------------------------------------------------------------------------------------
Kind                 Count    Total(s)   Avg(s)     Min(s)     Max(s)     P95(s)     P99(s)
----------------------------------------------------------------------------------------------------
API /tail_logs       100      7838.55    78.39      43.78      90.20      90.00      90.10
```

Resources:

```
# This PR
PEAK USAGE:
Peak CPU: 100.0%
Peak Memory: 1.50GB (11.8%)
Memory Delta: 0.6GB
Peak Short Executor Memory: 0.16GB
Peak Short Executor Memory Average: 0.16GB
Peak Long Executor Memory: 0.00GB
Peak Long Executor Memory Average: 0.00GB

# Master
PEAK USAGE:
Peak CPU: 100.0%
Peak Memory: 7.92GB (53.7%)
Memory Delta: 6.7GB
Peak Short Executor Memory: 0.20GB
Peak Short Executor Memory Average: 0.18GB
Peak Long Executor Memory: 0.19GB
Peak Long Executor Memory Average: 0.19GB
```

About 10x memory efficiency. However, the test found that tailing logs on an AWS instance is significantly slower than on a Kubernetes instance (I switched the benchmark env to AWS EC2 for accurate resource usage accounting). This might be related to the extra RPCs/CPU cycles in the AWS code path; I leave this as a follow-up since it is not actually relevant to this PR.

Tests

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@aylei aylei changed the title [API server] handle logs request in event loop [API server] handle logs request in coroutine Apr 27, 2025
@aylei aylei marked this pull request as ready for review April 27, 2025 12:50
@aylei
Collaborator Author

aylei commented Apr 27, 2025

/smoke-test -k test_minimal

@aylei aylei requested a review from Michaelvll April 27, 2025 13:06
@aylei
Collaborator Author

aylei commented Apr 28, 2025

/smoke-test

@Michaelvll Michaelvll requested a review from cg505 April 28, 2025 18:57
Collaborator

@cg505 cg505 left a comment

Wow, awesome work @aylei! Left a handful of comments, mostly questions and requests for clarifying comments.

```
break


def cancellation_guard(func):
```
Collaborator

Can we make sure this still works with the type checker? That is, it doesn't erase the types of functions it is applied to. We have this issue with some decorators.

Collaborator Author

@aylei aylei Apr 29, 2025

Yeah, I will take a look.


```
background_tasks.add_task(cancel_task)
```
Collaborator

This is pretty confusing - wouldn't this immediately cancel the task? Is this deferred for some reason? Can we leave a comment?

Collaborator

From reading the BackgroundTasks doc, I see that "background tasks" don't actually start until after the request is finished. (Contrary to how it sounds, that they would immediately begin in the background.) Still, could we clarify in a comment?

Collaborator Author

Sure, good catch!

Comment on lines 34 to 38
```
ctx = context.get()
if ctx is not None:
    v = ctx.getenv(self.env_var, str(self.default))
else:
    v = os.getenv(self.env_var, str(self.default))
```
Collaborator

Maybe we can make this entire block into a common helper function.
Also, can we check other uses of getenv? I'm a bit concerned especially about the config env vars.

Comment on lines +48 to +54
```
futs.append(
    pool.apply_async(pipe,
                     (proc.stdout, ctx.output_stream(sys.stdout))))
if proc.stderr is not None:
    futs.append(
        pool.apply_async(pipe,
                         (proc.stderr, ctx.output_stream(sys.stderr))))
```
Collaborator

Shouldn't we update process_subprocess_stream? Seems like both that code and this code are reading from proc.stdout / proc.stderr, which I'd expect to cause issues. No?

Collaborator Author

Good catch! It works coincidentally: the code path of sky logs always sets process_stream=False, so process_subprocess_stream and pipe_and_wait_process are never called at the same time in the current code base. I did not update process_subprocess_stream because it can only run in the main thread now:

```
# Do not launch a thread for stdout as the rich.status does not
```

I think we can assert process_stream=False when ctx is not None for now, and refine process_subprocess_stream when we broaden the usage of the async context.

```
@@ -390,6 +393,114 @@ def _request_execution_wrapper(request_id: str,
    logger.info(f'Request {request_id} finished')


async def execute_request(request: api_requests.Request):
```
Collaborator

Can we rename to clarify that this uses a coroutine rather than the typical request submission?

Suggested change:

```diff
-async def execute_request(request: api_requests.Request):
+async def execute_request_coroutine(request: api_requests.Request):
```

Collaborator Author

The async keyword implies the same semantics as "coroutine" here, and there would be a lint failure if execute_request were not called with await or asyncio primitives. For requests executed in the process executor, we use schedule_request. So maybe it is okay to keep the current name for brevity, wdyt?

Collaborator

I still would like some distinction. If not in the name, in a comment, but I think the name is better.

My concern is not that this function must be run in a coroutine. As you say, that's clear from the async keyword. It's more about clarifying the type of request (a "coroutine" request vs a normal executor-based request).

The main concern: If I am new to skypilot and look at this function, I might expect that all requests are going to be executed using this function. In fact, only coroutine requests use this. For other requests, there is a totally different code path that the executor uses to execute the request. I may not realize that the executor does not run requests using coroutines.

```
    request.log_path.touch()
    return request


def schedule_request(
```
Collaborator

TODO: refactor schedule_request to take a Request, for consistency with the prepare_request / execute_request flow. That way we can remove all the args and move them/the docstring to prepare_request

Collaborator Author

Agreed! I considered the same approach but postponed it to keep this PR focused. Follow-up: #5434

Development

Successfully merging this pull request may close these issues.

[API Server] optimize the cost of sky logs on server