[Serve] Add endpoint caching for status --endpoint #5052

Open · wants to merge 13 commits into `master`

Conversation

@kyuds kyuds commented Mar 28, 2025

Fixes #2915

This PR adds caching for the output of `sky serve status --endpoint`. When `sky serve up` or `sky serve status` is invoked, the returned service endpoint is cached in the `endpoint_cache` table of the database at `~/.sky/serve/service.db`.
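For illustration, the cache described above could be backed by a small sqlite3 table like the one below. The table name `endpoint_cache` comes from the description; the column layout and helper names are assumptions for the sketch, not the PR's actual schema.

```python
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one cached endpoint per service name.
    conn.execute("""CREATE TABLE IF NOT EXISTS endpoint_cache (
        name TEXT PRIMARY KEY,
        endpoint TEXT NOT NULL)""")


def set_endpoint(conn: sqlite3.Connection, name: str, endpoint: str) -> None:
    # Upsert, so repeated `sky serve up` / `status` calls refresh the entry.
    conn.execute(
        'INSERT OR REPLACE INTO endpoint_cache (name, endpoint) VALUES (?, ?)',
        (name, endpoint))


def get_endpoint(conn: sqlite3.Connection, name: str):
    row = conn.execute('SELECT endpoint FROM endpoint_cache WHERE name = ?',
                       (name,)).fetchone()
    return row[0] if row is not None else None


def remove_endpoint(conn: sqlite3.Connection, name: str) -> None:
    # Called when the service is no longer READY or `sky serve down` runs.
    conn.execute('DELETE FROM endpoint_cache WHERE name = ?', (name,))
```

An in-memory database is enough to exercise the insert/lookup/invalidate cycle.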

The endpoint is cached under two conditions:

  1. The return value of `sky serve up` is cached.
  2. The return value of `sky serve status` is cached iff the `ServiceStatus` enum is `READY`.

The endpoint is removed from the cache under two conditions:

  1. The `ServiceStatus` returned by `sky serve status` is not `READY`.
  2. `sky serve down` is invoked.

There is also a new flag, `--force-refresh`, which only takes effect when `--endpoint` is also passed; it bypasses the cache and queries the serve controller directly.
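Taken together, the caching, invalidation, and `--force-refresh` rules above can be sketched roughly as follows. This is a hypothetical stand-in, not the PR's actual code: `resolve_endpoint`, `query_controller`, and the local `ServiceStatus` enum are illustrative names, and a plain dict stands in for the sqlite-backed cache.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class ServiceStatus(Enum):  # stand-in for serve_state.ServiceStatus
    READY = 'READY'
    NOT_READY = 'NOT_READY'


def resolve_endpoint(
    name: str,
    cache: Dict[str, str],
    query_controller: Callable[[str], Tuple[ServiceStatus, Optional[str]]],
    force_refresh: bool = False,
) -> Optional[str]:
    """Return the service endpoint, preferring the cache.

    `query_controller` stands in for the round trip to the serve
    controller and returns (status, endpoint).
    """
    if not force_refresh and name in cache:
        return cache[name]  # cache hit: skip the controller round trip
    status, endpoint = query_controller(name)
    if status is ServiceStatus.READY:
        cache[name] = endpoint  # READY: (re)populate the cache
    else:
        cache.pop(name, None)  # not READY: invalidate any stale entry
    return endpoint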

One thing I wanted to point out: I had to add a helper function, `_make_dummy`, at line 649 of `serve/server/core.py`, which creates dummy data matching the return schema of a status request. Without it, the code appears to spin in an infinite loop. I couldn't easily find where the schema is enforced, and I wasn't sure whether removing the schema enforcement would be appropriate at all. If the dummy data in `_make_dummy` doesn't look right, I'm more than happy to modify it!

The `_make_dummy` function in question (**fixed; please see the resolved comment**):

```python
def _make_dummy(cached_record: Dict[str, str]) -> Dict[str, Any]:
    """Dummy data for service status.

    sky serve status --endpoint deadlocks if we do not return
    the complete schema.
    """
    service_name = cached_record['name']
    endpoint = cached_record['endpoint']
    return {
        'name': service_name,
        'active_versions': [],
        'controller_job_id': -1,
        'uptime': -1,
        'status': serve_state.ServiceStatus.READY,
        'controller_port': None,
        'load_balancer_port': None,
        'endpoint': endpoint,
        'policy': None,
        'requested_resources_str': '',
        'load_balancing_policy': '',
        'tls_encrypted': False,
        'replica_info': []
    }
```

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below):
      sky serve status --endpoint
      sky serve status --endpoint --force-refresh
      sky serve status <service_name> --endpoint
      sky serve status <service_name> --endpoint --force-refresh
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@cg505 cg505 requested a review from cblmemo March 28, 2025 02:09

kyuds commented Apr 7, 2025

@cblmemo just wondering if you had the opportunity to take a look?


@cblmemo cblmemo left a comment


Thanks for adding this @kyuds ! This is exciting. Left some discussions.

Also, can we include some benchmarks on the performance before and after this PR?

@kyuds kyuds requested a review from cblmemo April 21, 2025 22:47

kyuds commented Apr 25, 2025

Ran benchmarks @cblmemo.

(benchmark results attached as image: serve_endpoint_cache_benchmarks)

Pretty good, I would say! Some comments:

  1. The actual fetch from the sqlite3 table takes negligible time. We still see about 1 second of latency because the cache table is stored on the sky API server, not on the local client, as we discussed in the issue. Currently, the execution loop queries the API server over an HTTP request, which explains that latency. Still, much better than 7 seconds.
  2. The first query takes longer (about 0.5 seconds more); I think this is due to the db being loaded. Since we are implementing the cache because we expect users to query very frequently, I don't think this will be a significant issue.
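Before/after numbers like the ones above can be gathered with a small timing harness along these lines. `time_calls` is a hypothetical helper, and the `time.sleep` workloads are placeholders; in practice they would be replaced by actually shelling out to `sky serve status <name> --endpoint` with and without `--force-refresh`.

```python
import time
from statistics import mean


def time_calls(fn, n: int = 5) -> float:
    """Return mean wall-clock seconds over n invocations of fn."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return mean(samples)


# Placeholder workloads standing in for the cached vs. uncached CLI calls.
cached = time_calls(lambda: time.sleep(0.001))
uncached = time_calls(lambda: time.sleep(0.01))
```

Running each variant several times and averaging smooths out one-off costs such as the first-query database load noted above.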

Successfully merging this pull request may close these issues.

[SkyServe] Cache output of sky serve status --endpoint