Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add generate support to sagemaker_server #8047

Merged
merged 4 commits into from
Mar 4, 2025

Conversation

ziqif-nv
Copy link
Contributor

@ziqif-nv ziqif-nv commented Feb 28, 2025

What does the PR do?

Sagemaker side wants the equivalent of HandleGenerate for sagemaker_server.cc, which only has HandleInfer now.

However, sagemaker_server only has a single /invocations endpoint and Triton already used that to map to HandleInfer

To resolve above issue, after discussing with Sagemaker side, there are two options:

  1. add an environment variable to enable Triton during launch to map /invocations to either HandleInfer, HandleGenerate, HandleGenerate(with streaming) based on their needs
  2. add to request header on using generate mode or not for each request

In this PR, we are implementing with option 1 since Sagemaker side is ok with both options and option 1 seems easier for both Sagemaker and Triton side to make the change and adopt.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

  1. sagemaker_server.h
  2. sagemaker_server.cc
  3. server
  4. *_test.py

Test plan:

L0_sagemaker with new unit test to cover both serve script and new inference types - generate and generate_stream

  • CI Pipeline ID:
    24845792

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

@ziqif-nv ziqif-nv added the PR: feat A new feature label Feb 28, 2025
@ziqif-nv ziqif-nv force-pushed the ziqif-sagemaker-handlegenerate branch from a414772 to 949c3cd Compare February 28, 2025 22:19
@ziqif-nv ziqif-nv marked this pull request as ready for review March 3, 2025 22:25
@ziqif-nv ziqif-nv requested review from rmccorm4 and krishung5 March 3, 2025 22:26
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the same model taken from L0_http generate tests - can you make it so there's only one copy of the model and both tests use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is. Let me refactor L0_http to use the new location then

@@ -159,6 +160,11 @@ class SagemakerAPIServer : public HTTPAPIServer {

static const std::string binary_mime_type_;

// Type of inference:infer, generate or generate_stream.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: minor re-word - may need line wrapping

Suggested change
// Type of inference:infer, generate or generate_stream.
// Triton HTTP handler to map Sagemaker /invocations route to: "infer", "generate", or "generate_stream"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated. thanks

Copy link
Contributor

@krishung5 krishung5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work!

Copy link
Contributor

@rmccorm4 rmccorm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work 🚀

@ziqif-nv I just noticed we don't have any documentation on sagemaker support. While there's quite a lot that could be documented, I think we could get away with something lightweight but discoverable by SEO.

Could you do a quick follow-up PR that adds a server/docs/cutomization_guide/sagemaker.md?

I think it can basically just provide links to 3 things in one place so they're easier to find by users, and doesn't need to go into too much depth otherwise:

  1. See docker/sagemaker/serve tool for details on how it is deployed
  2. See qa/L0_sakemaker for example usage and testing
  3. See https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-models-frameworks-triton.html for more details

and we can reach out to sagemaker team to update the env var table on https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-models-frameworks-triton.html with this newly added SAGEMAKER_TRITON_INFERENCE_TYPE

@ziqif-nv
Copy link
Contributor Author

ziqif-nv commented Mar 4, 2025

Nice work 🚀

@ziqif-nv I just noticed we don't have any documentation on sagemaker support. While there's quite a lot that could be documented, I think we could get away with something lightweight but discoverable by SEO.

Could you do a quick follow-up PR that adds a server/docs/cutomization_guide/sagemaker.md?

I think it can basically just provide links to 3 things in one place so they're easier to find by users, and doesn't need to go into too much depth otherwise:

  1. See docker/sagemaker/serve tool for details on how it is deployed
  2. See qa/L0_sakemaker for example usage and testing
  3. See https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-models-frameworks-triton.html for more details

and we can reach out to sagemaker team to update the env var table on https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-models-frameworks-triton.html with this newly added SAGEMAKER_TRITON_INFERENCE_TYPE

for sure. will do it in a following PR

@ziqif-nv ziqif-nv merged commit 96e7cb5 into main Mar 4, 2025
3 checks passed
@ziqif-nv ziqif-nv deleted the ziqif-sagemaker-handlegenerate branch March 4, 2025 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: feat A new feature
Development

Successfully merging this pull request may close these issues.

4 participants