feat: add generate support to sagemaker_server #8047

ziqif-nv · 2025-02-28T22:10:40Z

What does the PR do?

The SageMaker team wants an equivalent of HandleGenerate in sagemaker_server.cc, which currently only has HandleInfer.

However, sagemaker_server exposes only a single /invocations endpoint, and Triton already maps that endpoint to HandleInfer.

To resolve the above issue, we discussed two options with the SageMaker team:

1. Add an environment variable, read when Triton launches, that maps /invocations to HandleInfer, HandleGenerate, or HandleGenerate (with streaming), depending on their needs.
2. Add a request header that selects, per request, whether to use generate mode.

This PR implements option 1: the SageMaker team is fine with either option, and option 1 seems easier for both SageMaker and Triton to implement and adopt.
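As an illustration of option 1, here is a minimal sketch of how a launch-time environment variable could be turned into a handler choice. It is not the PR's actual code: the enum and function names are hypothetical, and falling back to infer when the variable is unset or unrecognized is an assumption. The variable name SAGEMAKER_TRITON_INFERENCE_TYPE and the three values ("infer", "generate", "generate_stream") are the ones mentioned in the review discussion.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical enum for illustration; the PR's real member type may differ.
enum class InferenceType { kInfer, kGenerate, kGenerateStream };

// Map the environment variable's string value to a handler choice.
// Assumption: a missing or unrecognized value falls back to plain infer.
inline InferenceType
ParseInferenceType(const char* value)
{
  if (value == nullptr) {
    return InferenceType::kInfer;
  }
  const std::string v(value);
  if (v == "generate") {
    return InferenceType::kGenerate;
  }
  if (v == "generate_stream") {
    return InferenceType::kGenerateStream;
  }
  return InferenceType::kInfer;
}

// Read the setting once at server start-up (hypothetical helper).
inline InferenceType
InferenceTypeFromEnv()
{
  return ParseInferenceType(std::getenv("SAGEMAKER_TRITON_INFERENCE_TYPE"));
}
```

With a value like this stored on the server object, the /invocations route could branch to HandleInfer or HandleGenerate (with or without streaming). How the PR actually wires that dispatch is not shown in this conversation.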

Checklist

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

Related PRs:

Where should the reviewer start?

sagemaker_server.h
sagemaker_server.cc
server
*_test.py

Test plan:

L0_sagemaker, with new unit tests covering both the serve script and the new inference types: generate and generate_stream

CI Pipeline ID:
24845792

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Resolves GitHub issue: [RFE] HandleGenerate equivalent for sagemaker_server.cc #7151

rmccorm4 · 2025-03-03T22:31:09Z

qa/python_models/generate_models/mock_llm/1/model.py

If this is the same model taken from L0_http generate tests - can you make it so there's only one copy of the model and both tests use it?

Yes, it is. Let me refactor L0_http to use the new location then.

rmccorm4 · 2025-03-03T22:36:06Z

src/sagemaker_server.h

@@ -159,6 +160,11 @@ class SagemakerAPIServer : public HTTPAPIServer {

  static const std::string binary_mime_type_;

+  // Type of inference:infer, generate or generate_stream.


nit: minor re-word - may need line wrapping

Suggested change

-  // Type of inference:infer, generate or generate_stream.
+  // Triton HTTP handler to map Sagemaker /invocations route to: "infer", "generate", or "generate_stream"

updated. thanks

krishung5

LGTM, nice work!

rmccorm4

Nice work 🚀

@ziqif-nv I just noticed we don't have any documentation on sagemaker support. While there's quite a lot that could be documented, I think we could get away with something lightweight but discoverable by SEO.

Could you do a quick follow-up PR that adds a server/docs/customization_guide/sagemaker.md?

I think it can basically just provide links to 3 things in one place so they're easier to find by users, and doesn't need to go into too much depth otherwise:

- See docker/sagemaker/serve tool for details on how it is deployed
- See qa/L0_sagemaker for example usage and testing
- See https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-models-frameworks-triton.html for more details

We can also reach out to the SageMaker team to update the env var table on https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-models-frameworks-triton.html with the newly added SAGEMAKER_TRITON_INFERENCE_TYPE.

ziqif-nv · 2025-03-04T17:30:17Z

For sure. Will do it in a follow-up PR.

ziqif-nv added the "PR: feat" (A new feature) label Feb 28, 2025

feat: add generate support to sagemaker_server

949c3cd

ziqif-nv force-pushed the ziqif-sagemaker-handlegenerate branch from a414772 to 949c3cd February 28, 2025 22:19

fix c_str

7ae979c

ziqif-nv marked this pull request as ready for review March 3, 2025 22:25

ziqif-nv requested review from rmccorm4 and krishung5 March 3, 2025 22:26

rmccorm4 reviewed Mar 3, 2025

src/sagemaker_server.cc (outdated, resolved)

oandreeva-nv reviewed Mar 3, 2025

qa/python_models/generate_models/mock_llm/config.pbtxt (outdated, resolved)

krishung5 reviewed Mar 3, 2025

qa/L0_sagemaker/sagemaker_generate_stream_test.py (resolved)

oandreeva-nv reviewed Mar 3, 2025

qa/python_models/generate_models/mock_llm/1/model.py (outdated, resolved)

oandreeva-nv reviewed Mar 3, 2025

qa/L0_sagemaker/test.sh (resolved)

address comments

ed41219

krishung5 reviewed Mar 4, 2025

qa/L0_sagemaker/sagemaker_generate_stream_test.py (outdated, resolved)

krishung5 reviewed Mar 4, 2025

qa/L0_sagemaker/sagemaker_generate_test.py (outdated, resolved)

address comments

42024d3

krishung5 approved these changes Mar 4, 2025

rmccorm4 approved these changes Mar 4, 2025

ziqif-nv merged commit 96e7cb5 into main Mar 4, 2025
3 checks passed

ziqif-nv deleted the ziqif-sagemaker-handlegenerate branch March 4, 2025 17:30
