feat: add generate support to sagemaker_server #8047
Conversation
Force-pushed a414772 to 949c3cd
If this is the same model taken from L0_http generate tests - can you make it so there's only one copy of the model and both tests use it?
Yes, it is. Let me refactor L0_http to use the new location then.
src/sagemaker_server.h (Outdated)

```diff
@@ -159,6 +160,11 @@ class SagemakerAPIServer : public HTTPAPIServer {
   static const std::string binary_mime_type_;

+  // Type of inference:infer, generate or generate_stream.
```
nit: minor re-word - may need line wrapping
```suggestion
// Triton HTTP handler to map Sagemaker /invocations route to: "infer", "generate", or "generate_stream"
```
Updated, thanks.
LGTM, nice work!
Nice work 🚀
@ziqif-nv I just noticed we don't have any documentation on SageMaker support. While there's quite a lot that could be documented, I think we could get away with something lightweight but discoverable by search engines.
Could you do a quick follow-up PR that adds a server/docs/customization_guide/sagemaker.md?
I think it can basically just provide links to 3 things in one place so they're easier for users to find, and doesn't need to go into too much depth otherwise:
- See the docker/sagemaker/serve tool for details on how it is deployed
- See qa/L0_sagemaker for example usage and testing
- See https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-models-frameworks-triton.html for more details
And we can reach out to the SageMaker team to update the env var table on https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-models-frameworks-triton.html with the newly added SAGEMAKER_TRITON_INFERENCE_TYPE.
For sure, will do it in a follow-up PR.
What does the PR do?
The SageMaker side wants the equivalent of HandleGenerate for sagemaker_server.cc, which currently only has HandleInfer.
However, sagemaker_server only exposes a single /invocations endpoint, and Triton already maps it to HandleInfer.
To resolve the above issue, after discussing with the SageMaker side, there are two options:
In this PR, we implement option 1, since the SageMaker side is OK with both options and option 1 seems easier for both the SageMaker and Triton sides to change and adopt.
Checklist
- PR title: <commit_type>: <Title>
- Commit Type: Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
Test plan:
L0_sagemaker, with a new unit test covering both the serve script and the new inference types: generate and generate_stream.
24845792
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)