
Commit 8f54332

adds video of chat interface for trtllm_latency (#1123)
1 parent fb1cfcd commit 8f54332

File tree

1 file changed: +7 -2 lines changed


Diff for: 06_gpu_and_ml/llm-serving/trtllm_latency.py

@@ -15,9 +15,14 @@
 # With the out-of-the-box defaults we observe an unacceptable median time
 # to last token of over a second, but with careful configuration,
 # we'll bring that down to under 250ms -- over a 4x speed up!
+# These latencies were measured on a single NVIDIA H100 GPU
+# running LLaMA 3 8B on prompts and generations of a few dozen to a few hundred tokens.
 
-# These latencies were measured on a single NVIDIA H100 GPU with prompts and generations
-# of a few dozen to a few hundred tokens.
+# Here's what that looks like in a terminal chat interface:
+
+# <video controls autoplay loop muted>
+# <source src="https://modal-cdn.com/example-trtllm-latency.mp4" type="video/mp4">
+# </video>
 
 # ## Overview
 
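Aside (not part of this commit): the comments above quote a median time to last token. Below is a minimal sketch of how that metric could be measured, assuming you record one wall-clock duration per streamed request; the client function and names are illustrative, not taken from trtllm_latency.py.

import statistics
import time


def time_to_last_token(stream_fn, prompt: str) -> float:
    # Hypothetical streaming client: stream_fn(prompt) yields tokens as they arrive.
    start = time.monotonic()
    for _token in stream_fn(prompt):  # exhaust the stream so we observe the last token
        pass
    return time.monotonic() - start


def report_median(latencies_s: list[float]) -> None:
    # The "median time to last token" figure quoted in the example's comments.
    median_ms = statistics.median(latencies_s) * 1000
    print(f"median time to last token: {median_ms:.0f} ms over {len(latencies_s)} requests")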

0 commit comments
