[Bug] Several bugs when testing api_server with profile_restful_api.py #3231

cccccya · 2025-03-08T14:22:19Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

These bugs appeared while testing lmdeploy performance under a very high RPS load. The benchmark used is lmdeploy/benchmark/profile_restful_api.py.

A warning is raised in sample_random_requests() in profile_restful_api.py.

profile_restful_api.py:498: RuntimeWarning: divide by zero encountered in scalar floor_divide

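The cause is that some ShareGPT samples have prompt_len == 0, probably because the sampled ShareGPT entry is itself an empty string. I suggest filtering out empty prompts during sampling.
Change: I added a simple temporary workaround in the code so that testing could continue. (With it in place, the warning no longer appears.)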
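The empty-prompt filtering suggested above could look roughly like the following. This is a minimal sketch, not the actual code in profile_restful_api.py: `fit_prompt_to_length` and `sample_non_empty` are hypothetical helpers, and the repeat-then-truncate logic is an assumption about how a prompt is stretched to the requested length.

```python
from typing import List, Optional


def fit_prompt_to_length(token_ids: List[int], target_len: int) -> Optional[List[int]]:
    """Truncate or repeat token_ids to exactly target_len tokens.

    Returns None for an empty prompt so the caller can skip it instead of
    dividing by a zero prompt length.
    """
    prompt_len = len(token_ids)
    if prompt_len == 0:
        return None
    if prompt_len >= target_len:
        return token_ids[:target_len]
    # Ceiling division: number of repeats needed to reach target_len.
    ratio = (target_len + prompt_len - 1) // prompt_len
    return (token_ids * ratio)[:target_len]


def sample_non_empty(prompts: List[List[int]], target_len: int) -> List[List[int]]:
    """Fit every prompt to target_len, dropping the empty ones."""
    fitted = (fit_prompt_to_length(p, target_len) for p in prompts)
    return [p for p in fitted if p is not None]
```

Skipping the empty samples, rather than padding them, keeps every generated request at the requested input length.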

leave_blocks can be empty in block_trie.
The server reports an error.

Change: I added a check in lmdeploy/lmdeploy/pytorch/paging/block_trie.py (lines 184 and 185) that verifies leave_blocks is non-empty before calling get_ref_count.
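The guard described above might be sketched as follows. The real block_trie.py is not shown in this report, so everything here is a stand-in: `ToyAllocator`, the behaviour of its `get_ref_count`, and `count_evictable` are hypothetical. The only point carried over from the report is checking leave_blocks before the call.

```python
from typing import Dict, List


class ToyAllocator:
    """Stand-in for the block allocator (hypothetical, not lmdeploy's class)."""

    def __init__(self, ref_counts: Dict[int, int]):
        self._ref_counts = ref_counts

    def get_ref_count(self, blocks: List[int]) -> List[int]:
        # Assumption for illustration: the lookup rejects an empty batch.
        if not blocks:
            raise ValueError("get_ref_count called with no blocks")
        return [self._ref_counts[b] for b in blocks]


def count_evictable(allocator: ToyAllocator, leave_blocks: List[int]) -> int:
    """Count leaf blocks whose reference count is 1 (assumed to mean evictable)."""
    if not leave_blocks:
        # Nothing to evict, so skip the lookup entirely.
        return 0
    return sum(1 for c in allocator.get_ref_count(leave_blocks) if c == 1)
```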

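The engine fails to schedule, triggering assert len(running) > 0.
After making the two changes above, I rebuilt and reinstalled lmdeploy and ran the scripts. With --random-input-len 8192 or 81920 the benchmark runs normally, but with --random-input-len 9216 an ERROR occurs and schedule() fails.

I don't know how to solve this one. Could someone help look into where the problem might be? Many thanks Orz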

Reproduction

server.sh

#!/bin/bash
# Start the server
lmdeploy serve api_server /data/llm/Qwen2.5-7B-Instruct \
--backend pytorch \
--device cuda \
--eager-mode \
--cache-max-entry-count=0.9 \
--enable-prefix-caching \
--dtype float16

client.sh

#!/bin/bash
python profile_restful_api.py \
--dataset-path /data/ds/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json \
--backend lmdeploy \
--model /data/llm/Qwen2.5-7B-Instruct \
--dataset-name random \
--random-input-len 9216 \
--random-output-len 150 \
--num-prompts 10000 \
--random-range-ratio 1
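Since 8192 and 81920 work while 9216 fails, it may help to sweep --random-input-len and see which lengths trigger the error. The sketch below only builds the client command for each length, reusing the flags from client.sh. `build_client_cmd` is a hypothetical helper, and the lengths in the loop are an arbitrary choice.

```python
import shlex
from typing import List

DATASET = "/data/ds/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json"
MODEL = "/data/llm/Qwen2.5-7B-Instruct"


def build_client_cmd(input_len: int, num_prompts: int = 10000) -> List[str]:
    """Build the benchmark command from client.sh for a given input length."""
    return [
        "python", "profile_restful_api.py",
        "--dataset-path", DATASET,
        "--backend", "lmdeploy",
        "--model", MODEL,
        "--dataset-name", "random",
        "--random-input-len", str(input_len),
        "--random-output-len", "150",
        "--num-prompts", str(num_prompts),
        "--random-range-ratio", "1",
    ]


if __name__ == "__main__":
    # Print one command per candidate length; run them against a live server.
    for n in (8192, 8704, 9216, 16384):
        print(shlex.join(build_client_cmd(n)))
```

Each list can be passed to `subprocess.run` while the server from server.sh is running.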

Environment

sys.platform: linux
Python: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA A100-SXM4-40GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.6, V12.6.68
GCC: gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
PyTorch: 2.5.1+cu124
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.4
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 90.1
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.20.1+cu124
LMDeploy: 0.7.1+
transformers: 4.49.0
gradio: Not Found
fastapi: 0.115.11
pydantic: 2.10.6
triton: 3.1.0
NVIDIA Topology:
        GPU0    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      SYS     SYS     0-17,72-89      0               N/A
NIC0    SYS      X      PIX
NIC1    SYS     PIX      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1

Error traceback

2025-03-08 13:01:23,717 - lmdeploy - ERROR - engine.py:765 - Task <EngineMainLoop> failed
Traceback (most recent call last):
  File "/home/cya/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 760, in __task_callback
    task.result()
  File "/home/cya/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 811, in async_loop
    await self._async_loop_main(resp_que=resp_que, has_runable_event=has_runable_event)
  File "/home/cya/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 745, in _async_loop_main
    await _prefetch_next_inputs()
  File "/home/cya/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 733, in _prefetch_next_inputs
    await _send_next_inputs(prefill)
  File "/home/cya/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 714, in _send_next_inputs
    forward_inputs = self._make_forward_inputs(prefill)
  File "/home/cya/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 620, in _make_forward_inputs
    assert len(running) > 0
AssertionError


grimoire · 2025-03-09T09:36:32Z

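Thank you very much for reporting these bugs!
Regarding issue 3 (the scheduling failure), it is probably caused by the recently added pre-schedule feature: some requests get locked, so the scheduler cannot find any decoding requests.
PR #3221 should fix this bug; please give it a try.
We will fix the other bugs as soon as possible. Of course, if you would like to submit a PR, you are very welcome to!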
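As a rough illustration of the failure mode described above, here is a toy model in which every request is locked, so scheduling returns nothing to run. It is not lmdeploy's scheduler and not the change made in #3221. `Request`, `schedule`, and `make_forward_inputs` are hypothetical names, and returning None instead of asserting is just one possible way to handle an empty batch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    req_id: int
    locked: bool = False  # assumed: held by a step that was scheduled ahead of time


def schedule(requests: List[Request]) -> List[Request]:
    """Return the requests that are free to run in the next step."""
    return [r for r in requests if not r.locked]


def make_forward_inputs(requests: List[Request]) -> Optional[List[int]]:
    """Return the ids to run next, or None when nothing is runnable."""
    running = schedule(requests)
    if not running:
        # A bare `assert len(running) > 0` would raise AssertionError here.
        return None
    return [r.req_id for r in running]
```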

lvhan028 assigned grimoire Mar 9, 2025
