[FEA] Add pipelining to the NDS-H-cpp benchmarks
#18206
Labels: feature request, good first issue, libcudf
Is your feature request related to a problem? Please describe.
In the libcudf microbenchmarks, the NDS-H-cpp benchmarks are a useful tool for studying GPU query performance. They could also be used to study pipelining. An application can "pipeline" work on the GPU by using two or more host threads to sequence calls to the libcudf public API. Pipelining is useful in IO-heavy workloads, where one thread can be copying data to the GPU while another thread runs kernels over previously copied data. Pipelining is needed to ensure that GPU compute is not left idle during copying steps.
Describe the solution you'd like
Claude and I wrote a simple concurrent benchmark for query 5 using PTDS (per-thread default stream). We could take this idea and update it to use a CUDA stream pool. We would also want to consider how pipelining could be applied to other queries without modifying each query file.
The profiles show that query 5 is IO-bound, yet there are still some bubbles where compute is running but IO is not. We should investigate why IO is blocking kernel work in some cases.