Benchmark work estimate #802
base: develop
Conversation
SonarCloud Quality Gate failed.
Codecov Report
@@             Coverage Diff             @@
##           develop     #802      +/-   ##
===========================================
+ Coverage    90.50%   94.19%    +3.68%
===========================================
  Files          505      401      -104
  Lines        43856    32156    -11700
===========================================
- Hits         39693    30289     -9404
+ Misses        4163     1867     -2296
I think one limitation of this technique, if I understand it properly, is that it requires two kinds of consistency:
I'm not sure about 1. for now. I think most of our algorithms do roughly the same order of magnitude of work between executors, but it means a design like the CSR SpMV (classical, imbalance, ...) with strategies would be a no-go; instead we would need an operation for each strategy and switch strategies at the core/algorithm level (which is maybe the best thing to do anyway). I don't think it's a downside, I just thought I should mention it.
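A minimal sketch of that alternative, with all names hypothetical (not Ginkgo's actual API): each strategy becomes its own operation, so each one carries a single well-defined cost, and the strategy switch moves up to the core/algorithm level.

```cpp
// Placeholder matrix type, standing in for the real CSR matrix class.
struct csr {};

// One operation per strategy, each with its own unambiguous work estimate.
void classical_spmv(const csr&) { /* classical kernel */ }
void load_balanced_spmv(const csr&) { /* load-balanced kernel */ }

// Strategy selection happens at the core/algorithm level, so every
// dispatched operation has exactly one cost model attached to it.
void spmv(const csr& mat, bool imbalanced)
{
    if (imbalanced) {
        load_balanced_spmv(mat);
    } else {
        classical_spmv(mat);
    }
}
```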
Force-pushed from e4b0524 to dea9c36 (Compare)
Note: This PR changes the Ginkgo ABI.
For details, check the full ABI diff under Artifacts here.
Force-pushed from dea9c36 to 83a73da (Compare)
Force-pushed from 83a73da to 97aa307 (Compare)
I think instead of artificially classifying operations into likely compute-bound and likely memory-bound (which is very much hardware-dependent), IMO a better approach would be to just calculate the work and memory complexities of the operations and register them for each operation. We could then have a roofline estimator (which could take in the hardware properties) to estimate whether an operation is memory-bound or compute-bound.
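A minimal sketch of that registry-plus-roofline idea; all types and names are hypothetical, and the hardware numbers are made up.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical cost model for one operation: work in flops and memory
// traffic in bytes, each as a function of the problem size n.
struct cost_model {
    double (*flops)(std::int64_t n);
    double (*bytes)(std::int64_t n);
};

// Hypothetical hardware description used by the roofline estimator.
struct machine {
    double peak_flops;      // flop/s
    double peak_bandwidth;  // byte/s
};

// Registry mapping operation names to their cost models.
std::unordered_map<std::string, cost_model> registry;

// Roofline classification: an operation is memory-bound if its arithmetic
// intensity (flops per byte) lies below the machine balance point.
bool is_memory_bound(const std::string& op, std::int64_t n, const machine& m)
{
    const auto& c = registry.at(op);
    const double intensity = c.flops(n) / c.bytes(n);
    return intensity < m.peak_flops / m.peak_bandwidth;
}

// Roofline runtime estimate: the slower of compute time and memory time.
double estimated_seconds(const std::string& op, std::int64_t n, const machine& m)
{
    const auto& c = registry.at(op);
    return std::max(c.flops(n) / m.peak_flops, c.bytes(n) / m.peak_bandwidth);
}

int main()
{
    // dot product of two length-n double vectors: 2n flops, 16n bytes read
    registry["dot"] = {[](std::int64_t n) { return 2.0 * n; },
                       [](std::int64_t n) { return 16.0 * n; }};
    machine m{1e12, 1e11};  // made-up machine: 1 Tflop/s, 100 GB/s
    // intensity 0.125 flop/byte < balance point 10 flop/byte -> memory-bound
    return is_memory_bound("dot", 1 << 20, m) ? 0 : 1;
}
```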
I don't think the classification is particularly artificial, but let me formulate it differently: there are many kernels where it doesn't make sense to talk of FLOPs, or that don't allow for a nice closed-form expression of their memory footprint or compute complexity. That is why I want to leave the option open to either not annotate kernels at all, or to annotate them with custom metrics.
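For illustration, a sketch of what such an optional annotation could look like; the names and types are hypothetical, not the interface proposed in this PR.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <variant>

// A conventional estimate for kernels where flops and bytes are well-defined.
struct flops_and_bytes {
    std::int64_t flops;
    std::int64_t bytes;
};

// A free-form metric (e.g. "nonzeros processed") for kernels without a
// meaningful flop count or closed-form memory footprint.
using custom_metrics = std::map<std::string, double>;

// A kernel annotation is conventional, custom, or absent entirely.
using work_annotation =
    std::optional<std::variant<flops_and_bytes, custom_metrics>>;

// axpy on length-n double vectors: 2n flops, x and y read, y written.
work_annotation axpy_estimate(std::int64_t n)
{
    return flops_and_bytes{
        2 * n, 3 * n * static_cast<std::int64_t>(sizeof(double))};
}

// A kernel where flops are not meaningful reports a custom metric instead.
work_annotation symbolic_factorization_estimate(std::int64_t nnz)
{
    return custom_metrics{{"nonzeros processed", static_cast<double>(nnz)}};
}

// An unannotated kernel simply provides no estimate.
work_annotation unannotated_estimate() { return std::nullopt; }
```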
Force-pushed from 97aa307 to c984d91 (Compare)
General question: how does this differ from the profiler results?
Yes, there are more precise models and exact performance counters. This is mostly aiming to provide a roofline-like approximation of the performance, to quickly highlight kernels that run significantly below expected performance and thus point users at possible optimization opportunities. We can use this framework to capture such information at the application level without running under a profiler, which requires additional tooling for analysis. The BLAS 1/2 and solver kernels should be pretty accurate; only the SpMVs undercount accesses to the input vector, which should ideally be served from cache anyway. IIRC, the footprints are equivalent to what we used in our ACM TOMS paper to report achieved bandwidths for different SpMVs and solvers.
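To make the counting concrete, here is an illustrative footprint for a CSR SpMV under the optimistic assumption mentioned above, where the input vector is counted only once because repeated accesses are served from cache. The function names are hypothetical.

```cpp
#include <cstdint>

// Illustrative memory footprint of a CSR SpMV y = A * x: each stored value
// and column index is read once, the row pointers once, x is counted only
// once per column (the optimistic, cache-served assumption), and y is
// written once.
double csr_spmv_bytes(std::int64_t num_rows, std::int64_t num_cols,
                      std::int64_t num_nonzeros, int value_size,
                      int index_size)
{
    return static_cast<double>(num_nonzeros) * (value_size + index_size)
           + static_cast<double>(num_rows + 1) * index_size  // row pointers
           + static_cast<double>(num_cols) * value_size      // input vector x
           + static_cast<double>(num_rows) * value_size;     // output vector y
}

// Achieved bandwidth from a measured runtime; comparing this against the
// machine's peak bandwidth is what flags underperforming kernels.
double achieved_bandwidth(double bytes, double seconds)
{
    return bytes / seconds;
}
```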
This PR adds work estimates to the executor Operations, implements them for a few Dense kernels, and outputs them in the benchmark loggers.

Related to #1784
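As an illustration of the kind of estimate and logger output described here, a sketch under assumed names (not the PR's actual interface): a dot product on length-n double vectors performs roughly 2n flops and moves about (2n + 1) * sizeof(double) bytes.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical work estimate attached to an operation.
struct work_estimate {
    std::int64_t flops;
    std::int64_t bytes;
};

// Dot product of two length-n double vectors: n multiplications and n - 1
// additions (counted as 2n flops); 2n values read, one result written.
work_estimate dot_estimate(std::int64_t n)
{
    return {2 * n, (2 * n + 1) * static_cast<std::int64_t>(sizeof(double))};
}

// A benchmark logger can then report achieved rates next to the runtime.
void log_operation(const char* name, work_estimate est, double seconds)
{
    std::printf("%s: %6.2f GFLOP/s, %6.2f GB/s\n", name,
                est.flops / seconds * 1e-9, est.bytes / seconds * 1e-9);
}

int main()
{
    // made-up measurement: a 1M-element dot taking 50 microseconds
    log_operation("dense::dot", dot_estimate(1 << 20), 50e-6);
}
```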