Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark work estimate #802

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from
Open

Benchmark work estimate #802

wants to merge 1 commit into from

Conversation

upsj
Copy link
Member

@upsj upsj commented Jun 22, 2021

This PR adds work estimates to the executor Operations, implements them for a few Dense kernels and outputs them in benchmark loggers.

Related to #1784

@upsj upsj added is:idea Just a thought - if it's good, it could evolve into a proposal. 1:ST:WIP This PR is a work in progress. Not ready for review. labels Jun 22, 2021
@upsj upsj added this to the Ginkgo 1.5.0 milestone Jun 22, 2021
@upsj upsj self-assigned this Jun 22, 2021
@ginkgo-bot ginkgo-bot added mod:core This is related to the core module. reg:benchmarking This is related to benchmarking. type:matrix-format This is related to the Matrix formats labels Jun 22, 2021
@sonarqubecloud
Copy link

@codecov
Copy link

codecov bot commented Jun 22, 2021

Codecov Report

Merging #802 (2e1aa39) into develop (621f991) will increase coverage by 3.68%.
The diff coverage is 21.05%.

❗ Current head 2e1aa39 differs from pull request most recent head dea9c36. Consider uploading reports for the commit dea9c36 to get more accurate results

@@             Coverage Diff             @@
##           develop     #802      +/-   ##
===========================================
+ Coverage    90.50%   94.19%   +3.68%     
===========================================
  Files          505      401     -104     
  Lines        43856    32156   -11700     
===========================================
- Hits         39693    30289    -9404     
+ Misses        4163     1867    -2296     
Impacted Files Coverage Δ
core/matrix/dense_kernels.hpp 0.00% <0.00%> (ø)
include/ginkgo/core/base/executor.hpp 78.64% <0.00%> (+5.96%) ⬆️
core/matrix/dense.cpp 99.51% <100.00%> (+6.16%) ⬆️
core/factorization/ilu.cpp 0.00% <0.00%> (-100.00%) ⬇️
reference/factorization/ic_kernels.cpp 0.00% <0.00%> (-100.00%) ⬇️
reference/factorization/ilu_kernels.cpp 0.00% <0.00%> (-100.00%) ⬇️
include/ginkgo/core/factorization/ilu.hpp 0.00% <0.00%> (-100.00%) ⬇️
core/factorization/ic.cpp 0.00% <0.00%> (-96.88%) ⬇️
include/ginkgo/core/factorization/ic.hpp 0.00% <0.00%> (-93.34%) ⬇️
omp/matrix/fbcsr_kernels.cpp 0.00% <0.00%> (-57.54%) ⬇️
... and 404 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@tcojean
Copy link
Member

tcojean commented Sep 14, 2021

I think one limit with this technic if I understand properly is that it requires two consistencies:

  1. All executors will have a similar implementation so that they have the same work estimates
  2. Every operation is tied to only one kernel.

I'm not sure about 1. for now, I think most of our algorithms have roughly the same order of magnitude of work between executors, but that means a design like the CSR SpMV (classical, imbalance, ...) with strategies would be a no go, and instead we would need an operation for each strategy and switch strategy at the core/algorithm level (which is maybe the best thing to do anyway).

I don't think it's a downside, I thought I should just mention it.

@upsj upsj mentioned this pull request Sep 19, 2021
@upsj upsj added the 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. label Oct 6, 2021
@upsj upsj mentioned this pull request Apr 22, 2022
2 tasks
@upsj upsj removed this from the Ginkgo 1.5.0 milestone May 9, 2022
@upsj upsj force-pushed the benchmark_work_estimate branch 2 times, most recently from e4b0524 to dea9c36 Compare August 10, 2022 08:53
@ginkgo-bot
Copy link
Member

Note: This PR changes the Ginkgo ABI:

Functions changes summary: 0 Removed, 0 Changed, 253 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

For details check the full ABI diff under Artifacts here

@MarcelKoch MarcelKoch added the is:help-wanted Need ideas on how to solve this. label Jul 11, 2024
@MarcelKoch MarcelKoch marked this pull request as draft July 11, 2024 13:24
@MarcelKoch MarcelKoch added the is:good-first-issue Good for newcomers. label Feb 13, 2025
@upsj upsj force-pushed the benchmark_work_estimate branch from dea9c36 to 83a73da Compare February 16, 2025 19:29
@upsj upsj marked this pull request as ready for review February 16, 2025 19:29
@upsj upsj force-pushed the benchmark_work_estimate branch from 83a73da to 97aa307 Compare February 16, 2025 20:23
@upsj upsj removed is:help-wanted Need ideas on how to solve this. is:good-first-issue Good for newcomers. 1:ST:WIP This PR is a work in progress. Not ready for review. 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. labels Feb 16, 2025
@upsj upsj requested a review from a team February 16, 2025 20:24
@upsj upsj changed the base branch from develop to fix_ginkgo_hpp February 16, 2025 20:24
@pratikvn
Copy link
Member

I think instead of artificially classifying operations into likely compute bound and likely memory bound (which is very much dependent on hardware), IMO a better approach would be to just calculate the work and memory complexities of the operations and register them for each operation. We can then have a roofline estimator (which could take in the hardware properties), to estimate whether the operation is memory-bound or compute bound.

@upsj
Copy link
Member Author

upsj commented Feb 17, 2025

I don't think the classification is particularly artificial, let me formulate it as

  • compute-bound means the compute complexity grows asymptotically faster than the memory complexity
  • memory-bound means the memory complexity grows asymptotically at least as fast as the compute complexity

There are many kernels where it doesn't make sense to talk of FLOPS, or that don't allow for a nice closed-form expression of their memory footprint/compute complexity, which is why I want to leave the option open to either not annotate kernels at all, or to annotate them with custom metrics.

Base automatically changed from fix_ginkgo_hpp to develop February 18, 2025 08:31
@upsj upsj added the 1:ST:ready-for-review This PR is ready for review label Feb 20, 2025
@upsj upsj force-pushed the benchmark_work_estimate branch from 97aa307 to c984d91 Compare February 20, 2025 11:24
@MarcelKoch MarcelKoch self-requested a review February 20, 2025 11:39
@yhmtsai
Copy link
Member

yhmtsai commented Feb 20, 2025

general question: how different between this and the profiler result?
I think they provide the roofline model analysis which might get more detail about the kernel actual action.
We can only calculate in best case (all memory in once) or worst case (no cache), which might not be accurate as what kernels happens in GPUs.

@upsj
Copy link
Member Author

upsj commented Feb 20, 2025

Yes, there are more precise models or exact performance counters. This is aiming mostly to provide a rootline-like approximation to the performance, to quickly highlight kernels that are significantly below expected performance, enabling users to highlight possible optimization opportunities. We can use this framework to capture such information on an application level without the need for executing with a profiler, which requires additional tooling for analysis.

The BLAS 1/2 and solver kernels should be pretty accurate, only the SpMVs undercount accesses to the input vector, which should ideally be served from cache though. IIRC, the footprints are equivalent to what we used in our ACM TOMS paper to report achieved bandwidths for different SpMVs and solvers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-for-review This PR is ready for review is:idea Just a thought - if it's good, it could evolve into a proposal. mod:core This is related to the core module. reg:benchmarking This is related to benchmarking. type:matrix-format This is related to the Matrix formats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants