add N for hardware indices #541

Draft
wants to merge 3 commits into
base: main

vchuravy
Member

No description provided.

Contributor

github-actions bot commented Jan 7, 2025

Benchmark Results

| Benchmark | main | 2afdea1... | main/2afdea1881607f... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 0.534 ± 0.0077 μs | 0.588 ± 0.0056 μs | 0.909 |
| saxpy/default/Float16/1048576 | 0.174 ± 0.0075 ms | 0.172 ± 0.0046 ms | 1.01 |
| saxpy/default/Float16/16384 | 3.12 ± 0.031 μs | 3.16 ± 0.044 μs | 0.989 |
| saxpy/default/Float16/2048 | 0.71 ± 0.0094 μs | 0.742 ± 0.0094 μs | 0.957 |
| saxpy/default/Float16/256 | 0.404 ± 0.0079 μs | 0.445 ± 0.0083 μs | 0.909 |
| saxpy/default/Float16/262144 | 0.0441 ± 0.00043 ms | 0.0436 ± 0.00038 ms | 1.01 |
| saxpy/default/Float16/32768 | 5.79 ± 0.05 μs | 5.83 ± 0.073 μs | 0.994 |
| saxpy/default/Float16/4096 | 1.09 ± 0.018 μs | 1.13 ± 0.018 μs | 0.966 |
| saxpy/default/Float16/512 | 0.452 ± 0.0077 μs | 0.485 ± 0.005 μs | 0.933 |
| saxpy/default/Float16/64 | 0.378 ± 0.0068 μs | 0.412 ± 0.0046 μs | 0.915 |
| saxpy/default/Float16/65536 | 11.4 ± 0.12 μs | 11.4 ± 0.091 μs | 1 |
| saxpy/default/Float32/1024 | 0.429 ± 0.0068 μs | 0.435 ± 0.0091 μs | 0.986 |
| saxpy/default/Float32/1048576 | 0.198 ± 0.034 ms | 0.202 ± 0.031 ms | 0.981 |
| saxpy/default/Float32/16384 | 2.53 ± 0.14 μs | 2.56 ± 0.19 μs | 0.986 |
| saxpy/default/Float32/2048 | 0.526 ± 0.017 μs | 0.539 ± 0.054 μs | 0.975 |
| saxpy/default/Float32/256 | 0.371 ± 0.0063 μs | 0.38 ± 0.0066 μs | 0.976 |
| saxpy/default/Float32/262144 | 0.0451 ± 0.0045 ms | 0.0445 ± 0.0051 ms | 1.01 |
| saxpy/default/Float32/32768 | 4.99 ± 0.28 μs | 5.05 ± 0.47 μs | 0.989 |
| saxpy/default/Float32/4096 | 0.907 ± 0.069 μs | 0.916 ± 0.073 μs | 0.991 |
| saxpy/default/Float32/512 | 0.393 ± 0.0069 μs | 0.393 ± 0.0069 μs | 0.999 |
| saxpy/default/Float32/64 | 0.366 ± 0.0055 μs | 0.36 ± 0.0063 μs | 1.02 |
| saxpy/default/Float32/65536 | 11.8 ± 1.1 μs | 11.7 ± 1.5 μs | 1.01 |
| saxpy/default/Float64/1024 | 0.531 ± 0.02 μs | 0.521 ± 0.049 μs | 1.02 |
| saxpy/default/Float64/1048576 | 0.482 ± 0.054 ms | 0.486 ± 0.051 ms | 0.991 |
| saxpy/default/Float64/16384 | 4.98 ± 0.43 μs | 5.21 ± 1.5 μs | 0.956 |
| saxpy/default/Float64/2048 | 0.918 ± 0.095 μs | 0.901 ± 0.084 μs | 1.02 |
| saxpy/default/Float64/256 | 0.398 ± 0.0075 μs | 0.381 ± 0.0067 μs | 1.05 |
| saxpy/default/Float64/262144 | 0.0915 ± 0.01 ms | 0.0884 ± 0.013 ms | 1.03 |
| saxpy/default/Float64/32768 | 11.9 ± 1.1 μs | 12.1 ± 1.5 μs | 0.983 |
| saxpy/default/Float64/4096 | 1.47 ± 0.16 μs | 1.48 ± 0.22 μs | 0.99 |
| saxpy/default/Float64/512 | 0.423 ± 0.011 μs | 0.423 ± 0.0096 μs | 0.999 |
| saxpy/default/Float64/64 | 0.373 ± 0.0065 μs | 0.357 ± 0.0049 μs | 1.04 |
| saxpy/default/Float64/65536 | 23.8 ± 3.7 μs | 23.8 ± 3.6 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 1.92 ± 0.024 μs | 2.12 ± 0.035 μs | 0.905 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.159 ± 0.0093 ms | 0.16 ± 0.0093 ms | 0.992 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.18 ± 0.13 μs | 4.41 ± 0.13 μs | 0.947 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.11 ± 0.041 μs | 2.31 ± 0.054 μs | 0.912 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.58 ± 0.026 μs | 2.77 ± 0.032 μs | 0.929 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.042 ± 0.0016 ms | 0.0418 ± 0.0015 ms | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.62 ± 0.26 μs | 6.83 ± 0.26 μs | 0.969 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.4 ± 0.034 μs | 2.64 ± 0.04 μs | 0.909 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.02 ± 0.026 μs | 3.21 ± 0.057 μs | 0.94 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.29 ± 0.047 μs | 2.46 ± 0.062 μs | 0.932 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.4 ± 0.59 μs | 12.5 ± 0.57 μs | 0.995 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 1.96 ± 0.034 μs | 2.11 ± 0.043 μs | 0.931 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.2 ± 0.034 ms | 0.21 ± 0.037 ms | 0.953 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.11 ± 0.31 μs | 4.22 ± 0.32 μs | 0.975 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.1 ± 0.037 μs | 2.27 ± 0.07 μs | 0.926 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.46 ± 0.052 μs | 2.56 ± 0.04 μs | 0.962 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0486 ± 0.0046 ms | 0.0473 ± 0.0059 ms | 1.03 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.23 ± 0.43 μs | 7.18 ± 0.55 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.4 ± 0.071 μs | 2.53 ± 0.067 μs | 0.946 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.44 ± 0.043 μs | 2.6 ± 0.065 μs | 0.941 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.86 ± 8.9 μs | 2.62 ± 5.3 μs | 1.09 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 14.2 ± 1.5 μs | 14.2 ± 1.5 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.05 ± 0.057 μs | 2.22 ± 0.061 μs | 0.92 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.507 ± 0.047 ms | 0.505 ± 0.046 ms | 1 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 6.88 ± 0.33 μs | 7.12 ± 0.54 μs | 0.966 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.32 ± 0.06 μs | 2.48 ± 0.063 μs | 0.935 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.43 ± 0.13 μs | 2.65 ± 0.14 μs | 0.917 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.0987 ± 0.015 ms | 0.0971 ± 0.014 ms | 1.02 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 14.3 ± 1.5 μs | 14.4 ± 1.6 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 2.87 ± 0.14 μs | 3.07 ± 0.23 μs | 0.936 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.39 ± 0.05 μs | 2.59 ± 0.11 μs | 0.922 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.39 ± 11 μs | 20.5 ± 27 μs | 0.117 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 26.3 ± 3.5 μs | 26.3 ± 3.8 μs | 0.999 |
| time_to_load | 0.324 ± 0.0016 s | 0.321 ± 0.0038 s | 1.01 |
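
As a reading aid for the table above, here is a minimal sketch of how the last column can be recomputed and how a noisy row can be spotted. It assumes the last column is the `main` time divided by the PR commit's time, as the column header suggests, and that the `±` value is some measure of spread; the bot comment does not define either. The function names `ratio` and `overlaps` are hypothetical helpers, not part of the benchmark tooling.

```python
def ratio(main_time, pr_time):
    """Assumed meaning of the last column: main time / PR time (same units)."""
    return main_time / pr_time


def overlaps(main_time, main_err, pr_time, pr_err):
    """True if the two 'value ± spread' intervals overlap.

    This is only a rough noise check, not a statistical test.
    """
    return abs(main_time - pr_time) <= main_err + pr_err


# Two rows copied from the table: (main, main ±, PR, PR ±), both in μs.
rows = {
    "saxpy/default/Float16/1024": (0.534, 0.0077, 0.588, 0.0056),
    "saxpy/static workgroup=(1024,)/Float64/64": (2.39, 11.0, 20.5, 27.0),
}

for name, (m, m_err, p, p_err) in rows.items():
    print(name, round(ratio(m, p), 3), overlaps(m, m_err, p, p_err))
```

Under these assumptions, the `0.117` outlier for `saxpy/static workgroup=(1024,)/Float64/64` comes from a row whose spreads are larger than the measurements themselves, so its intervals overlap and the ratio alone is weak evidence of a regression. The Float16/1024 row has tight, non-overlapping intervals. Ratios recomputed from the rounded values shown can differ from the reported ones in the last digit.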

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

vchuravy changed the base branch from vc/better_expand to main on January 7, 2025 14:11