-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
change type parameter range for AbstractStridedPointer #49
Conversation
There is an ambiguity error and an error in the memory tests. The memory test error should be fixed on master. Mind merging in master and trying to fix the ambiguity? vstore!(ptr::Union{Ptr, VectorizationBase.AbstractStridedPointer}, v::Number) in VectorizationBase at /home/runner/work/VectorizationBase.jl/VectorizationBase.jl/src/llvm_intrin/memory_addr.jl:850
vstore!(ptr::VectorizationBase.AbstractStridedPointer{T, N, C, B, R, X, O} where {N, C, B, R, X<:Tuple{Vararg{Any, N}}, O<:Tuple{Vararg{Any, N}}}, v::T) where T in VectorizationBase at /home/runner/work/VectorizationBase.jl/VectorizationBase.jl/src/strided_pointers/stridedpointers.jl:121 Previously, the second method was always more specific, as all So, perhaps try changing the first definition to just |
Thanks for noticing, I just fixed it. "Less Interface"ismI notice that there are 57 methods called
e.g.
f(b) = x -> f(x, b) I choose not to.
f(x, b) = ...
f(x) = f(x, 1.0) I choose not to. "Less interface" rule does not apply when the interface decrease the user input dedundancy. e.g. f(x) = f(x, x^2) is good. because people has less probability to make an error from repeated input. Just sharing my aesthetics of coding, by no means to change the current design since they are already thouroughly tested. |
Codecov Report
@@ Coverage Diff @@
## master #49 +/- ##
==========================================
- Coverage 60.88% 59.43% -1.46%
==========================================
Files 31 31
Lines 3981 3976 -5
==========================================
- Hits 2424 2363 -61
- Misses 1557 1613 +56
Continue to review full report at Codecov.
|
The interface is much smaller than the number of methods. f(x::Float64, b) = ...
f(x::Int, b) = ...
f(x::String, b) = ...
f(x::Tuple, b) = ... That is, many of the methods do not represent a different API. There is also a processing pipeline from surface API before final code generation. So a change I probably should make is renaming all those For vload(p::AbstractStridedPointer, i::Tuple)
vload(p::AbstractStridedPointer, i::Tuple, m::Mask)
vload(p::AbstractStridedPointer, i::Unroll)
vload(p::AbstractStridedPointer, i::Unroll, m::Mask)
julia> vload
vload (generic function with 41 methods) Example usage, and what all that internal processing is for: julia> colors = [(R = rand(), G = rand(), B = rand()) for i ∈ 1:100];
julia> colormat = reinterpret(reshape, Float64, colors)
3×100 reinterpret(reshape, Float64, ::Vector{NamedTuple{(:R, :G, :B), Tuple{Float64, Float64, Float64}}}) with eltype Float64:
0.101462 0.566472 0.395725 0.156595 0.247472 0.58609 0.451675 0.329288 0.628678 0.0161756 0.457912 0.929442 0.209048 0.225449 0.570513 … 0.373047 0.0599544 0.487379 0.48549 0.219519 0.758767 0.349387 0.575183 0.0668234 0.204329 0.371181 0.197348 0.652758 0.630789
0.919857 0.852129 0.0622284 0.479014 0.596842 0.600178 0.188863 0.411865 0.909364 0.000210617 0.254854 0.469283 0.552034 0.202351 0.948111 0.537816 0.65176 0.619699 0.520092 0.515346 0.791153 0.326027 0.104903 0.888952 0.459569 0.0474623 0.161321 0.59407 0.0856169
0.980903 0.841869 0.525539 0.933114 0.766957 0.116385 0.167431 0.964176 0.636977 0.491324 0.584939 0.135019 0.58517 0.467258 0.654256 0.0479113 0.162333 0.363111 0.162873 0.00662285 0.204877 0.789972 0.536066 0.0866196 0.753488 0.279345 0.556696 0.625233 0.364667
julia> colors[1]
(R = 0.10146162439576822, G = 0.9198573051020194, B = 0.980902779143441)
julia> vload(stridedpointer(colormat), (2,3)) # load element [2,3]
0.0622284186540214
julia> vload(stridedpointer(colormat), (2,MM{8}(3))) # load elements colormat[2,3:10]
Vec{8, Float64}<0.0622284186540214, 0.47901368495584085, 0.596842316190509, 0.6001775663920466, 0.18886316876982878, 0.41186519614364814, 0.9093641942999946, 0.00021061724559756634>
julia> colormat[2,3:10]'
1×8 adjoint(::Vector{Float64}) with eltype Float64:
0.0622284 0.479014 0.596842 0.600178 0.188863 0.411865 0.909364 0.000210617
julia> vload(stridedpointer(colormat), (MM{4}(1),3), mask(Val(4), 3)) # load elements colormat[1:3,3]
Vec{4, Float64}<0.39572490568820506, 0.0622284186540214, 0.5255392424137211, 0.0>
julia> colormat[1:3,3]'
1×3 adjoint(::Vector{Float64}) with eltype Float64:
0.395725 0.0622284 0.525539
julia> vload(stridedpointer(colormat), (MM{4}(1),MM{4}(3)), mask(Val(4), 3)) # load elements colormat[(1,3), (2,4), (3,5)]
Vec{4, Float64}<0.39572490568820506, 0.47901368495584085, 0.7669572786918826, 0.0>
julia> getindex.(Ref(colormat), 1:3, 3:5)'
1×3 adjoint(::Vector{Float64}) with eltype Float64:
0.395725 0.479014 0.766957
The julia> vload(stridedpointer(colormat), Unroll{1,1,3,2,8,zero(UInt),1}((1,1))) #axis 1 unrolled w/ step of 1, unrolled 3x, axis 2 vectorized, loading 8 elements,mask 0 of the loads, loaded vector elements are 1 apart
3 x Vec{8, Float64}
Vec{8, Float64}<0.10146162439576822, 0.5664720363768123, 0.39572490568820506, 0.1565950460890433, 0.24747228039940494, 0.586089973376654, 0.4516746290804887, 0.3292882151577483>
Vec{8, Float64}<0.9198573051020194, 0.852129002305116, 0.0622284186540214, 0.47901368495584085, 0.596842316190509, 0.6001775663920466, 0.18886316876982878, 0.41186519614364814>
Vec{8, Float64}<0.980902779143441, 0.8418685033011257, 0.5255392424137211, 0.9331143002012057, 0.7669572786918826, 0.11638477303011419, 0.16743089573675807, 0.964175916870037> Why do this instead of just using the first API 3 times? julia> vload(stridedpointer(colormat), (1,MM{8}(1))) # load [1,1:8]
Vec{8, Float64}<0.10146162439576822, 0.5664720363768123, 0.39572490568820506, 0.1565950460890433, 0.24747228039940494, 0.586089973376654, 0.4516746290804887, 0.3292882151577483>
julia> vload(stridedpointer(colormat), (2,MM{8}(1))) # load [2,1:8]
Vec{8, Float64}<0.9198573051020194, 0.852129002305116, 0.0622284186540214, 0.47901368495584085, 0.596842316190509, 0.6001775663920466, 0.18886316876982878, 0.41186519614364814>
julia> vload(stridedpointer(colormat), (3,MM{8}(1))) # load [3,1:8]
Vec{8, Float64}<0.980902779143441, 0.8418685033011257, 0.5255392424137211, 0.9331143002012057, 0.7669572786918826, 0.11638477303011419, 0.16743089573675807, 0.964175916870037> Because the julia> @benchmark (vload(stridedpointer($colormat), (1,MM{8}(1))), vload(stridedpointer($colormat), (2,MM{8}(1))), vload(stridedpointer($colormat), (3,MM{8}(1))))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 5.940 ns (0.00% GC)
median time: 6.040 ns (0.00% GC)
mean time: 6.076 ns (0.00% GC)
maximum time: 14.404 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark vload(stridedpointer($colormat), Unroll{1,1,3,2,8,zero(UInt),1}((1,1)))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 3.080 ns (0.00% GC)
median time: 3.097 ns (0.00% GC)
mean time: 3.108 ns (0.00% GC)
maximum time: 8.722 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000 It's often possible to optimize groups of loads and stores that are done together, and that is what the The case for I'm definitely open to suggestions for cleaning things up, and do thing the But the actual API of these functions is pretty straightforward, with most of the methods coming from determining the behavior from the types. I should probably document this somewhere more clearly! |
Thanks for your very detailed explaination. The reason why I am confused because the list is very long when I type Writing many docstring in a too early stage is generally not good for future change of design, but a minimum documentation of most important functions would help people to understand the design. Let me see if I can help adding some documentations. Cheers. |
This is for supporting different number types.
JuliaSIMD/LoopVectorization.jl#201