# Failing gracefully #103
Hi, thanks for the issue. The two times problems can be noticed are:

1. Macro-expansion, which happens immediately upon defining a function. It'll create a call to `_avx_!` (see below).
2. Compilation of the generated function.

I think I'd prefer errors immediately at macro-expansion time, if possible. When compiling the generated function, I would like to at least have type-based fall-back behavior.
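For illustration, here is a minimal sketch of what caller-side, type-based fallback can look like with ordinary dispatch. This is not LoopVectorization's internals; `mydot` is a hypothetical name, and the `Union{Bool, Base.HWReal}` bound follows the discussion below.

```julia
using LoopVectorization

# Fast path: only hardware-native element types reach the @avx loop.
function mydot(a::AbstractVector{T}, b::AbstractVector{T}) where {T <: Union{Bool, Base.HWReal}}
    s = zero(T)
    @avx for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    s
end

# Generic fallback: the same loop without @avx, for any other element type.
function mydot(a::AbstractVector, b::AbstractVector)
    s = zero(promote_type(eltype(a), eltype(b)))
    for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    s
end
```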
I've been enjoying some awesome 2x performance improvements from sticking an `@avx` on my loops. This sounds like it would be exactly what I need!
The major obstacle to this is that the original loop expression is lost. Perhaps this is another argument for keeping it around. As a stop-gap, we could revive a modification of this code: check whether the element types are covered by

```julia
julia> Union{Bool, Base.HWReal}
Union{Bool, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8}
```

If not, it'll return the original quote. PRs welcome if anyone wants to try their hand at implementing this. ;)

BTW, you can see some examples of using it with ForwardDiff here.
I'm not sure I understand what the linked code does. I was contemplating whether a macro could be put on the entire function definition, so that from the following

```julia
function f(a::T, b::Vector{T}) where T
```

it would generate two methods: the one above, and additionally this one,

```julia
function f(a::T, b::Vector{T}) where T <: Base.HWReal
```

where only the last one would contain the `@avx` annotations.
The file I linked no longer exists, and is no longer how things work. The macro now:

```julia
julia> @macroexpand @avx for i ∈ eachindex(a,b)
           s += a[i]*b[i]
       end
quote
    var"##loopi#1406" = LoopVectorization.maybestaticrange(eachindex(a, b))
    var"##i_loop_lower_bound#1407" = LoopVectorization.maybestaticfirst(var"##loopi#1406")
    var"##i_loop_upper_bound#1408" = LoopVectorization.maybestaticlast(var"##loopi#1406")
    var"##vptr##_a" = LoopVectorization.stridedpointer(a)
    var"##vptr##_b" = LoopVectorization.stridedpointer(b)
    local var"##s_0"
    begin
        $(Expr(:gc_preserve, :(var"##s_0" = LoopVectorization._avx_!(Val{(0, 0)}(), Tuple{:LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x0000000000000001, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, LoopVectorization.memload, 0x01, 0x01), :LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x0000000000000001, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, LoopVectorization.memload, 0x02, 0x02), :LoopVectorization, :LOOPCONSTANTINSTRUCTION, LoopVectorization.OperationStruct(0x0000000000000000, 0x0000000000000000, 0x0000000000000001, 0x0000000000000000, LoopVectorization.constant, 0x00, 0x03), :LoopVectorization, :vfmadd_fast, LoopVectorization.OperationStruct(0x0000000000000001, 0x0000000000000001, 0x0000000000000000, 0x0000000000010203, LoopVectorization.compute, 0x00, 0x03)}, Tuple{LoopVectorization.ArrayRefStruct{:a,Symbol("##vptr##_a")}(0x0000000000000001, 0x0000000000000001, 0x0000000000000000), LoopVectorization.ArrayRefStruct{:b,Symbol("##vptr##_b")}(0x0000000000000001, 0x0000000000000001, 0x0000000000000000)}, Tuple{0, Tuple{4}, Tuple{3}, Tuple{}, Tuple{}, Tuple{}, Tuple{}}, Tuple{:i}, (var"##i_loop_lower_bound#1407":var"##i_loop_upper_bound#1408",), var"##vptr##_a", var"##vptr##_b", s)), :a, :b))
    end
    s = LoopVectorization.reduced_add(var"##s_0", s)
end
```

It generates a call to `_avx_!`, along with a summary of what the loop does, encoded in the type parameters.
Outer macros expand first, so this shouldn't be a problem.

```julia
julia> a = rand(43); b = rand(43);

julia> s = 0.0;

julia> macro showex(x)
           esc(@show x)
       end
@showex (macro with 1 method)

julia> @showex @avx for i ∈ eachindex(a,b)
           s += a[i]*b[i]
       end
x = :(#= REPL[23]:1 =# @avx for i = eachindex(a, b)
          #= REPL[23]:2 =#
          s += a[i] * b[i]
      end)
11.942335582598114
```

LoopVectorization actually has to call `macroexpand` on inner macros, so that it can avoid parsing them itself, and so that things like `@inbounds` work. I'd definitely accept a macro like that as a PR, although I think it should be …
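As an aside, a small sketch of that expand-first pattern (illustrative only, not LoopVectorization's actual code; `@expandfirst` is a made-up name):

```julia
# Expand inner macros (e.g. @inbounds, @fastmath) before analyzing the
# expression, so the analysis never has to parse raw macro calls.
macro expandfirst(ex)
    expanded = macroexpand(__module__, ex)  # recursive by default
    esc(expanded)
end
```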
EDIT: hold on a second, I had made a mistake.
This seems to be working. Right now it only handles a single type parameter, but that can be improved if you find the approach okay.

```julia
using MacroTools, LoopVectorization

macro maybeavx(ex)
    def = splitdef(ex)
    haskey(def, :whereparams) || throw(ArgumentError("Found no type parameter in the function definition"))
    length(def[:whereparams]) == 1 || throw(ArgumentError("Only a single type parameter supported"))
    T = def[:whereparams][1] # This is the type parameter
    # Remove @avx annotations
    ex_noavx = MacroTools.prewalk(ex) do x
        if x isa Expr && x.head === :macrocall
            return x.args[3] # This returns the expression inside the macrocall
        end
        x
    end
    quote
        # Define the method without @avx annotations
        $(esc(ex_noavx))
        # Define the method that retains the @avx annotations
        $(esc(def[:name]))($(esc.(def[:args])...)) where $(esc(T)) <: Union{Bool, Base.HWReal} = $(esc(def[:body]))
    end
end

@maybeavx function f(a::T, b::Vector{T}) where T
    @avx a .+ b
end
```
```julia
using DoubleFloats

julia> f(1.0, ones(3))
3-element Array{Float64,1}:
 2.0
 2.0
 2.0

julia> f(Double64(1.0), Double64.(ones(3)))
3-element Array{Double64,1}:
 2.0
 2.0
 2.0
```
```julia
julia> @code_lowered f(1.0, ones(3))
CodeInfo(
1 ─      var"##266" = Base.broadcasted(Main.:+, a, b)
│   %2 = var"##266"
│   %3 = Core.apply_type(Main.Val, :Main)
│   %4 = (%3)()
│   %5 = LoopVectorization.vmaterialize(%2, %4)
│        var"##267" = %5
└──      return %5
)

julia> @code_lowered f(Double64(1.0), Double64.(ones(3)))
CodeInfo(
1 ─ %1 = Base.broadcasted(Main.:+, a, b)
│   %2 = Base.materialize(%1)
└── return %2
)
```
Such a macro sounds like a neat idea. I wonder if it shouldn't insert the two options as if/else in the body of the function, to avoid strange interactions with dispatch. Perhaps too complicated, but I also wonder whether it should check that the arrays aren't sparse, or CuArrays, too. Something like this, with a `check_types` function doing the filtering:

```julia
@maybeavx function f(x, s)
    a, b = x .+ 1, x .- 1
    @avx for i ∈ eachindex(a,b)
        s += a[i]*b[i]
    end
    ...

# becomes:
function f(x, s)
    a, b = x .+ 1, x .- 1
    if check_types(a, b)
        @avx for i ∈ eachindex(a,b)
            s += a[i]*b[i]
        end
    else
        for i ∈ eachindex(a,b)
            ...
```

Edit: if you want …
@baggepinnen, I liked your approach until: …
Ha ha, that's a lot simpler than what I had suggested. A modification of the …
Perhaps something like this:

```julia
check_valid_args() = true
check_valid_args(::Any) = false
check_valid_args(::Type{T}) where {T <: Union{Base.HWReal, Bool}} = true
check_valid_args(::T) where {T <: Number} = check_valid_args(T)
check_valid_args(::StridedArray{T}) where {T} = check_valid_args(T)
check_valid_args(::AbstractRange{T}) where {T} = check_valid_args(T)
check_valid_args(a, b, args...) = check_valid_args(a) && check_valid_args(b) && check_valid_args(args...)
```

This would let you opt in an array type

```julia
struct MyArray{T,N} <: DenseArray{T,N}
...
end
```

or

```julia
LoopVectorization.check_valid_args(::MyArray{T}) where {T} = LoopVectorization.check_valid_args(T)
```

And to opt in your own element type:

```julia
LoopVectorization.check_valid_args(::Type{MyElementType}) = true
```

Some additional definitions:

```julia
using LinearAlgebra
check_valid_args(A::Adjoint) = check_valid_args(parent(A))
check_valid_args(A::Transpose) = check_valid_args(parent(A))

using OffsetArrays
check_valid_args(A::OffsetArray) = check_valid_args(parent(A))
```

Anything else I'm missing?
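To illustrate how these definitions bottom out at the element type, here is a hypothetical session, assuming the sketch above (including the LinearAlgebra definitions) has been evaluated:

```julia
julia> check_valid_args(rand(3), 1:10)      # strided Float64 array + Int range
true

julia> check_valid_args(fill(big"1.0", 3))  # Array is strided, but BigFloat is not HWReal
false

julia> check_valid_args(rand(3, 3)')        # Adjoint unwraps to its strided parent
true
```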
My suggestion for how to check applicability was going to be something not so different from yours, but allowing for wrappers such as those of NamedDims, which don't do anything this has to know about, but are nice to pass along.
My problem with that generic definition is … For example:

```julia
struct MyOffsetArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    offsets::NTuple{N,Int}
end
Base.@propagate_inbounds function Base.getindex(A::MyOffsetArray{T,N}, i::Vararg{Int,N}) where {T,N}
    A.data[(i .+ A.offsets)...]
end
Base.parent(A::MyOffsetArray) = A.data
# Other methods, including `Base.strides` and `Base.pointer`/`Base.unsafe_convert(Ptr{T}, ...)`
```

But maybe that's okay: the array types that don't work correctly don't define some of the needed methods, meaning users will get errors and at least learn about it.

```julia
julia> pointer(A')
ERROR: conversion to pointer not defined for Adjoint{Float64,Array{Float64,2}}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] unsafe_convert(::Type{Ptr{Float64}}, ::Adjoint{Float64,Array{Float64,2}}) at ./pointer.jl:67
 [3] pointer(::Adjoint{Float64,Array{Float64,2}}) at ./abstractarray.jl:933
 [4] top-level scope at REPL[27]:1

julia> pointer(OffsetArray(A,1,1))
ERROR: conversion to pointer not defined for OffsetArray{Float64,2,Array{Float64,2}}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] unsafe_convert(::Type{Ptr{Float64}}, ::OffsetArray{Float64,2,Array{Float64,2}}) at ./pointer.jl:67
 [3] pointer(::OffsetArray{Float64,2,Array{Float64,2}}) at ./abstractarray.jl:933
 [4] top-level scope at REPL[28]:1

julia> pointer(A)
Ptr{Float64} @0x000055bfbb939f40
```

EDIT: Would be great if there were a trait for this.
I guess the clear criteria are: …

As you say, after that it depends how optimistic you want to be about weird wrappers. Besides OffsetArrays (for which one could test …), my PR here https://github.com/FluxML/NNlib.jl/pull/191/files has a go at making an …

(Re traits, there is https://github.com/JuliaMatrices/ArrayLayouts.jl, and I quite like JuliaLang/julia#30432, but it hasn't been merged.)
Okay, I'm fine with being more optimistic and using the recursive `parent` checks. I'll work on this tonight.
Got around to it about a day late, but I did add it yesterday, using mcabbott's implementation. Example from the unit tests, with a wrapper type that does not subtype `DenseArray`:

```julia
using LoopVectorization

struct FallbackArrayWrapper{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(A::FallbackArrayWrapper) = size(A.data)
Base.@propagate_inbounds Base.getindex(A::FallbackArrayWrapper, i::Vararg{Int, N}) where {N} = getindex(A.data, i...)
Base.@propagate_inbounds Base.setindex!(A::FallbackArrayWrapper, v, i::Vararg{Int, N}) where {N} = setindex!(A.data, v, i...)
Base.IndexStyle(::Type{<:FallbackArrayWrapper}) = IndexLinear()

function msd(x)
    s = zero(eltype(x))
    for i in eachindex(x)
        s += x[i] * x[i]
    end
    s
end
function msdavx(x)
    s = zero(eltype(x))
    @avx for i in eachindex(x)
        s += x[i] * x[i]
    end
    s
end

x = fill(sqrt(63.0), 128);
x[1] = 1e9;
```

This yields

```julia
julia> msd(x)
1.0e18

julia> msdavx(x)
1.0000000000000081e18

julia> msdavx(FallbackArrayWrapper(x))
1.0e18
```

The test is based on the fact that the serial and SIMD loops accumulate in different orders, so they produce different floating-point results; getting the serial answer for the wrapper shows the fallback loop ran.

Although, this begs the question: …
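For anyone puzzled by those two sums, the gap comes purely from accumulation order (plain Julia, no assumptions beyond IEEE arithmetic):

```julia
julia> 1e18 + 63.0 == 1e18    # adding one 63 at a time never moves the sum
true

julia> eps(1e18)              # spacing of Float64 values near 1e18
128.0

julia> 1e18 + 127 * 63.0      # summing the 63s first, as SIMD lanes do, registers
1.0000000000000081e18
```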
I am not sure how this addresses … The macro-based approach would solve both those problems?
```julia
julia> using FillArrays, LoopVectorization

julia> function msdavx(x)
           s = zero(eltype(x))
           @avx for i in eachindex(x)
               s += x[i] * x[i]
           end
           s
       end
msdavx (generic function with 1 method)

julia> msdavx(Ones(100))
100.0
```

Have an example of this not working?

```julia
help?> LoopVectorization.check_args
  check_args(::Vararg{AbstractArray})

  LoopVectorization will optimize an @avx loop if check_args on each of the indexed abstract
  arrays returns true. It returns true for AbstractArray{T}s when check_type(T) == true and
  the array or its parent is a StridedArray or AbstractRange.

  To provide support for a custom array type, ensure that check_args returns true, either
  through overloading it or subtyping DenseArray. Additionally, define pointer and stride
  methods.
```

If you have an error with a custom array type that is/has a parent that is a …
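For a concrete check (a hypothetical session; the behavior shown follows the docstring above and the FillArrays discussion below):

```julia
julia> using LoopVectorization, FillArrays

julia> LoopVectorization.check_args(rand(10))   # strided, hardware-native eltype
true

julia> LoopVectorization.check_args(Ones(10))   # its own parent, not a StridedArray
false
```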
`AbstractFill` does not subtype `StridedArray`. Note that `StridedArray` is not an abstract type that you can subtype, but a union; see also JuliaLang/julia#2345.
I think this is the right approach; I meant to dig a bit for edge cases but haven't yet.

Re FillArray: if I understand right, this is being caught and sent to the non-vectorised path (since it's its own parent, and not a StridedArray). It could be allowed on the vectorised path, since it's just a constant, but might be messy. I got tired of avoiding this by inserting a trivial broadcast like … Alternatively, I sometimes wonder if FillArray should define `strides` …
Ah I see, I misunderstood your comment to mean that one would wrap inputs in the fallback wrapper, but I understand now that it was only used to demonstrate that it worked :) Sorry for the noise. I'll update LV and see if it solves all the issues I had!
Yes. There's another issue about implementing the strided interface as a trait instead, which would make it more flexible.
LLVM can optimize simple loops with constants very well. If I add `@inbounds`:

```julia
julia> function msdavx(x)
           s = zero(eltype(x))
           @avx for i in eachindex(x)
               @inbounds s += x[i]
           end
           s
       end
msdavx (generic function with 1 method)

julia> @code_llvm debuginfo=:none msdavx(Ones{Int}(100))
define i64 @julia_msdavx_2597([1 x [1 x [1 x i64]]] addrspace(11)* nocapture nonnull readonly dereferenceable(8)) {
top:
  %1 = getelementptr inbounds [1 x [1 x [1 x i64]]], [1 x [1 x [1 x i64]]] addrspace(11)* %0, i64 0, i64 0, i64 0, i64 0
  %2 = load i64, i64 addrspace(11)* %1, align 8
  %3 = icmp sgt i64 %2, 0
  %spec.select = select i1 %3, i64 %2, i64 0
  ret i64 %spec.select
}
```

It just checks whether the length is greater than zero, returning the length if so and zero otherwise.
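In Julia terms, the compiled method has been reduced to roughly the following (my paraphrase of the IR above; `msdavx_equivalent` is a made-up name, and for `Ones{Int}` the sum of all elements is just the length):

```julia
# The select(len > 0, len, 0) from the IR, written as Julia.
msdavx_equivalent(len::Int) = ifelse(len > 0, len, 0)
```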
Yeah, that's much harder to do generically, but otherwise will work well.
Strides equal to 0 are useful for broadcasting a dimension when loading from a pointer, but in the case of `Ones`, note that `@inbounds` indexing already silently "succeeds" out of bounds:

```julia
julia> foo(x) = @inbounds x[1000]
foo (generic function with 1 method)

julia> foo(Ones(2))
1.0
```

No problem. If you do run into trouble, consider overloading `LoopVectorization.check_args`.
OK, I tried to make up some test cases.

```julia
using LoopVectorization, LinearAlgebra, SparseArrays, FillArrays
# LoopVectorization v0.7.4, which does not use check_args

x1 = view(rand(10,10), 1:2:10, 1:3)
x1 isa StridedArray
check_args(x1) # true
@avx x1 .+ 1 # ok

x2 = x1' # not a StridedArray
x2 = transpose(x1) # ditto
check_args(x2) # true
@avx x2 .+ 1 # ok

x3 = reshape(view(rand(10,10), 1:2:10, 1:3), 3, 5) # not a StridedArray
check_args(x3) # true -> false
@avx x3 .+ 1 # ERROR: MethodError: no method matching strides

x4 = view(x2, 1:3, 1:3)
check_args(x4) # true --> false
@avx x4 .+ 1 # ERROR: conversion to pointer not defined

x5 = PermutedDimsArray(rand(10,10), (2,1))
check_args(x5) # true
@avx x5 .+ 1 # ok

x6 = view(x5, 1:3, 1:3)
check_args(x6) # true --> false, too pessimistic about views of wrappers
@avx x6 .+ 1 # ok

x7 = Diagonal(rand(10))
check_args(x7) # true --> false
@avx x7 .+ 1 # MethodError: no method matching ndims

x8 = sparse(rand(3,3))
check_args(x8) # false
@avx x8 .+ 1 # no method matching similar(::Base.Broadcast.Broadcasted{SparseArrays...

x9 = Fill(1.2, 3,4)
check_args(x9) # false
@avx x9 .+ 1 # MethodError: no method matching vmaterialize
```

Then a sketch of a `check_args` that handles these cases:

```julia
@inline check_args(A::StridedArray) = check_type(eltype(A))
@inline check_args(A::AbstractRange) = check_type(eltype(A))
@inline function check_args(A::AbstractArray)
    M = parentmodule(typeof(A))
    if parent(A) === A # SparseMatrix, StaticArray, etc
        false
    elseif A isa Union{Transpose, Adjoint}
        check_args(parent(A))
    elseif M === Base || M === Core || M === LinearAlgebra
        # reshapes which aren't StridedArrays, plus UpperTriangular, etc.
        false
    else
        check_args(parent(A)) # PermutedDimsArray, NamedDimsArray
    end
end
```
Thanks for the suggestions. I'm using that now, with these additions to get the wrapper examples above working:

```julia
@inline check_args(A::SubArray) = check_args(parent(A))
@inline check_args(A::OffsetArray) = check_args(parent(A))
@inline check_args(A::Adjoint) = check_args(parent(A))
@inline check_args(A::Transpose) = check_args(parent(A))
@inline check_args(A::PermutedDimsArray) = check_args(parent(A))
```

Thanks again for the suggestion. You're also welcome to make PRs, of course.
But not every view of something valid is still valid:

```julia
x10 = view(x2, [1,3,2], 1:3) # x2::Transpose, which does not have strides(), but is OK
@avx x10 .+ 1 # ERROR: conversion to pointer not defined for SubArray{Float64,2...
```

Perhaps the next level is to test something like this:

```julia
x10.indices isa Tuple{Vararg{Union{Int,Colon,AbstractRange}}} # false, but true for x6
```

... and eventually we will end up re-writing `StridedArray`. This setup also doesn't reject CuArrays (which I had on the list above), but that's OK, since the fallback loops won't work there either.
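A sketch of that next level, following the suggestion above (hypothetical, not the shipped implementation; `strided_index` is a made-up helper): accept a `SubArray` only when every index preserves a strided layout, then recurse into the parent.

```julia
# An index keeps the view strided if it is an Int, Colon, or AbstractRange;
# index vectors like [1,3,2] break stridedness.
strided_index(::Union{Int,Colon,AbstractRange}) = true
strided_index(::Any) = false

@inline function check_args(A::SubArray)
    all(strided_index, A.indices) && check_args(parent(A))
end
```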
Broadcasts also aren't using `check_args`? Which check do they use?
I think you were being sarcastic, but... if that's what it takes to improve resolution?
I was mostly, and I don't swear that I got that right! It would be great to have some shared trait-like way to query this. I think CuArrays are <: StridedArray, but am at the wrong computer to check.
I thought the point of `strides` is to interface with C and Fortran libraries. I'm also in favor of the trait PR. While they'd fail in loops, once broadcasting uses `check_args` …
Yes, I'm not sure what the logic was. On the "re-writing StridedArray in every package" front, well-behaved views of CuArrays are again of type CuArray, not SubArray, so that they can work with functions which dispatch on this. Again, we should have some trait like … Re broadcasting into aliased views, I think this goes wrong for CuArrays, and for …
The CPU may have gotten the correct answer in that CuArrays issue, but it is not immune from such weirdness; this is the issue I was thinking of:

```julia
julia> x = [3]; v = view(x, [1,1])
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 3
 3

julia> x[[1,1]] .+= v
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 6
 6

julia> x = [3]; v = view(x, [1,1])
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 3
 3

julia> @views x[[1,1]] .+= v
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 9
 9

julia> x = [3]; I = [1,1]; v = view(x, I)
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 3
 3

julia> x[I] .+= v
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 9
 9

julia> x = [3]; I = [1,1]; v = view(x, I)
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 3
 3

julia> @views x[I] .+= v
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 12
 12
```

Although this was with a single index. Also realized one of the package's benchmarks relied on it working for triangular matrices.
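For reference, the machinery Base uses to guard the first spelling is its aliasing check, `Base.mightalias`/`Base.unalias`, though as the session above shows, it isn't applied uniformly across these spellings. A quick look, using functions that exist in Base:

```julia
julia> x = [3]; v = view(x, [1,1]);

julia> Base.mightalias(x, v)   # v shares x's memory, and Base can tell
true

julia> w = Base.unalias(x, v); # copies v because of the overlap

julia> Base.mightalias(x, w)
false
```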
Thanks for the aliasing links, these things get tricky fast. It took me a while, but eventually I realised your UpperTriangular example is only reading the diagonal, which is fine. The generic case is not, of course:

```julia
julia> rr = rand(3,3); mm = similar(rr); uu = UpperTriangular(rr)

julia> @avx for i in eachindex(uu)
           mm[i] = uu[i] + 10
       end

julia> mm
3×3 Array{Float64,2}:
 10.4958  10.9055  10.1759
 10.5271  10.5611  10.1929
 10.3082  10.0969  10.7074
```

(The entries below the diagonal should be exactly 10.0; instead the loop read the parent's data.) However, on the latest version, instead I get …
Yes, that's why I'm not special casing it (unlike the earlier SubArray example).
I fixed it on master and added a test using a diagonal matrix. I'll release it soon.

```julia
julia> rr = rand(3,3); mm = similar(rr); uu = UpperTriangular(rr)
3×3 UpperTriangular{Float64,Array{Float64,2}}:
 0.664876  0.390798  0.890371
  ⋅        0.741363  0.852288
  ⋅         ⋅        0.249216

julia> @avx for i in eachindex(uu)
           mm[i] = uu[i] + 10
       end

julia> mm
3×3 Array{Float64,2}:
 10.6649  10.3908  10.8904
 10.0     10.7414  10.8523
 10.0     10.0     10.2492
```
My problems are now all but a distant memory: I can now use ForwardDiff and other exotic number types without a problem. Thanks :)
It might be worth thinking about falling back to traditional loops, with a warning about not optimizing, when there are unrecognized expressions, instead of raising a hard error.

Feel free to close if this doesn't align with your goals: errors are fine too if the expectation is that `@avx` should fail if unable to optimize.