Added some methods to help support LoopVectorization.jl #61
@@ -112,6 +112,21 @@ If `step` of instances of type `T` are known at compile time, return that step.
Otherwise, returns `nothing`. For example, `known_step(UnitRange{Int})` returns
`one(Int)`.

## is_cpu_column_major(::Type{T})

Returns `true` if instances of type `T` have a strided column-major memory layout and support
`pointer`. Returns `false` otherwise.
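A minimal usage sketch; the return values below follow the tests added in this PR:

```julia
using ArrayInterface: is_cpu_column_major

A = rand(3, 4, 5)
is_cpu_column_major(A)                    # true: `Array` memory is column major and supports `pointer`
is_cpu_column_major((1, 2, 3))            # false: a `Tuple` is not pointer-backed
is_cpu_column_major(view(A, 1, :, 2:4))   # true: a view of an `Array` with `Int`/range indices
is_cpu_column_major(view(A, 1, :, 2:4)')  # false: `Adjoint` falls back to the default method
```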
## stridelayout(::Type{T})

Returns a 3-tuple describing the strided memory layout of an instance of type `T` if it
is known, and `nothing` otherwise.
The elements of the tuple are:
- `contig`: the axis with contiguous elements. `contig == -1` indicates no axis is contiguous. `striderank[contig]` does not necessarily equal `1`.
- `batch`: the number of contiguous elements.
- `striderank`: the rank of each stride with respect to the others. If for `A::T` we have `striderank[i] > striderank[j]`, then `stride(A,i) > stride(A,j)`.
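To make the fields concrete, here is a sketch for a plain `Array` and a permuted view; the values match the `stridelayout` docstring examples added in this PR:

```julia
A = rand(3, 4, 5)

stridelayout(A)
# (1, 1, Base.OneTo(3)): axis 1 is contiguous, and strides increase with the axis number

stridelayout(PermutedDimsArray(A, (3, 1, 2)))
# (2, 1, (3, 1, 2)): axis 2 of the permuted view is contiguous; note striderank[2] == 1
```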
> **Review comment:** add `can_avx`?
# List of things to add

- https://github.com/JuliaLang/julia/issues/22216
@@ -530,6 +530,139 @@ known_step(x) = known_step(typeof(x))
known_step(::Type{T}) where {T} = nothing
known_step(::Type{<:AbstractUnitRange{T}}) where {T} = one(T)

"""
    is_cpu_column_major(::Type{T})

Does an array of type `T` point to column-major memory in the CPU's address space?
If `is_cpu_column_major(typeof(A))` returns `true` and the element type is a primitive
type, then the array should be compatible with `LoopVectorization.jl` as well as with
`C` and `Fortran` programs requiring pointers and assuming column-major memory layout.
If `is_cpu_column_major(typeof(A))` returns `true`, the array supports the
[Strided Array](https://docs.julialang.org/en/v1/manual/interfaces/#man-interface-strided-arrays-1) interface.
"""
is_cpu_column_major(x) = is_cpu_column_major(typeof(x))
is_cpu_column_major(::Type) = false
is_cpu_column_major(::Type{<:Array}) = true
is_cpu_column_major(::Type{S}) where {A, S <: SubArray{<:Any,<:Any,A,<:Tuple{Vararg{Union{Int,<:AbstractRange}}}}} = is_cpu_column_major(A)
"""
    stridelayout(::Type{T}) -> (contig, batch, striderank)

Describe the memory layout of a strided container of type `T`. If unknown or not strided, returns `nothing`.
Otherwise, it returns a tuple with elements:
- `contig`: the axis with contiguous elements. `contig == -1` indicates no axis is contiguous. `striderank[contig]` does not necessarily equal `1`.
- `batch`: the number of contiguous elements.
- `striderank`: the rank of each stride with respect to the others. If for `A::T` we have `striderank[i] > striderank[j]`, then `stride(A,i) > stride(A,j)`.

The convenience method
```julia
stridelayout(x) = stridelayout(typeof(x))
```
is also provided.
```julia
julia> A = rand(3,4,5);

julia> stridelayout(A)
(1, 1, Base.OneTo(3))

julia> stridelayout(PermutedDimsArray(A,(3,1,2)))
(2, 1, (3, 1, 2))

julia> stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:]))
(1, 1, (1, 2))

julia> stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2:3,1:2,:]))
(2, 1, (3, 1, 2))

julia> stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:]))
(-1, 1, (2, 1))
```
"""
stridelayout(x) = stridelayout(typeof(x))
stridelayout(::Type) = nothing
stridelayout(::Type{Array{T,N}}) where {T,N} = (1,1,Base.OneTo(N))
stridelayout(::Type{<:Tuple}) = (1,1,Base.OneTo(1))
function stridelayout(::Type{<:Union{Transpose{T,A},Adjoint{T,A}}}) where {T,A<:AbstractMatrix{T}}
    ml = stridelayout(A)
    isnothing(ml) && return nothing
    contig, batch, rank = ml
    new_rank = (rank[2], rank[1])
    new_contig = contig == -1 ? -1 : 3 - contig
    new_contig, batch, new_rank
end
function stridelayout(::Type{<:PermutedDimsArray{T,N,I1,I2,A}}) where {T,N,I1,I2,A<:AbstractArray{T,N}}
    ml = stridelayout(A)
    isnothing(ml) && return nothing
    contig, batch, rank = ml
    new_contig = contig == -1 ? -1 : I2[contig] # preserve the no-contiguous-axis sentinel
    new_rank = ntuple(n -> rank[I1[n]], Val(N))
    new_contig, batch, new_rank
end
@generated function stridelayout(::Type{S}) where {N,NP,T,A<:AbstractArray{T,NP},I,S <: SubArray{T,N,A,I}}
    ml = stridelayout(A)
    isnothing(ml) && return nothing
    contig, batch, rank = ml
    rankv = collect(rank)
    rank_new = Int[]
    n = 0
    new_contig = contig
    for np in 1:NP
        r = rankv[np]
        if I.parameters[np] <: AbstractUnitRange
            n += 1
            push!(rank_new, r)
            if np == contig
                new_contig = n
            end
        else
            # There's definitely a smarter way to do this.
            # When we drop a rank, we lower the others.
            for nᵢ ∈ 1:n
                rᵢ = rank_new[nᵢ]
                if rᵢ > r
                    rank_new[nᵢ] = rᵢ - 1
                end
            end
            for npᵢ ∈ np+1:NP
                rᵢ = rankv[npᵢ]
                if rᵢ > r
                    rankv[npᵢ] = rᵢ - 1
                end
            end
            if np == contig
                new_contig = -1
            end
        end
    end
    # If n != N, then an axis was indexed by something other than an integer or `AbstractUnitRange`, so we return `nothing`.
    n == N || return nothing
    ranktup = Expr(:tuple); append!(ranktup.args, rank_new) # dynamic splats bad
    Expr(:tuple, new_contig, batch, ranktup)
end
"""
    canavx(f)

Returns `true` if the function `f` is guaranteed to be compatible with `LoopVectorization.@avx` for supported element and array types.
While a return value of `false` does not indicate the function isn't supported, this allows a library to conservatively apply `@avx`
only when it is known to be safe to do so.
```julia
function mymap!(f, y, args...)
    if canavx(f)
        @avx @. y = f(args...)
    else
        @. y = f(args...)
    end
end
```
"""
canavx(::Any) = false
function __init__()

    @require SuiteSparse="4607b0f0-06f3-5cda-b6b1-a6196a1729e9" begin

@@ -544,6 +677,7 @@ function __init__()
        ismutable(::Type{<:StaticArrays.StaticArray}) = false
        can_setindex(::Type{<:StaticArrays.StaticArray}) = false
        ismutable(::Type{<:StaticArrays.MArray}) = true
        ismutable(::Type{<:StaticArrays.SizedArray}) = true

        function lu_instance(_A::StaticArrays.StaticMatrix{N,N}) where {N}
            A = StaticArrays.SArray(_A)

@@ -564,6 +698,10 @@ function __init__()
        known_first(::Type{<:StaticArrays.SOneTo}) = 1
        known_last(::Type{StaticArrays.SOneTo{N}}) where {N} = N

        is_cpu_column_major(::Type{<:StaticArrays.MArray}) = true
        # is_cpu_column_major(::Type{<:StaticArrays.SizedArray}) = false # Why?
        stridelayout(::Type{<:StaticArrays.StaticArray{S,T,N}}) where {S,T,N} = (1,1,Base.OneTo(N))

        @require Adapt="79e6a3ab-5dfb-504d-930d-738a2a938a0e" begin
            function Adapt.adapt_storage(::Type{<:StaticArrays.SArray{S}},xs::Array) where S
                StaticArrays.SArray{S}(xs)
@@ -1,6 +1,6 @@
 using ArrayInterface, Test
 using Base: setindex
-import ArrayInterface: has_sparsestruct, findstructralnz, fast_scalar_indexing, lu_instance
+import ArrayInterface: has_sparsestruct, findstructralnz, fast_scalar_indexing, lu_instance, is_cpu_column_major, stridelayout
 @test ArrayInterface.ismutable(rand(3))

 using StaticArrays
@@ -188,3 +188,25 @@ end
    @test isone(ArrayInterface.known_step(typeof(1:4)))
end
@testset "Memory Layout" begin

> **Reviewer:** Could you add some tests with adjoints and transposes?
>
> **Author:** There are a lot of tests with adjoints already. I'll replace one with
>
> **Reviewer:** Hm, I don't see any in this testset.
>
> **Author:** Tests and line numbers featuring adjoints (`'`):
>
> ```julia
> 200: @test device(view(A, 1, :, 2:4)') === ArrayInterface.CPUPointer()
> 210: @test @inferred(contiguous_axis(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:])')) === ArrayInterface.Contiguous(2)
> 213: @test @inferred(contiguous_axis(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:])')) === ArrayInterface.Contiguous(-1)
> 214: @test @inferred(contiguous_axis(@view(PermutedDimsArray(A,(3,1,2))[:,1:2,1])')) === ArrayInterface.Contiguous(1)
> 220: @test @inferred(ArrayInterface.contiguous_axis_indicator(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:])')) === (Val(false),Val(true))
> 223: @test @inferred(ArrayInterface.contiguous_axis_indicator(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:])')) === (Val(false),Val(false))
> 224: @test @inferred(ArrayInterface.contiguous_axis_indicator(@view(PermutedDimsArray(A,(3,1,2))[:,1:2,1])')) === (Val(true),Val(false))
> 230: @test @inferred(contiguous_batch_size(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:])')) === ArrayInterface.ContiguousBatch(0)
> 233: @test @inferred(contiguous_batch_size(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:])')) === ArrayInterface.ContiguousBatch(-1)
> 234: @test @inferred(contiguous_batch_size(@view(PermutedDimsArray(A,(3,1,2))[:,1:2,1])')) === ArrayInterface.ContiguousBatch(0)
> 240: @test @inferred(stride_rank(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:])')) === ArrayInterface.StrideRank((2, 1))
> 243: @test @inferred(stride_rank(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:])')) === ArrayInterface.StrideRank((2, 3))
> 244: @test @inferred(stride_rank(@view(PermutedDimsArray(A,(3,1,2))[:,1:2,1])')) === ArrayInterface.StrideRank((1, 3))
> 250: @test @inferred(dense_dims(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:])')) === ArrayInterface.DenseDims((false,true))
> 254: @test @inferred(dense_dims(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:])')) === ArrayInterface.DenseDims((false,false))
> 255: @test @inferred(dense_dims(@view(PermutedDimsArray(A,(3,1,2))[:,1:2,1])')) === ArrayInterface.DenseDims((true,false))
> ```
>
> **Reviewer:** Right, I didn't look for apostrophes, sorry.
    A = rand(3,4,5)
    @test is_cpu_column_major(A)
    @test !is_cpu_column_major((1,2,3))
    @test is_cpu_column_major(view(A, 1, :, 2:4))
    @test !is_cpu_column_major(view(A, 1, :, 2:4)')
    @test !is_cpu_column_major(@SArray(rand(2,2,2)))
    @test is_cpu_column_major(@MArray(rand(2,2,2)))

    @test stridelayout(@SArray(rand(2,2,2))) == (1, 1, Base.OneTo(3))
    @test stridelayout(A) == (1, 1, Base.OneTo(3))
    @test stridelayout(PermutedDimsArray(A,(3,1,2))) == (2, 1, (3, 1, 2))
    @test stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:])) == (1, 1, (1, 2))
    @test stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2:3,1:2,:])) == (2, 1, (3, 1, 2))
    @test stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:])) == (-1, 1, (2, 1))

    B = Array{Int8}(undef, 2,2,2,2);
    doubleperm = PermutedDimsArray(PermutedDimsArray(B,(4,2,3,1)), (4,2,1,3));
    @test collect(strides(B))[collect(last(stridelayout(doubleperm)))] == collect(strides(doubleperm))
end

@test ArrayInterface.canavx(ArrayInterface.canavx) == false
> **Reviewer:** why cpu?
>
> **Reviewer:** Do we also want to consider accommodating other memory layouts that have been used in improving performance of array operations like these?
>
> **Author:** @ChrisRackauckas: "cpu" because I wanted it to be `false` for GPUArrays. Given that `AbstractDeviceArray`s are `<: DenseArray` in order to opt into the `StridedArray` interface (even though `LinearAlgebra` routines presumably don't work), I wanted to make it clear they shouldn't opt into this.
>
> @Tokazama: The `stridelayout` function can efficiently handle row-major or increasing vs. decreasing strides. But perhaps I should add density: knowing that certain dims are dense could potentially allow subsets of loops to undergo `CartesianIndexing` -> `LinearIndexing` style loop fusion, even if on the whole the `IndexStyle` is `IndexCartesian()`. Maybe all arrays are transposed (and thus `IndexCartesian()`), but if they're still dense, we could use linear indexing anyway.
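A minimal sketch of the `CartesianIndexing` -> `LinearIndexing` idea discussed above; `is_dense` here is a hypothetical stand-in for the per-stride density info, not an API from this PR:

```julia
# Hypothetical density check: an Array's memory is always dense;
# everything else is treated conservatively.
is_dense(::Array) = true
is_dense(::AbstractArray) = false

function mysum(A::AbstractArray)
    s = zero(eltype(A))
    if is_dense(A)
        # Dense memory: a single linear loop touches every element in order.
        @inbounds for i in 1:length(A)
            s += A[i]
        end
    else
        # Conservative fallback: cartesian indexing handles views, permutations, etc.
        @inbounds for I in CartesianIndices(A)
            s += A[I]
        end
    end
    s
end
```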
> **Reviewer:** Yeah, I was specifically thinking of dense arrays but didn't want to chance overlooking something else. Using `IndexStyle` to accomplish this is pretty clever.
>
> **Author:** I added density info on a per-stride basis. Unfortunately, we can't tell from type info whether a slice of an array was partial or complete, as `Base.Slice{Base.OneTo{Int64}}` is the type of a `:` slice. This means we have to be very conservative with base `SubArray`s, but packages (like PaddedMatrices.jl) might want to provide more compile-time info for their types. The finer granularity would allow for the partial linear-vs-cartesian indexing.
>
> Other reasons to have this information:
> - The `vec` implementation is a little goofy.
> - Integer (`r*`) registers are scarce. Many of these registers are used for holding array strides, and integer multiples of these strides. E.g., an AVX512 column-major matmul kernel for `C = A * B` wants to hold `stride(A,2), 3stride(A,2), 5stride(A,2), stride(C,2), 3stride(C,2), 5stride(C,2)`. So if we know that `stride(A,2) == stride(C,2)`, we could free 3 registers, save a couple of calculations, and a bunch of integer loads/stores from the stack when we get fewer spills. I'm not sure how to implement this optimization / let a user guarantee that they aren't writing 100% valid code where that doesn't hold. So knowing that `A` and `C` are both dense still isn't enough to guarantee that the strides are equal.
>
> Not particularly relevant to this issue, but I wonder if there'd be a good way to communicate size equality. Users will often check for equal sizes in front of a loop, and `eachindex` will guarantee this as well. Because `eachindex` isn't always appropriate (e.g., when you don't have exact correspondence between axes of the arrays), I think it'd be good to have something like an `eachindexalongaxis`, where it checks sizes. `eachindexalongaxis` or `each_index_along_axis` doesn't really roll off the tongue, and I don't know where such a function should live. I'd also have to think a little more about how to use it and actually pass the information of dimension equality (to be combined with density guarantees) to deduce stride-equality. The hacky way would be to just pattern match `eachindex` and `eachindexalongaxis` in the macro, handling dimension equality through the expression, while handling density info based on types.
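A hedged sketch of what such a size-checking iterator might look like; the name `eachindexalongaxis` is only the one floated in the discussion, not an actual ArrayInterface API:

```julia
# Hypothetical helper: the shared index range along dimension `d`,
# after checking that every array agrees on the size of that axis.
function eachindexalongaxis(d::Integer, As::AbstractArray...)
    ax = axes(first(As), d)
    for A in Base.tail(As)
        axes(A, d) == ax || throw(DimensionMismatch("arrays differ along axis $d"))
    end
    return ax
end

# e.g., iterate the shared column index of `A` and `B`,
# with the size check hoisted out of the loop:
# for j in eachindexalongaxis(2, A, B)
#     ...
# end
```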
> **Author:** I'll make a separate issue about the axes-specific indexing.
>
> **Reviewer:** The only thing I have left to say about this one is that you may want to include some explanation or a link so people understand the "cpu" part. It's obviously not a huge deal, but I anticipate it becoming a common question that becomes irksome to continually explain.
>
> **Author:** How about a comment simply saying that it should return `false` for any sort of GPUArray?
>
> **Reviewer:** That would work.