Added some methods to help support LoopVectorization.jl #61

Merged
merged 53 commits on Sep 19, 2020
Changes from 1 commit
9a41d08
Added some methods to help support LoopVectorization.jl
chriselrod Aug 11, 2020
597606b
Add a little more to doc on `batch` return of `stridelayout`.
chriselrod Aug 11, 2020
2513045
canavx -> can_avx, add&fix missing stridelayout test, make strideorde…
chriselrod Aug 12, 2020
791cffa
Merge branch 'master' into loopvecsupport
chriselrod Aug 12, 2020
6c36e69
Update tests for stridelayout always using tuples for rank.
chriselrod Aug 12, 2020
8af5986
Add density information per axis.
chriselrod Aug 12, 2020
87e8bd1
Add `axesdense` to stridelayout's returns.
chriselrod Aug 15, 2020
53929ad
Merge branch 'master' into loopvecsupport
chriselrod Aug 18, 2020
572fccc
Add sentinal values to batch parameter definition.
chriselrod Aug 18, 2020
874fee8
Split apart definitions.
chriselrod Aug 20, 2020
358a93c
Updated README.
chriselrod Aug 20, 2020
0ca96b8
More tests, simplify `Device`
chriselrod Aug 20, 2020
9b09212
Add contiguous_batch_size docs.
chriselrod Aug 20, 2020
3417ba0
Better stride_rank.
chriselrod Aug 20, 2020
944fd93
Terrible cumsum(::Tuple) implementation for pre-1.5.
chriselrod Aug 20, 2020
a8567db
Device -> device changes, differentiate between CPUPointer and CPUInd…
chriselrod Aug 22, 2020
bd55e33
Make `stride_rank(::Type{T})`(no dim argument) return a parametrized …
chriselrod Aug 22, 2020
0862eb6
Added hybrid dynamic-static tuple type, and size and strides function…
chriselrod Aug 23, 2020
a3ac561
Added `nothing` fallbacks for sdsize and sdstrides.
chriselrod Aug 23, 2020
2f92c61
Use tuple type instead of value tuples in SDTuple, to facilitate use …
chriselrod Aug 26, 2020
9778d47
unwrap -> _get
chriselrod Aug 28, 2020
ab02af4
Use Static for partially static tuples.
chriselrod Sep 1, 2020
7cfd029
Merge branch 'master' into loopvecsupport
chriselrod Sep 1, 2020
39f414b
Make indices accept heterogenous tuples, and ntuple(f, ::Static{}) fu…
chriselrod Sep 1, 2020
9f852da
Add OffsetArray support.
chriselrod Sep 3, 2020
94eb2a7
Better default sdoffsets.
chriselrod Sep 3, 2020
973a01c
Delete `Static` ntuple.
chriselrod Sep 4, 2020
6e9207b
Better sdoffsets.
chriselrod Sep 4, 2020
00d0a00
Added sdoffset tests and made inference improvements.
chriselrod Sep 6, 2020
5120651
Add more Static methods to avoid ambiguities.
chriselrod Sep 7, 2020
d6130f8
Update src/stridelayout.jl
chriselrod Sep 8, 2020
a88c692
Update README.md
chriselrod Sep 8, 2020
910a42a
Update README.md
chriselrod Sep 8, 2020
6ad4ec5
Added docstrings for sd(size/strides/offsets), made `OptionmallyStati…
chriselrod Sep 8, 2020
28404dc
More indices tests.
chriselrod Sep 8, 2020
1f08ae0
Added some `Static` comparison methods to improve type stability.
chriselrod Sep 8, 2020
00c9112
Merged `static.jl`
chriselrod Sep 9, 2020
5f1f14f
Merge branch 'static' into loopvecsupport
chriselrod Sep 9, 2020
dd7b0ed
Remove extra parameter.
chriselrod Sep 9, 2020
726691e
Merged.
chriselrod Sep 9, 2020
895f9b7
Fix _sdsize (generated functions should rely on methods others will e…
chriselrod Sep 14, 2020
a59e05a
_try_static in _sdsize
chriselrod Sep 14, 2020
7d17247
propagate_inbounds in ranges.jl
chriselrod Sep 14, 2020
18b954b
Finish merging master.
chriselrod Sep 14, 2020
09515fe
Move docstring for rank_to_sortperm.
chriselrod Sep 14, 2020
0ef7653
Drop `sd` prefix to size/strides/offsets.
chriselrod Sep 15, 2020
33b8433
Add indexing size/stride methods.
chriselrod Sep 15, 2020
efe40e1
add where and tests.
chriselrod Sep 15, 2020
e72cfc8
Set generic fallback size and strides to equal base methods.
chriselrod Sep 15, 2020
42258b5
Remove Base.Val(::Static{N}) definition to reduce invalidations.
chriselrod Sep 16, 2020
090dab9
Static -> StaticInt.
chriselrod Sep 18, 2020
ac8033a
Update README for sd -> ArrayInterface. and Static -> StaticInt
chriselrod Sep 18, 2020
21f7d48
More README fixes.
chriselrod Sep 18, 2020
15 changes: 15 additions & 0 deletions README.md
@@ -112,6 +112,21 @@ If `step` of instances of type `T` are known at compile time, return that step.
Otherwise, returns `nothing`. For example, `known_step(UnitRange{Int})` returns
`one(Int)`.

## is_cpu_column_major(::Type{T})
Member

why cpu?

Member

Do we also want to consider accommodating other memory layouts that have been used in improving performance of array operations like these?

Collaborator Author
@chriselrod chriselrod Aug 12, 2020

@ChrisRackauckas
"cpu" because I wanted it to be false for GPUArrays.
Given that AbstractDeviceArrays <: DenseArray to opt into the StrideArray interface (even though LinearAlgebra routines presumably don't work), I wanted to make it clear they shouldn't opt into this.

@Tokazama
The stridelayout function can efficiently handle row-major or increasing vs decreasing strides, e.g.:

julia> using ArrayInterface: stridelayout

julia> is_row_major(x) = stridelayout(x) === (2,1,(2,1))
is_row_major (generic function with 1 method)

julia> B = rand(3,4);

julia> @code_typed is_row_major(B')
CodeInfo(
1 ─     return true
) => Bool

julia> @code_typed is_row_major(B)
CodeInfo(
1 ─     return false
) => Bool

But perhaps I should add density information. Knowing that certain dims are dense could potentially allow subsets of loops to undergo CartesianIndexing -> LinearIndexing style loop fusion, even if the IndexStyle as a whole is IndexCartesian().
Or it could serve as a correction for cases like:

julia> IndexStyle(typeof(B))
IndexLinear()

julia> IndexStyle(typeof(B'))
IndexCartesian()

Maybe all arrays are transposed (and thus IndexCartesian()), but if they're still dense, we could use linear indexing anyway.
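A minimal sketch of that idea (the `alldense` helper and `densesum` below are hypothetical illustrations, not part of ArrayInterface): an order-independent reduction over a transposed array can fall back to linear indexing over the dense parent, even though `IndexStyle` of the wrapper is `IndexCartesian()`.

```julia
using LinearAlgebra: Adjoint, Transpose

# Hypothetical stand-in for per-axis density info: a plain Array is fully
# dense, and transposing doesn't change the underlying storage.
alldense(::AbstractArray) = false
alldense(::Array) = true
alldense(A::Union{Adjoint,Transpose}) = alldense(parent(A))

# An elementwise reduction doesn't care about index order, so when storage
# is known dense we can iterate the parent linearly instead of using
# cartesian indices on the wrapper.
function densesum(A::Union{Adjoint,Transpose})
    alldense(A) ? sum(i -> parent(A)[i], eachindex(parent(A))) : sum(A)
end

B = [1.0 2.0; 3.0 4.0]
densesum(B')  # == sum(B) == 10.0
```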

Member

Yeah, I was specifically thinking of dense arrays but didn't want to chance overlooking something else. Using IndexStyle to accomplish this is pretty clever.

Collaborator Author

I added density info on a per-stride basis.

julia> A = rand(3,4,5);

julia> stridelayout(PermutedDimsArray(A,(3,1,2)))
(2, 1, (3, 1, 2), (true, true, true))

julia> stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:]))
(1, 1, (1, 2), (true, false))

julia> stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2:3,1:2,:]))
(2, 1, (3, 1, 2), (false, false, true))

Unfortunately, we can't tell from type info whether a slice of an array was partial or complete:

julia> s2 = Base.Slice(Base.OneTo(2))
Base.Slice(Base.OneTo(2))

julia> view(A,s2,s2,s2)
2×2×2 view(::Array{Float64,3}, :, :, :) with eltype Float64:
[:, :, 1] =
 0.329658  0.543774
 0.350255  0.0817058

[:, :, 2] =
 0.310211  0.126501
 0.587191  0.884039

as Base.Slice{Base.OneTo{Int64}} is the type of a : slice. This means we have to be very conservative with base SubArrays, but packages (like PaddedMatrices.jl) might want to provide more compile time info for their types.
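This is easy to check with only Base (a quick illustration): the view built from hand-made partial `Slice`s has exactly the same type as the view built with `:`.

```julia
A = rand(3, 3, 3)

# A full `:` index is lowered to Base.Slice(axes(A, d)), which for an Array
# is a Base.Slice{Base.OneTo{Int}} -- the same type as a hand-built partial
# slice, so the partial-vs-complete distinction is erased from the type.
s2 = Base.Slice(Base.OneTo(2))
full = view(A, :, :, :)     # 3×3×3
part = view(A, s2, s2, s2)  # 2×2×2

typeof(full) == typeof(part)  # true
```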

The finer granularity would allow for the partial linear-vs-cartesian indexing.
Other reasons to have this information:

  • using linear indexing to decide what to do for something like a vec implementation is a little goofy.
  • LoopVectorization often spills scalar integer registers (i.e., `r*` registers). Many of these registers hold array strides and integer multiples of those strides. E.g., an AVX512 column-major matmul kernel for C = A * B wants to hold stride(A,2), 3stride(A,2), 5stride(A,2), stride(C,2), 3stride(C,2), and 5stride(C,2). So if we knew that stride(A,2) == stride(C,2), we could free 3 registers, save a couple of calculations, and avoid a bunch of integer loads/stores from the stack thanks to fewer spills. I'm not sure how to implement this optimization, or how to let a user guarantee stride equality, given that code like the following is 100% valid:
A = rand(4,5);
C = rand(40,50);
for m in axes(A,1), n in axes(A,2)
    C[m,n] = 2A[m,n]
end

So knowing that A and C are both dense still isn't enough to guarantee that the strides are equal.

Not particularly relevant to this issue, but I wonder if there'd be a good way to communicate size equality. Users often check for equal sizes in front of a loop, and `eachindex` guarantees this as well:

for I in eachindex(A,C) # checks for equivalent size
   
end

Because eachindex isn't always appropriate (e.g., when there isn't an exact correspondence between axes of the arrays), I think it'd be good to have something like an eachindexalongaxis:

function mymatmulkernel!(C, A, B)
    @avx for m in eachindexalongaxis((C,A),(1,1)), n in eachindexalongaxis((C,B),(2,2)), k in eachindexalongaxis((A,B),(2,1))
        C[m,n] += A[m,k] * B[k,n]
    end
end

where it checks sizes. eachindexalongaxis or each_index_along_axis doesn't really roll off the tongue, and I don't know where such a function should live.
I'd also have to think a little more about how to use it and actually pass the information of dimension equality (to be combined with density guarantees) to deduce stride equality. The hacky way would be to pattern match eachindex and eachindexalongaxis in the macro, handling dimension equality through the expression while handling density info based on types.
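A minimal sketch of what such a function might look like (hypothetical name and signature, following the comment above): verify that the chosen axes agree, then return the shared axis for iteration.

```julia
# Hypothetical sketch of `eachindexalongaxis`: take (array, dim) pairs,
# check that the selected axes match, and return the common axis.
function eachindexalongaxis(arrays::Tuple, dims::Tuple)
    ax = axes(first(arrays), first(dims))
    for (A, d) in zip(Base.tail(arrays), Base.tail(dims))
        axes(A, d) == ax || throw(DimensionMismatch("axes do not match"))
    end
    ax
end

C, A = zeros(2, 3), zeros(2, 4)
eachindexalongaxis((C, A), (1, 1))  # Base.OneTo(2): both first axes agree
```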

Member

I'll make a separate issue about the axes specific indexing.

Member

The only thing I have left to say about this one is that you may want to include some explanation or a link so people understand the "cpu" part. It's obviously not a huge deal, but I anticipate it becoming a common question that becomes irksome to continually explain.

Collaborator Author

How about a comment simply saying that it should return false for any sort of GPUArray?

Member

That would work.


Returns `true` if instances of type `T` have a strided column-major memory layout and support
`pointer`. Returns `false` otherwise.

## stridelayout(::Type{T})

Returns a 3-tuple describing the strided memory layout of an instance of type `T` if it
is known, and `nothing` otherwise.
The elements of the tuple are
- `contig`: the axis with contiguous elements. `contig == -1` indicates no axis is contiguous. `striderank[contig]` does not necessarily equal `1`.
- `batch`: the number of contiguous elements.
- `striderank`: the rank of each stride with respect to the others. If for `A::T` we have `striderank[i] > striderank[j]`, then `stride(A,i) > stride(A,j)`.
Member

add can_avx?



# List of things to add

- https://github.com/JuliaLang/julia/issues/22216
138 changes: 138 additions & 0 deletions src/ArrayInterface.jl
@@ -530,6 +530,139 @@ known_step(x) = known_step(typeof(x))
known_step(::Type{T}) where {T} = nothing
known_step(::Type{<:AbstractUnitRange{T}}) where {T} = one(T)

"""
is_cpu_column_major(::Type{T})
Does an Array of type `T` point to column major memory in the cpu's address space?
If `is_cpu_column_major(typeof(A))` return `true` and the element type is a primite
Member

is "primite" supposed to be "primitive"?

type, then the array should be compatible with `LoopVectorization.jl` as well as
`C` and `Fortran` programs requiring pointers and assuming column major memory layout.
If `is_cpu_column_major(typeof(A))` returns `true`, the array supports the
[Strided Array](https://docs.julialang.org/en/v1/manual/interfaces/#man-interface-strided-arrays-1) interface.
"""
is_cpu_column_major(x) = is_cpu_column_major(typeof(x))
is_cpu_column_major(::Type) = false
is_cpu_column_major(::Type{<:Array}) = true
is_cpu_column_major(::Type{S}) where {A, S <: SubArray{<:Any,<:Any,A,<:Tuple{Vararg{Union{Int,<:AbstractRange}}}}} = is_cpu_column_major(A)
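A quick, self-contained sanity check of these definitions (restated here so the snippet runs on its own):

```julia
# Restated from the diff above so this snippet is self-contained.
is_cpu_column_major(x) = is_cpu_column_major(typeof(x))
is_cpu_column_major(::Type) = false
is_cpu_column_major(::Type{<:Array}) = true
is_cpu_column_major(::Type{S}) where {A, S<:SubArray{<:Any,<:Any,A,<:Tuple{Vararg{Union{Int,<:AbstractRange}}}}} =
    is_cpu_column_major(A)

A = rand(3, 4)
is_cpu_column_major(A)                # true: Array
is_cpu_column_major(view(A, :, 2:3))  # true: SubArray of Array with Int/range indices
is_cpu_column_major((1, 2, 3))        # false: generic fallback
is_cpu_column_major(A')               # false: Adjoint also hits the fallback
```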

"""
stridelayout(::Type{T}) -> (contig, batch, striderank)
Descrive the memory layout of a strided container of type `T`. If unknown or not strided, returns `nothing`.
Member

Descrive -> Describe

Else, it returns a tuple with elements:
- `contig`: the axis with contiguous elements. `contig == -1` indicates no axis is contiguous. `striderank[contig]` does not necessarily equal `1`.
- `batch`: the number of contiguous elements.
- `striderank`: the rank of each stride with respect to the others. If for `A::T` we have `striderank[i] > striderank[j]`, then `stride(A,i) > stride(A,j)`.
The convenience method
```julia
stridelayout(x) = stridelayout(typeof(x))
```
is also provided.
```julia
julia> A = rand(3,4,5);
julia> stridelayout(A)
(1, 1, Base.OneTo(3))
julia> stridelayout(PermutedDimsArray(A,(3,1,2)))
(2, 1, (3, 1, 2))
julia> stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:]))
(1, 1, (1, 2))
julia> stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2:3,1:2,:]))
(2, 1, (3, 1, 2))
julia> stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:]))
(-1, 1, (2, 1))
```
"""
stridelayout(x) = stridelayout(typeof(x))
stridelayout(::Type) = nothing
stridelayout(::Type{Array{T,N}}) where {T,N} = (1,1,Base.OneTo(N))
stridelayout(::Type{<:Tuple}) = (1,1,Base.OneTo(1))
function stridelayout(::Type{<:Union{Transpose{T,A},Adjoint{T,A}}}) where {T,A<:AbstractMatrix{T}}
ml = stridelayout(A)
isnothing(ml) && return nothing
contig, batch, rank = ml
new_rank = (rank[2], rank[1])
new_contig = contig == -1 ? -1 : 3 - contig
new_contig, batch, new_rank
end
function stridelayout(::Type{<:PermutedDimsArray{T,N,I1,I2,A}}) where {T,N,I1,I2,A<:AbstractArray{T,N}}
ml = stridelayout(A)
isnothing(ml) && return nothing
contig, batch, rank = ml
new_contig = I2[contig]
new_rank = ntuple(n -> rank[I1[n]], Val(N))
new_contig, batch, new_rank
end
@generated function stridelayout(::Type{S}) where {N,NP,T,A<:AbstractArray{T,NP},I,S <: SubArray{T,N,A,I}}
ml = stridelayout(A)
isnothing(ml) && return nothing
contig, batch, rank = ml
rankv = collect(rank)
rank_new = Int[]
n = 0
new_contig = contig
for np in 1:NP
r = rankv[np]
if I.parameters[np] <: AbstractUnitRange
n += 1
push!(rank_new, r)
if np == contig
new_contig = n
end
else
# There's definitely a smarter way to do this.
# When we drop a rank, we lower the others.
for nᵢ ∈ 1:n
rᵢ = rank_new[nᵢ]
if rᵢ > r
rank_new[nᵢ] = rᵢ - 1
end
end
for npᵢ ∈ np+1:NP
rᵢ = rankv[npᵢ]
if rᵢ > r
rankv[npᵢ] = rᵢ - 1
end
end
if np == contig
new_contig = -1
end
end
end
# If n != N, then an axis was indexed by something other than an integer or `AbstractUnitRange`, so we return `nothing`
n == N || return nothing
ranktup = Expr(:tuple); append!(ranktup.args, rank_new) # dynamic splats bad
Expr(:tuple, new_contig, batch, ranktup)
end
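The rank-lowering loop in the generated function above can be hard to follow, so here is a runtime sketch of the same idea (illustration only, not part of the package): dropping an axis removes its rank and decrements every remaining rank that was larger.

```julia
# Runtime illustration of the rank-lowering in the @generated function above.
# `keep[np]` is true when parent axis `np` survives the view (a range index)
# and false when it is dropped (an integer index).
function drop_ranks(rank::Vector{Int}, keep::Vector{Bool})
    rank = copy(rank)  # don't mutate the caller's vector
    out = Int[]
    for (np, k) in enumerate(keep)
        r = rank[np]
        if k
            push!(out, r)
        else
            # Dropping rank `r` lowers every remaining rank above it,
            # both those already kept and those not yet visited.
            for i in eachindex(out)
                out[i] > r && (out[i] -= 1)
            end
            for i in np+1:length(rank)
                rank[i] > r && (rank[i] -= 1)
            end
        end
    end
    out
end

# PermutedDimsArray(A,(3,1,2)) has stride rank (3,1,2); indexing [2,1:2,:]
# drops the first axis, giving (1,2) -- matching the docstring example.
drop_ranks([3, 1, 2], [false, true, true])  # [1, 2]
drop_ranks([3, 1, 2], [true, false, true])  # [2, 1]
```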

"""
canavx(f)
Returns `true` if the function `f` is guaranteed to be compatible with `LoopVectorization.@avx` for supported element and array types.
While a return value of `false` does not indicate the function isn't supported, this allows a library to conservatively apply `@avx`
only when it is known to be safe to do so.
```julia
function mymap!(f, y, args...)
if canavx(f)
@avx @. y = f(args...)
else
@. y = f(args...)
end
end
```
"""
canavx(::Any) = false


function __init__()

@require SuiteSparse="4607b0f0-06f3-5cda-b6b1-a6196a1729e9" begin
@@ -544,6 +677,7 @@ function __init__()
ismutable(::Type{<:StaticArrays.StaticArray}) = false
can_setindex(::Type{<:StaticArrays.StaticArray}) = false
ismutable(::Type{<:StaticArrays.MArray}) = true
ismutable(::Type{<:StaticArrays.SizedArray}) = true

function lu_instance(_A::StaticArrays.StaticMatrix{N,N}) where {N}
A = StaticArrays.SArray(_A)
@@ -564,6 +698,10 @@ function __init__()
known_first(::Type{<:StaticArrays.SOneTo}) = 1
known_last(::Type{StaticArrays.SOneTo{N}}) where {N} = N

is_cpu_column_major(::Type{<:StaticArrays.MArray}) = true
# is_cpu_column_major(::Type{<:StaticArrays.SizedArray}) = false # Why?
stridelayout(::Type{<:StaticArrays.StaticArray{S,T,N}}) where {S,T,N} = (1,1,Base.OneTo(N))

@require Adapt="79e6a3ab-5dfb-504d-930d-738a2a938a0e" begin
function Adapt.adapt_storage(::Type{<:StaticArrays.SArray{S}},xs::Array) where S
StaticArrays.SArray{S}(xs)
24 changes: 23 additions & 1 deletion test/runtests.jl
@@ -1,6 +1,6 @@
using ArrayInterface, Test
using Base: setindex
import ArrayInterface: has_sparsestruct, findstructralnz, fast_scalar_indexing, lu_instance
import ArrayInterface: has_sparsestruct, findstructralnz, fast_scalar_indexing, lu_instance, is_cpu_column_major, stridelayout
@test ArrayInterface.ismutable(rand(3))

using StaticArrays
@@ -188,3 +188,25 @@ end
@test isone(ArrayInterface.known_step(typeof(1:4)))
end

@testset "Memory Layout" begin
Contributor

Could you add some tests with adjoints and transposes?

Collaborator Author
@chriselrod chriselrod Sep 8, 2020

There are a lot of tests with adjoints already. I'll replace one with transpose.

Contributor

Hm, I don't see any in this testset.

Collaborator Author

Tests and line numbers featuring ' within the "Memory Layout" testset:

200:    @test device(view(A, 1, :, 2:4)') === ArrayInterface.CPUPointer()
210:    @test @inferred(contiguous_axis(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:])')) === ArrayInterface.Contiguous(2)
213:    @test @inferred(contiguous_axis(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:])')) === ArrayInterface.Contiguous(-1)
214:    @test @inferred(contiguous_axis(@view(PermutedDimsArray(A,(3,1,2))[:,1:2,1])')) === ArrayInterface.Contiguous(1)
220:    @test @inferred(ArrayInterface.contiguous_axis_indicator(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:])')) === (Val(false),Val(true))
223:    @test @inferred(ArrayInterface.contiguous_axis_indicator(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:])')) === (Val(false),Val(false))
224:    @test @inferred(ArrayInterface.contiguous_axis_indicator(@view(PermutedDimsArray(A,(3,1,2))[:,1:2,1])')) === (Val(true),Val(false))
230:    @test @inferred(contiguous_batch_size(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:])')) === ArrayInterface.ContiguousBatch(0)
233:    @test @inferred(contiguous_batch_size(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:])')) === ArrayInterface.ContiguousBatch(-1)
234:    @test @inferred(contiguous_batch_size(@view(PermutedDimsArray(A,(3,1,2))[:,1:2,1])')) === ArrayInterface.ContiguousBatch(0)
240:    @test @inferred(stride_rank(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:])')) === ArrayInterface.StrideRank((2, 1))
243:    @test @inferred(stride_rank(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:])')) === ArrayInterface.StrideRank((2, 3))
244:    @test @inferred(stride_rank(@view(PermutedDimsArray(A,(3,1,2))[:,1:2,1])')) === ArrayInterface.StrideRank((1, 3))
250:    @test @inferred(dense_dims(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:])')) === ArrayInterface.DenseDims((false,true))
254:    @test @inferred(dense_dims(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:])')) === ArrayInterface.DenseDims((false,false))
255:    @test @inferred(dense_dims(@view(PermutedDimsArray(A,(3,1,2))[:,1:2,1])')) === ArrayInterface.DenseDims((true,false))

Contributor

Right, I didn't look for apostrophes, sorry.

A = rand(3,4,5)
@test is_cpu_column_major(A)
@test !is_cpu_column_major((1,2,3))
@test is_cpu_column_major(view(A, 1, :, 2:4))
@test !is_cpu_column_major(view(A, 1, :, 2:4)')
@test !is_cpu_column_major(@SArray(rand(2,2,2)))
@test is_cpu_column_major(@MArray(rand(2,2,2)))

@test stridelayout(@SArray(rand(2,2,2))) == (1, 1, Base.OneTo(3))
@test stridelayout(A) == (1, 1, Base.OneTo(3))
@test stridelayout(PermutedDimsArray(A,(3,1,2))) == (2, 1, (3, 1, 2))
@test stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2,1:2,:])) == (1, 1, (1, 2))
@test stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2:3,1:2,:])) == (2, 1, (3, 1, 2))
@test stridelayout(@view(PermutedDimsArray(A,(3,1,2))[2:3,2,:])) == (-1, 1, (2, 1))

B = Array{Int8}(undef, 2,2,2,2);
doubleperm = PermutedDimsArray(PermutedDimsArray(B,(4,2,3,1)), (4,2,1,3));
@test collect(strides(B))[collect(last(stridelayout(doubleperm)))] == collect(strides(doubleperm))
end

@test ArrayInterface.canavx(ArrayInterface.canavx) == false