BlockMap enhancements #72
Codecov Report

```diff
@@            Coverage Diff             @@
##           master      #72      +/-   ##
==========================================
- Coverage   96.67%   96.48%   -0.19%
==========================================
  Files          10       10
  Lines         631      655      +24
==========================================
+ Hits          610      632      +22
- Misses         21       23       +2
```

Continue to review the full report at Codecov.
```julia
@inline function _blockmul!(y, A::BlockMap, x, α, β)
    require_one_based_indexing(y, x)
    m, n = size(A)
    @boundscheck (m == length(y) && n == length(x)) || throw(DimensionMismatch("A_mul_B!"))
```
This used to be the `A_mul_B!` code, which can easily be generalized to work with `α`, `β`, and matrices instead of vectors, so I factored it out. The generic version of indexing is then `selectdim(y, 1, ...)`.
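The shape of that generalization can be sketched as follows. This is a hypothetical simplification, not the PR's actual code: `blockmul_sketch!` and the `maps` iterable are made-up names standing in for however the block-row structure is stored, and 5-arg `mul!` requires Julia ≥ 1.3.

```julia
using LinearAlgebra

# Sketch of a blockwise 5-arg multiply, y = α*(A*x) + β*y, for a
# block row [A₁ A₂ …]. `maps` is a hypothetical iterable of blocks.
function blockmul_sketch!(y, maps, x, α, β)
    # apply β to the destination once, up front
    β == 1 || (β == 0 ? fill!(y, zero(eltype(y))) : rmul!(y, β))
    offset = 0
    for A in maps
        n = size(A, 2)
        # selectdim works uniformly for vector and matrix x
        xblock = selectdim(x, 1, offset+1:offset+n)
        mul!(y, A, xblock, α, true)  # accumulate α*A*xblock into y
        offset += n
    end
    return y
end
```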
```julia
@inline function _transblockmul!(y, A::BlockMap, x, α, β, transform)
    require_one_based_indexing(y, x)
    m, n = size(A)
    @boundscheck (n == length(y) && m == length(x)) || throw(DimensionMismatch("At_mul_B!"))
```
This is the corresponding multiplication code that we used to have twice, once for `At_mul_B!` and once for `Ac_mul_B!`. Otherwise, this is the analogous generalization to `α`, `β`, and matrices.
```julia
Base.@propagate_inbounds A_mul_B!(y::AbstractVector, A::BlockMap, x::AbstractVector) =
    mul!(y, A, x)

Base.@propagate_inbounds A_mul_B!(y::AbstractVector, A::TransposeMap{<:Any,<:BlockMap}, x::AbstractVector) =
    mul!(y, A, x)

Base.@propagate_inbounds At_mul_B!(y::AbstractVector, A::BlockMap, x::AbstractVector) =
    mul!(y, transpose(A), x)

Base.@propagate_inbounds A_mul_B!(y::AbstractVector, A::AdjointMap{<:Any,<:BlockMap}, x::AbstractVector) =
    mul!(y, A, x)

Base.@propagate_inbounds Ac_mul_B!(y::AbstractVector, A::BlockMap, x::AbstractVector) =
    mul!(y, adjoint(A), x)
```
Multiplication is now handled by `mul!` methods, and nothing else.
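For illustration, a usage sketch of the `mul!` entry points (assuming LinearMaps' `hcat`-style block construction and the 5-arg methods added here):

```julia
using LinearAlgebra, LinearMaps

A = [LinearMap(rand(3, 4)) LinearMap(rand(3, 5))]  # hcat yields a BlockMap
x = rand(9); y = zeros(3)
mul!(y, A, x)            # 3-arg: y = A*x
mul!(y, A, x, 2.0, 0.5)  # 5-arg: y = 2*(A*x) + 0.5*y
```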
```julia
m = 5; n = 6
M1 = 10*(1:m) .+ (1:(n+1))'; L1 = LinearMap(M1)
M2 = randn(elty, m, n+2); L2 = LinearMap(M2)
M3 = randn(elty, m, n+3); L3 = LinearMap(M3)

# Md = diag(M1, M2, M3, M2, M1) # unsupported so use sparse:
Md = Matrix(blockdiag(sparse.((M1, M2, M3, M2, M1))...))
x = randn(elty, size(Md, 2))
Bd = @inferred blockdiag(L1, L2, L3, L2, L1)
@test Matrix(@inferred blockdiag(L1)) == M1
@test Matrix(@inferred blockdiag(L1, L2)) == blockdiag(sparse.((M1, M2))...)
Bd2 = @inferred cat(L1, L2, L3, L2, L1; dims=(1,2))
```
I should say that these tests are shamelessly stolen from @JeffFessler's PR #65. 😉
With the uniform scaling changes in the Kronecker PR, this makes (5-arg) multiplication of `BlockMap`s built from `AbstractMatrix` and `UniformScaling` objects with vectors, and even with matrices, essentially allocation-free, up to the generation of views. I read somewhere that taking views might be handled by the compiler in an allocation-free way as of Julia v1.4! This is ready for review. I guess once #61 and this one are merged, we should release and let people enjoy. 😃
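A quick way to probe the allocation claim is a sketch like the following; exact counts depend on the Julia version, since the view allocations are precisely what the compiler may elide from v1.4 on:

```julia
using LinearAlgebra, LinearMaps

L = [LinearMap(rand(100, 100)) LinearMap(rand(100, 100))]  # BlockMap
x = rand(200); y = zeros(100)
mul!(y, L, x)             # run once to compile
@allocated mul!(y, L, x)  # expect only a handful of bytes, from the views
```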
Hm, at some point we should experiment with the multi-threading features. I did some benchmarking, adopting benchmark tests from JuliaArrays/BlockDiagonals.jl#26 (comment). (EDIT: I should say that, over at …)

```julia
using LinearAlgebra, LinearMaps, BlockDiagonals, BenchmarkTools, Test, SparseArrays

for nblocks in (2, 15, 20, 200)
    @show nblocks
    As = [rand(10, 10) for _ in 1:nblocks]
    L = cat(LinearMap.(As)...; dims=(1,2))
    B = BlockDiagonal(As)
    A = rand(size(B)...)
    Y = zeros(size(B))
    M = Matrix(B)
    D = Diagonal(As)
    A´ = [rand(10, nblocks*10) for _ in 1:nblocks]
    println("BlockDiag * Matrix")
    @btime @inbounds mul!($Y, $L, $A)
    @btime hcat($D * $A´)
    @btime $B * $A
    @btime $M * $A
end
```

The results for the PR as-is are:

```
nblocks = 2
BlockDiag * Matrix
  1.222 μs (4 allocations: 256 bytes)
  1.089 μs (7 allocations: 3.80 KiB)
  3.955 μs (24 allocations: 10.91 KiB)
  1.065 μs (1 allocation: 3.25 KiB)
nblocks = 15
BlockDiag * Matrix
  55.998 μs (30 allocations: 1.88 KiB)
  43.004 μs (20 allocations: 178.61 KiB)
  96.291 μs (142 allocations: 535.28 KiB)
  73.585 μs (2 allocations: 175.89 KiB)
nblocks = 20
BlockDiag * Matrix
  108.166 μs (40 allocations: 2.50 KiB)
  74.857 μs (25 allocations: 315.55 KiB)
  164.571 μs (187 allocations: 946.69 KiB)
  143.115 μs (2 allocations: 312.58 KiB)
nblocks = 200
BlockDiag * Matrix
  19.944 ms (400 allocations: 25.00 KiB)
  7.109 ms (405 allocations: 30.54 MiB)
  23.854 ms (2504 allocations: 91.63 MiB)
  111.515 ms (2 allocations: 30.52 MiB)
```

Note the small number of allocations, exactly two per block, corresponding to the views into …

```
nblocks = 2
BlockDiag * Matrix
  16.870 μs (58 allocations: 3.72 KiB)
  1.093 μs (7 allocations: 3.80 KiB)
  3.702 μs (24 allocations: 10.91 KiB)
  1.078 μs (1 allocation: 3.25 KiB)
  3.699 μs (1 allocation: 3.25 KiB)
nblocks = 15
BlockDiag * Matrix
  45.045 μs (240 allocations: 9.20 KiB)
  42.618 μs (20 allocations: 178.61 KiB)
  95.217 μs (142 allocations: 535.28 KiB)
  73.985 μs (2 allocations: 175.89 KiB)
  191.856 μs (2 allocations: 175.89 KiB)
nblocks = 20
BlockDiag * Matrix
  63.837 μs (310 allocations: 11.31 KiB)
  75.688 μs (25 allocations: 315.55 KiB)
  165.250 μs (187 allocations: 946.69 KiB)
  144.712 μs (2 allocations: 312.58 KiB)
  340.868 μs (2 allocations: 312.58 KiB)
nblocks = 200
BlockDiag * Matrix
  9.702 ms (3430 allocations: 96.63 KiB)
  7.124 ms (405 allocations: 30.54 MiB)
  23.140 ms (2504 allocations: 91.63 MiB)
  110.130 ms (2 allocations: 30.52 MiB)
  39.654 ms (2 allocations: 30.52 MiB)
```

This is with 4 threads. So we are able to kind of catch up with the fast … Another aspect that I realized: when there are 16 or more `LinearMap`s involved, then the …
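A threaded block-diagonal multiply along the lines benchmarked above might look like this sketch. It assumes square diagonal blocks, as in the benchmark, and `threaded_blockdiag_mul!` is a hypothetical name, not an API of this PR:

```julia
using LinearAlgebra

# Each diagonal block touches a disjoint slice of rows, so the block
# multiplications are independent and can run in parallel.
function threaded_blockdiag_mul!(Y, blocks, X)
    offsets = cumsum([0; [size(B, 1) for B in blocks]])
    Threads.@threads for i in eachindex(blocks)
        rows = offsets[i]+1:offsets[i+1]
        # square blocks: the row range equals the column range
        mul!(view(Y, rows, :), blocks[i], view(X, rows, :))
    end
    return Y
end
```

Run Julia with `JULIA_NUM_THREADS=4` (or similar) for the threads to have any effect.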
It would be great if we could get this one in as well. The README already promises a new minor release. 😂
This one is ready for review.
@Jutho If you find the time, can you take a look? On the other hand, there is little "controversial" stuff in here, and I have reviewed it myself many times 😂, so I'd merge soon.
Changes in this PR:

- `mul!` for `BlockMap`s
- `Base.@propagate_inbounds` in `BlockMap` multiplication
- `BlockDiagonalMap`
- `mul!(::AbstractMatrix, ::BlockMap, ::AbstractMatrix, alpha, beta)`
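A small usage sketch tying the listed pieces together (assuming `blockdiag` for `LinearMap`s as exercised in the tests above):

```julia
using LinearAlgebra, LinearMaps

M1 = rand(3, 4); M2 = rand(2, 5)
D = blockdiag(LinearMap(M1), LinearMap(M2))  # a BlockDiagonalMap of size (5, 9)
X = rand(9, 7); Y = zeros(5, 7)
mul!(Y, D, X)            # matrix-valued multiplication
mul!(Y, D, X, 2.0, 1.0)  # the new 5-arg method
```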