
BlockMap enhancements #72

Merged
merged 21 commits into master from blockdiag on Jan 13, 2020

Conversation

dkarrasch (Member) commented Oct 16, 2019

  • add 5-arg mul! for BlockMaps
  • use Base.@propagate_inbounds in BlockMap-multiplication
  • add overhead-free BlockDiagonalMap
  • add 5-arg mul!(::AbstractMatrix, ::BlockMap, ::AbstractMatrix, alpha, beta)
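
For readers new to the 5-arg form, the calls this PR enables look roughly like the following (a hedged usage sketch with made-up sizes and values; the concatenation syntax is the one the package already provides):

using LinearAlgebra, LinearMaps

A = rand(3, 4); B = rand(3, 4)
L = [LinearMap(A) LinearMap(B)]   # 3×8 BlockMap via hcat
x = rand(8); y = rand(3)
mul!(y, L, x, 2.0, 0.5)           # in place: y = 2.0 * (L * x) + 0.5 * y

X = rand(8, 5); Y = rand(3, 5)
mul!(Y, L, X, true, false)        # the matrix-valued 5-arg method from this PR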

coveralls commented Oct 16, 2019

Coverage Status

Coverage increased (+1.8%) to 95.972% when pulling 0857943 on blockdiag into a0f5016 on master.

codecov bot commented Oct 16, 2019

Codecov Report

Merging #72 into master will decrease coverage by 0.18%.
The diff coverage is 96.25%.


@@            Coverage Diff             @@
##           master      #72      +/-   ##
==========================================
- Coverage   96.67%   96.48%   -0.19%     
==========================================
  Files          10       10              
  Lines         631      655      +24     
==========================================
+ Hits          610      632      +22     
- Misses         21       23       +2
Impacted Files Coverage Δ
src/LinearMaps.jl 89.39% <100%> (+1.7%) ⬆️
src/wrappedmap.jl 100% <100%> (ø) ⬆️
src/blockmap.jl 96.98% <95.58%> (-1.61%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a0f5016...0857943. Read the comment docs.

require_one_based_indexing(y, x)
m, n = size(A)
@boundscheck (m == length(y) && n == length(x)) || throw(DimensionMismatch("A_mul_B!"))
@inline function _blockmul!(y, A::BlockMap, x, α, β)
dkarrasch (Member Author):

This used to be the A_mul_B! code, which can be easily generalized to work with α, β, and matrices instead of vectors, so I factored it out. The generic version of indexing is then selectdim(y, 1, ...).
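
In spirit (an illustrative sketch with made-up names, not the actual implementation), the factored-out kernel does something like this:

using LinearAlgebra

# Illustrative only: `blocks[i][j]` is the (i,j) block (plain matrices here),
# `rowranges`/`colranges` hold the index ranges of the block rows/columns.
# selectdim makes the same code work for vector and matrix x and y.
function blockmul_sketch!(y, blocks, rowranges, colranges, x, α, β)
    for (i, yinds) in enumerate(rowranges)
        yslice = selectdim(y, 1, yinds)
        for (j, xinds) in enumerate(colranges)
            xslice = selectdim(x, 1, xinds)
            # scale the output slice by β on the first block of the row, then accumulate
            mul!(yslice, blocks[i][j], xslice, α, j == 1 ? β : true)
        end
    end
    return y
end

For suitably sized blocks, blockmul_sketch!(y, [[A11, A12], [A21, A22]], (1:2, 3:5), (1:3, 4:5), x, true, false) then reproduces mul!(y, [A11 A12; A21 A22], x).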

require_one_based_indexing(y, x)
m, n = size(A)
@boundscheck (n == length(y) && m == length(x)) || throw(DimensionMismatch("At_mul_B!"))
@inline function _transblockmul!(y, A::BlockMap, x, α, β, transform)
dkarrasch (Member Author):

This is the corresponding multiplication code that we used to have twice, once for At_mul_B! and once for Ac_mul_B!. Otherwise, this is the analogous generalization to alpha, beta, and matrices.
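
The transposed/adjoint counterpart, in the same hypothetical terms as the sketch above: the row and column partitions swap roles, each output slice is scaled by β exactly once and then accumulated into, and transform stands for either transpose or adjoint:

function transblockmul_sketch!(y, blocks, rowranges, colranges, x, α, β, transform)
    for (j, yinds) in enumerate(colranges)
        yslice = selectdim(y, 1, yinds)
        for (i, xinds) in enumerate(rowranges)
            xslice = selectdim(x, 1, xinds)
            mul!(yslice, transform(blocks[i][j]), xslice, α, i == 1 ? β : true)
        end
    end
    return y
end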

Comment on lines +344 to +360
Base.@propagate_inbounds A_mul_B!(y::AbstractVector, A::BlockMap, x::AbstractVector) =
mul!(y, A, x)

Base.@propagate_inbounds A_mul_B!(y::AbstractVector, A::TransposeMap{<:Any,<:BlockMap}, x::AbstractVector) =
mul!(y, A, x)

Base.@propagate_inbounds At_mul_B!(y::AbstractVector, A::BlockMap, x::AbstractVector) =
mul!(y, transpose(A), x)

Base.@propagate_inbounds A_mul_B!(y::AbstractVector, A::AdjointMap{<:Any,<:BlockMap}, x::AbstractVector) =
mul!(y, A, x)

Base.@propagate_inbounds Ac_mul_B!(y::AbstractVector, A::BlockMap, x::AbstractVector) =
mul!(y, adjoint(A), x)
dkarrasch (Member Author):

Have multiplication handled by mul!s, and nothing else.

Comment on lines +160 to +175
m = 5; n = 6
M1 = 10*(1:m) .+ (1:(n+1))'; L1 = LinearMap(M1)
M2 = randn(elty, m, n+2); L2 = LinearMap(M2)
M3 = randn(elty, m, n+3); L3 = LinearMap(M3)

# Md = diag(M1, M2, M3, M2, M1) # unsupported so use sparse:
Md = Matrix(blockdiag(sparse.((M1, M2, M3, M2, M1))...))
x = randn(elty, size(Md, 2))
Bd = @inferred blockdiag(L1, L2, L3, L2, L1)
@test Matrix(@inferred blockdiag(L1)) == M1
@test Matrix(@inferred blockdiag(L1, L2)) == blockdiag(sparse.((M1, M2))...)
Bd2 = @inferred cat(L1, L2, L3, L2, L1; dims=(1,2))
dkarrasch (Member Author):

I should say that these tests are shamelessly stolen from @JeffFessler's PR #65. 😉

dkarrasch requested a review from Jutho on October 22, 2019 at 07:05
dkarrasch (Member Author):

With the uniform scaling changes in the Kronecker PR, this makes (5-arg) multiplication of BlockMaps built from AbstractMatrix and UniformScaling objects with vectors, and even with matrices, essentially allocation-free, up to the generation of views. I read somewhere that taking views might be handled by the compiler in an allocation-free way as of Julia v1.4!
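
A hedged example of the kind of call that should now be essentially allocation-free (sizes are made up, and the bracket syntax with I relies on the UniformScaling concatenation support mentioned above):

using LinearAlgebra, LinearMaps

A = rand(4, 4)
L = [LinearMap(A) I; I LinearMap(A)]   # 8×8 BlockMap mixing a matrix-backed map and I
X = rand(8, 3); Y = zeros(8, 3)
mul!(Y, L, X)                          # warm-up / compilation
@allocated mul!(Y, L, X, 1.0, 0.0)     # should stay small: essentially just the block views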

This is ready for review. I guess once #61 and this one are merged, we should release and let people enjoy. 😃

dkarrasch (Member Author) commented Oct 22, 2019

Hm, at some point we should experiment with the multi-threading features. I did some benchmarking, adopting the benchmark tests from JuliaArrays/BlockDiagonals.jl#26 (comment). EDIT: I should say that, over at BlockDiagonals.jl, there is a recent effort to speed up multiplication in JuliaArrays/BlockDiagonals.jl#26, so their runtimes below are not final.

using LinearAlgebra, LinearMaps, BlockDiagonals, BenchmarkTools, Test, SparseArrays

for nblocks in (2, 15, 20, 200)
    @show nblocks
    As = [rand(10, 10) for _ in 1:nblocks]
    L = cat(LinearMap.(As)...; dims=(1,2))
    B = BlockDiagonal(As)
    A = rand(size(B)...)
    Y = zeros(size(B))
    M = Matrix(B)
    D = Diagonal(As)
    A′ = [rand(10, nblocks*10) for _ in 1:nblocks]

    println("BlockDiag * Matrix")
    @btime @inbounds mul!($Y, $L, $A)
    @btime hcat($D * $A′)
    @btime $B * $A
    @btime $M * $A
end

The results for the PR as is are:

nblocks = 2
BlockDiag * Matrix
  1.222 μs (4 allocations: 256 bytes)
  1.089 μs (7 allocations: 3.80 KiB)
  3.955 μs (24 allocations: 10.91 KiB)
  1.065 μs (1 allocation: 3.25 KiB)
nblocks = 15
BlockDiag * Matrix
  55.998 μs (30 allocations: 1.88 KiB)
  43.004 μs (20 allocations: 178.61 KiB)
  96.291 μs (142 allocations: 535.28 KiB)
  73.585 μs (2 allocations: 175.89 KiB)
nblocks = 20
BlockDiag * Matrix
  108.166 μs (40 allocations: 2.50 KiB)
  74.857 μs (25 allocations: 315.55 KiB)
  164.571 μs (187 allocations: 946.69 KiB)
  143.115 μs (2 allocations: 312.58 KiB)
nblocks = 200
BlockDiag * Matrix
  19.944 ms (400 allocations: 25.00 KiB)
  7.109 ms (405 allocations: 30.54 MiB)
  23.854 ms (2504 allocations: 91.63 MiB)
  111.515 ms (2 allocations: 30.52 MiB)

Note the small number of allocations, exactly two per block, corresponding to the views into x and y, respectively. The clear winner (in terms of producing numbers; of course, the application case and the exact arrangement of the numbers are different) is LinearAlgebra's Diagonal. Its multiplication is defined as simply D.diag .* v, so it uses broadcasting.

I was then wondering why the hell this is so much faster, given that all we do is go through the blocks and do in-place mul! into the corresponding views. I suspected that broadcasting may use, at the very bottom, multi-threading (hm, I just tested, it's as fast when there is only one thread 🤔), so I thought I'd give it a try with multi-threading in LinearMaps.jl. And actually, it turns out to be bloody easy: simply add a Threads.@threads in front of the loop. Here are the results:

nblocks = 2
BlockDiag * Matrix
  16.870 μs (58 allocations: 3.72 KiB)
  1.093 μs (7 allocations: 3.80 KiB)
  3.702 μs (24 allocations: 10.91 KiB)
  1.078 μs (1 allocation: 3.25 KiB)
  3.699 μs (1 allocation: 3.25 KiB)
nblocks = 15
BlockDiag * Matrix
  45.045 μs (240 allocations: 9.20 KiB)
  42.618 μs (20 allocations: 178.61 KiB)
  95.217 μs (142 allocations: 535.28 KiB)
  73.985 μs (2 allocations: 175.89 KiB)
  191.856 μs (2 allocations: 175.89 KiB)
nblocks = 20
BlockDiag * Matrix
  63.837 μs (310 allocations: 11.31 KiB)
  75.688 μs (25 allocations: 315.55 KiB)
  165.250 μs (187 allocations: 946.69 KiB)
  144.712 μs (2 allocations: 312.58 KiB)
  340.868 μs (2 allocations: 312.58 KiB)
nblocks = 200
BlockDiag * Matrix
  9.702 ms (3430 allocations: 96.63 KiB)
  7.124 ms (405 allocations: 30.54 MiB)
  23.140 ms (2504 allocations: 91.63 MiB)
  110.130 ms (2 allocations: 30.52 MiB)
  39.654 ms (2 allocations: 30.52 MiB)

This is with 4 threads. So we are able to kind of catch up with the fast Diagonal multiplication, at the expense of more allocations and quite some overhead in the case of a small number of blocks on the diagonal. Anyway, I think this will be exciting to optimize over concrete use cases. @chriscoey
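
For reference, the one-line change mentioned above (Threads.@threads in front of the block loop) is of roughly this shape; the following is a self-contained sketch over plain matrices, not the package's actual code:

using LinearAlgebra

# Sketch of a threaded block-diagonal multiply: each block writes to a
# disjoint row range of Y, so the per-block mul! calls are independent.
function blockdiag_mul_threaded!(Y, blocks, X, α, β)
    rowoffsets = cumsum([0; size.(blocks, 1)])
    coloffsets = cumsum([0; size.(blocks, 2)])
    Threads.@threads for i in 1:length(blocks)
        yview = view(Y, rowoffsets[i]+1:rowoffsets[i+1], :)
        xview = view(X, coloffsets[i]+1:coloffsets[i+1], :)
        mul!(yview, blocks[i], xview, α, β)
    end
    return Y
end

With the benchmark objects from above, blockdiag_mul_threaded!(Y, As, A, true, false) computes the same product as the timed mul!(Y, L, A).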

Another aspect that I realized: when there are 16 or more LinearMaps involved, the eltype of the resulting BlockDiagonalMap is no longer inferred. I guess the same applies to LinearCombinations etc.

dkarrasch (Member Author):

Would be great if we could get this one in, as well. The README already promises a new minor release. 😂

dkarrasch (Member Author):

This one is ready for review.

dkarrasch (Member Author):

@Jutho If you find the time, can you take a look? On the other hand, there is little "controversial" stuff in here and I have reviewed it myself many times 😂, so I'd merge soon.

dkarrasch merged commit 039e35f into master on Jan 13, 2020
dkarrasch deleted the blockdiag branch on January 13, 2020 at 11:55