
WIP: Cholesky factorization of dual matrices #11

Merged: 2 commits into JuliaDiff:master on Jun 6, 2014

Conversation

@c42f (Contributor) commented Jun 6, 2014

As discussed on the mailing list, here's some code implementing Cholesky factorization of dual matrices in a reasonably efficient manner.

The current implementation doesn't play nicely with the cholfact!() infrastructure, so some additional changes will be required. I can do some digging to figure that out, but suggestions are also very welcome. This is close to my first attempt at production quality julia code, so there could well be some strange things I've done in places.
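
For reference, here is a minimal, unoptimized sketch of the underlying algorithm. This is not the PR's code (the PR solves the epsilon-part equation through BLAS calls rather than scalar loops), and the names dual_chol, dual and epsilon are assumed to behave as in this PR: building a dual array from two real arrays and extracting its parts.

# Factor A + εB with A symmetric positive definite:
# (U + εV)'(U + εV) = U'U + ε(U'V + V'U), so U = chol(real part) and
# the upper triangular V solves B = U'V + V'U.
function dual_chol(A)
    U = chol(real(A))            # ordinary real Cholesky: real(A) == U'U
    B = epsilon(A)
    n = size(B, 1)
    V = zeros(eltype(B), n, n)   # upper triangular epsilon part of the factor
    for j = 1:n, i = 1:j
        s = B[i, j]
        for k = 1:i-1
            s -= U[k, i]*V[k, j] + V[k, i]*U[k, j]
        end
        if i == j
            V[i, j] = s / (2*U[i, i])
        else
            V[i, j] = (s - V[i, i]*U[i, j]) / U[i, i]
        end
    end
    dual(U, V)
end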

c42f added 2 commits June 6, 2014 21:21
Allow convenient creation of dual arrays from two parallel arrays RE and
DU, and extraction of the epsilon part as an array.
Cholesky factorization using chol() for the real part and solving the
matrix equation B = M'*U + U'*M for the epsilon part.  This version is
fairly fast, as it calls through to the BLAS for the heavy lifting.

Requires additional work to play nicely with the Base.cholfact!()
infrastructure.
@papamarkou (Contributor)

Great, Chris!

papamarkou added a commit that referenced this pull request Jun 6, 2014
WIP: Cholesky factorization of dual matrices
@papamarkou merged commit 115fd38 into JuliaDiff:master Jun 6, 2014
@mlubin (Contributor) commented Jun 6, 2014

@c42f, thanks for putting together this PR. @Scidom I think this needed some more work before merging. There are no tests and it doesn't yet follow the standard cholfact interface so it may not even be called by default.

@papamarkou (Contributor)

Ok, I will let you work on the tests and improve it @mlubin, happy if you can commit to it.

@c42f (Contributor, Author) commented Jun 6, 2014

Yes, it's incomplete, I should have been clearer that I didn't expect it to be merged quite yet (that was what the "WIP" was meant to indicate). Never mind though - I'll just create a new pull request for additional changes. I'm happy to do those, just wanted to open the discussion.

@mlubin The current chol() interface for duals was more of a proof of concept than anything. It correctly gets called for chol(A) where A is a dual matrix, but nothing more.

The cholfact!() and cholfact() docs say

Base.cholfact!(A, [LU,][pivot=false,][tol=-1.0]) -> Cholesky

"cholfact!" is the same as "cholfact()", but saves space by
overwriting the input "A", instead of creating a copy.

Base.cholfact(A, [LU,][pivot=false,][tol=-1.0]) -> Cholesky

Compute the Cholesky factorization of a dense symmetric positive
(semi)definite matrix "A" and return either a "Cholesky" if
"pivot=false" or "CholeskyPivoted" if "pivot=true". "LU"
may be ":L" for using the lower part or ":U" for the upper
part. The default is to use ":U". The triangular matrix can be
obtained from the factorization "F" with: "F[:L]" and
"F[:U]". The following functions are available for "Cholesky"
objects: "size", "", "inv", "det". For
"CholeskyPivoted" there is also defined a "rank". If
"pivot=false" a "PosDefException" exception is thrown in case
the matrix is not positive definite. The argument "tol"
determines the tolerance for determining the rank. For negative
values, the tolerance is the machine precision.

Seems like quite a bit more work to be done to support all the options. Lower or upper is easy I guess - just transpose after the fact. Pivoting might require a little more work, though I guess it's just a matter of passing the pivot options along to cholfact!(real(A)), and applying the permutation correctly before computing the decomposition of epsilon(A).

I'm not yet sure whether any of this can usefully be done in-place. If we could reinterpret Matrix{Dual{Float64}} into a Matrix{Float64} of twice the number of rows we might have a chance. Do either of you know whether julia gives any guarantees on memory layout which would let us do that safely?
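
As a rough illustration of the idea (a sketch only, assuming Dual{Float64} is an isbits immutable laid out as consecutive (real, epsilon) pairs, and that reinterpret accepts target dimensions for arrays):

A = dual(rand(3, 3), rand(3, 3))       # Matrix{Dual{Float64}}
n, m = size(A)
F = reinterpret(Float64, A, (2n, m))   # same memory viewed as a 2n-by-m Float64 matrix
RE = F[1:2:end, :]                     # real parts (odd rows) -- note this is a copy
DU = F[2:2:end, :]                     # epsilon parts (even rows) -- also a copy

The catch, as @mlubin notes further down, is that BLAS cannot consume the odd/even rows in place, so extracting RE and DU still allocates.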

@papamarkou (Contributor)

I don't know the answer to this question @c42f. Thanks for being willing to work on improving your Cholesky factorization code; I didn't realize this was not ready (I merged without having noticed the WIP). If you could amend your code, possibly with the help of @mlubin, that would be great, as I can't give it my immediate attention due to other ongoing coding work.

@mlubin (Contributor) commented Jun 8, 2014

@c42f, I've reverted the commit so that you can make a clean PR. What I was thinking about cholfact is that we should re-use the Cholesky parametrized type. So cholfact(Matrix{Dual{Float64}}) returns a Cholesky{Dual{Float64}}. Returning anything else might break code that explicitly stores the factorization object. As you mentioned, the matrix layout is an issue because we want to store two Matrix{Float64} instead of one Matrix{Dual{Float64}} with the factor.
@andreasnoackjensen, any suggestions here for what we should do to provide a custom cholesky factorization for the Dual type?

@c42f (Contributor, Author) commented Jun 8, 2014

Ok, thanks Miles. I can't reopen this PR since github knows it's merged (unless one of you guys can?) but we can keep discussing it here until I've got some new code to talk about.

It's a good question as to whether we want the factorization to be a pair of Matrix{Float64} vs a single Matrix{Dual{Float64}}. I was assuming we'd want the latter as output (if you look at my test chol() implementation you'll see that's what I produce at the moment). You're probably right that it's better to keep them split apart though: as soon as you want to solve an equation using the factorization, you'll likely want to call the blas again for efficiency.

Looking at the code in base/linalg/factorization.jl, I see the Cholesky stuff is rather tied to LAPACK at the moment, and testing shows it doesn't even work with a BigFloat array yet. That probably means we're a bit ahead of ourselves trying to get it all working for duals.

@andreasnoack (Member)

I wonder if it would work if the Dual immutable was loosened such that you could have a Dual{Matrix{Float64}} instead of Matrix{Dual{Float64}}. It would probably require some care in the definitions of the arithmetic operations on Duals, but the BLAS calls would come automatically I guess.
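
A rough sketch of what that loosening might look like (hypothetical; the package's actual definition restricts the parameter to real scalars):

import Base: real, *

immutable Dual{T}       # parameter no longer restricted to T<:Real
    re::T
    du::T
end

real(z::Dual) = z.re
epsilon(z::Dual) = z.du

# Product rule: (a + εb)(c + εd) = a*c + ε(a*d + b*c).
# With T = Matrix{Float64}, every * on the right-hand side is an ordinary
# real matrix product, so it dispatches straight to BLAS gemm.
*(z::Dual, w::Dual) = Dual(real(z)*real(w), real(z)*epsilon(w) + epsilon(z)*real(w))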

@c42f (Contributor, Author) commented Jun 9, 2014

Yes, allowing Dual{Matrix{Float64}} seems mathematically sensible, and would allow operations such as multiplication of dual matrices to be a lot faster for any matrices of reasonable size. My main worry would be whether the extra code complexity was worth it.

@mlubin (Contributor) commented Jun 9, 2014

@andreasnoackjensen, that doesn't quite resolve the issue because Cholesky{T} stores a Matrix{T}, so Cholesky{Dual{Matrix{Float64}}} would store the factor as a Matrix{Dual{Matrix{Float64}}}.

The code needed to implement this would also surely double the complexity of the DualNumbers package, and I'm not sure that's worth it just to be able to call BLAS.

@andreasnoack (Member)

@mlubin At first, I thought the same, but instead of Cholesky{Dual{Matrix{Float64}}} it should be a Dual{Cholesky{Matrix{Float64}}}. Then if we define

function \(z::Dual, w::Dual)
    # (a + εb) \ (c + εd) = a\c + ε(a\(d - b*(a\c)))
    r = real(z)\real(w)
    du = real(z)\(epsilon(w) - epsilon(z)*r)
    return Dual(r, du)
end

it almost works for a Dual{Cholesky} lhs. It requires a * method for Cholesky, but that is okay to have anyway.
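
For concreteness, such a * method might look like this (a hypothetical sketch, assuming the upper factor is stored so that the original matrix is U'U and can be applied via two triangular multiplies):

import Base: *
*(C::Cholesky, x) = C[:U]' * (C[:U] * x)    # A*x recovered from A = U'U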

However, it is still not obvious that we want this. I am a little worried about the number of temporaries in these operations.

@c42f (Contributor, Author) commented Jun 11, 2014

Calling BLAS can be a big deal performance-wise, at least vs the implementation of matrix multiplication which Matrix{Dual{Float64}} currently ends up using. Consider this fairly naive expansion in terms of the blas:

function dmul(A, B)
    # Product rule on matrices: (rA + ε dA)(rB + ε dB) = rA*rB + ε(rA*dB + dA*rB);
    # each real matrix product below dispatches to BLAS gemm.
    rA = real(A)
    rB = real(B)
    dual(rA*rB, rA*epsilon(B) + epsilon(A)*rB)
end

A = dual(rand(1000,1000), rand(1000,1000))
B = dual(rand(1000,1000), rand(1000,1000))

Now we have

julia> @time dmul(A,B)
elapsed time: 0.112314667 seconds (80001120 bytes allocated)

julia> @time A*B
elapsed time: 1.349246384 seconds (16000400 bytes allocated)

It's a lot of extra allocation, but I'd take that for a 10x performance boost!

The question of whether DualNumbers should take on the complexity of making Dual{Cholesky{Matrix{Float64}}} work is tricky. Mainly I don't know whether it's something people actually need, and whether it will generalize usefully into a common pattern for other Cholesky factorizations. Would it be better just to define a new factorization type internal to DualNumbers and return that instead?
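
Such an internal type could be quite small. A purely hypothetical sketch (DualCholesky and the solve below are illustrative only, not part of the PR; dual(x, y), real(b) and epsilon(b) are assumed to work on arrays as in this PR):

import Base: \

immutable DualCholesky{T}
    U::Matrix{T}   # Cholesky factor of real(A), so real(A) == U'U
    V::Matrix{T}   # epsilon part of the factor, satisfying epsilon(A) == U'V + V'U
end

# Solve (A + εB)(x + εy) = c + εd:
#   real part:    x = A \ c
#   epsilon part: y = A \ (d - B*x), with B = U'V + V'U
function \(F::DualCholesky, b)
    x = F.U \ (F.U' \ real(b))
    y = F.U \ (F.U' \ (epsilon(b) - (F.U'*F.V + F.V'*F.U)*x))
    dual(x, y)
end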

@andreasnoack (Member)

I think the best solution would be to relax Dual to allow for Dual{Matrix} and Dual{Factorization}. It wouldn't require that many changes to the code and it wouldn't change anything for Dual{Float64}. The / and \ methods are easily fixed along the lines of my example above, some methods restricted to Number or Real could just have the restriction removed, and some methods for Dual should be restricted to Dual{T<:Number}.

@mlubin (Contributor) commented Jun 11, 2014

It's too bad that BLAS/LAPACK don't accept strides for matrices, since then we could just reinterpret a Matrix{Dual} into two different matrices. The 10x performance boost is definitely compelling. My only doubt is how this works with Dual{Float64} as a drop-in replacement for Float64. This will break any code that tries to store a Cholesky{T}, and there's no way to write down the factorization type correctly for both Float64 and Dual{Float64}.

@mlubin (Contributor) commented Jun 11, 2014

To put it more clearly, the primary use of DualNumbers is by users who don't need to know what Dual is and shouldn't need to make any modifications to their code besides making it type generic.

@andreasnoack (Member)

This will break any code that tries to store a Cholesky{T},

That is right, but it is somewhat similar to the requirement that you cannot restrict your function arguments to Float64 when you want to use DualNumbers.

there's no way to write down the factorization type correctly for both Float64 and Dual{Float64}.

I don't get this part. Can you elaborate?

@mlubin (Contributor) commented Jun 11, 2014

@andreasnoackjensen There's no loss of exact typing when you write your function to take a number type T. Currently it's correct to assume that chol(::Matrix{T}) returns a Cholesky{T}, and it's reasonable to want to explicitly use this type in the code for storage or type assertions. However, if chol(::Matrix{Dual{T}}) returns a Dual{Cholesky{T}}, then there's no way to write down the return type that's correct for both T = Float64 and T = Dual{Float64}, unless I'm mistaken.
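
A small illustration of the concern (hypothetical user code, not from the thread):

# Natural generic code today: store the factorization with its concrete type.
type Solver{T}
    fact::Cholesky{T}    # fine if cholfact(::Matrix{T}) returns Cholesky{T}
end

# If cholfact(::Matrix{Dual{T}}) instead returned Dual{Cholesky{T}}, no single
# field type would be correct for both T = Float64 and T = Dual{Float64}.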

@andreasnoack (Member)

@mlubin I think that some of your message is missing.

@mlubin (Contributor) commented Jun 11, 2014

Should be edited now.

@andreasnoack (Member)

The idea was to have cholfact(Dual{Matrix{T}}) (not cholfact(Matrix{Dual{T}})) return a Dual{Cholesky{T}}. Then you would have the option to choose between cholfact(Dual{Matrix{T}}) and cholfact(Matrix{Dual{T}}), where the former can take advantage of BLAS, but sometimes your functions don't allow you to use it.

I don't know if it is feasible to differentiate through a Dual{Matrix{T}}, but I think that all the matrix algebra can work for the type with the modifications mentioned earlier.

@c42f (Contributor, Author) commented Jun 21, 2014

Interesting conversation guys, though it leaves me unsure about which way is best to proceed. I'm not entirely sure I want to take on the work of rearranging the whole package around allowing Dual{Matrix{T}} just to get Cholesky factorization in. (I don't personally have a use case for this remember, it was just an interesting problem on the mailing list. This makes me slightly worried I'll make the wrong tradeoffs!)

@andreasnoack (Member)

I agree. I also like to think about problems like this. I tried to compare the two approaches, and in my tests the triangular Lyapunov solver consumed the gain from BLAS, so I am not sure that this approach is worth it.

@c42f (Contributor, Author) commented Jun 21, 2014

@andreasnoackjensen - by "triangular Lyapunov solver" do you mean that the function tri_ss_solve() consumes the gain from using the BLAS on the real part? When I test the optimized version of chol() from the original PR vs the choldn!() posted on the mailing list, I get a factor of 6x speedup for 1000x1000 matrices.

@andreasnoack (Member)

I am at the seaside intensively marking exams, without a decent internet connection, so I cannot check, but could it be that the generic Cholesky in my pull request is faster than the one you have used?

@c42f (Contributor, Author) commented Jun 22, 2014

Sure, that could certainly be the case - I took choldn!() from the mailing list without looking much at it. A generic solution for Cholesky factorization will certainly be great to have. I think I found the pull request you refer to... JuliaLang/julia#7236

@mlubin (Contributor) commented Jun 23, 2014

I just did some timings on 1000x1000 matrices. cholfact from the generic cholesky PR is only about 20% faster than lufact, which is a bit surprising. @c42f's specialized cholesky factorization is almost 4x faster than the generic cholesky, and 2x for solves.

Unless the generic cholesky can be easily improved, it still seems valuable to have this specialized code. I'd vote for squeezing the specialized factorization result into a Cholesky{Dual{T}} instead of trying to do Dual{Matrix{T}} and Dual{Cholesky{T}}.

@c42f (Contributor, Author) commented Jun 23, 2014

Ok cool, thanks for the extra testing @mlubin. Along the same lines, it's probably worth having a specialized version of matrix multiplication for BLAS types, since that's so much faster. It's really too bad that gemm doesn't support striding since the BLAS versions imply a lot of extra memory allocation.
