
WIP: Cholesky factorization of dual matrices #11

Merged: 2 commits into JuliaDiff:master on Jun 6, 2014

Conversation

@c42f (Contributor) commented Jun 6, 2014

As discussed on the mailing list, here's some code implementing Cholesky factorization of dual matrices in a reasonably efficient manner.

The current implementation doesn't play nicely with the cholfact!() infrastructure, so some additional changes will be required. I can do some digging to figure that out, but suggestions are also very welcome. This is close to my first attempt at production quality julia code, so there could well be some strange things I've done in places.
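
For reference, here is a minimal, unoptimized sketch of the underlying algorithm. This is not the PR's code (the PR solves the epsilon-part equation through BLAS calls rather than scalar loops), and the names dual_chol, dual and epsilon are assumed to behave as in this PR: building a dual array from two real arrays and extracting its parts.

# Factor A + εB with A symmetric positive definite:
# (U + εV)'(U + εV) = U'U + ε(U'V + V'U), so U = chol(real part) and
# the upper triangular V solves B = U'V + V'U.
function dual_chol(A)
    U = chol(real(A))            # ordinary real Cholesky: real(A) == U'U
    B = epsilon(A)
    n = size(B, 1)
    V = zeros(eltype(B), n, n)   # upper triangular epsilon part of the factor
    for j = 1:n, i = 1:j
        s = B[i, j]
        for k = 1:i-1
            s -= U[k, i]*V[k, j] + V[k, i]*U[k, j]
        end
        if i == j
            V[i, j] = s / (2*U[i, i])
        else
            V[i, j] = (s - V[i, i]*U[i, j]) / U[i, i]
        end
    end
    dual(U, V)
end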

c42f added 2 commits June 6, 2014 21:21
Allow convenient creation of dual arrays from two parallel arrays RE and
DU, and extraction of the epsilon part as an array.
Cholesky factorization using chol() for the real part and solving the
matrix equation B = M'*U + U'*M for the epsilon part.  This version is
fairly fast, as it calls through to the BLAS for the heavy lifting.

Requires additional work to play nicely with the Base.cholfact!()
infrastructure.
@papamarkou (Contributor)

Great, Chris!

papamarkou added a commit that referenced this pull request Jun 6, 2014
WIP: Cholesky factorization of dual matrices
@papamarkou merged commit 115fd38 into JuliaDiff:master Jun 6, 2014
@mlubin (Contributor) commented Jun 6, 2014

@c42f, thanks for putting together this PR. @Scidom I think this needed some more work before merging. There are no tests and it doesn't yet follow the standard cholfact interface so it may not even be called by default.

@papamarkou (Contributor)

Ok, I will let you work on the tests and improve it @mlubin, happy if you can commit to it.

@c42f (Contributor, Author) commented Jun 6, 2014

Yes, it's incomplete, I should have been clearer that I didn't expect it to be merged quite yet (that was what the "WIP" was meant to indicate). Never mind though - I'll just create a new pull request for additional changes. I'm happy to do those, just wanted to open the discussion.

@mlubin The current chol() interface for duals was more of a proof of concept than anything. It correctly gets called for chol(A) where A is a dual matrix, but nothing more.

The cholfact!() and cholfact() docs say

Base.cholfact!(A, [LU,][pivot=false,][tol=-1.0]) -> Cholesky

"cholfact!" is the same as "cholfact()", but saves space by
overwriting the input "A", instead of creating a copy.

Base.cholfact(A, [LU,][pivot=false,][tol=-1.0]) -> Cholesky

Compute the Cholesky factorization of a dense symmetric positive
(semi)definite matrix "A" and return either a "Cholesky" if
"pivot=false" or "CholeskyPivoted" if "pivot=true". "LU"
may be ":L" for using the lower part or ":U" for the upper
part. The default is to use ":U". The triangular matrix can be
obtained from the factorization "F" with: "F[:L]" and
"F[:U]". The following functions are available for "Cholesky"
objects: "size", "", "inv", "det". For
"CholeskyPivoted" there is also defined a "rank". If
"pivot=false" a "PosDefException" exception is thrown in case
the matrix is not positive definite. The argument "tol"
determines the tolerance for determining the rank. For negative
values, the tolerance is the machine precision.

Seems like quite a bit more work to be done to support all the options. Lower or upper is easy I guess - just transpose after the fact. Pivoting might require a little more work, though I guess it's just a matter of passing the pivot options along to cholfact!(real(A)), and applying the permutation correctly before computing the decomposition of epsilon(A).

I'm not yet sure whether any of this can usefully be done in-place. If we could reinterpret Matrix{Dual{Float64}} into a Matrix{Float64} of twice the number of rows we might have a chance. Do either of you know whether julia gives any guarantees on memory layout which would let us do that safely?
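
As a rough illustration of the idea (a sketch only, assuming Dual{Float64} is an isbits immutable laid out as consecutive (real, epsilon) pairs, and that reinterpret accepts target dimensions for arrays):

A = dual(rand(3, 3), rand(3, 3))       # Matrix{Dual{Float64}}
n, m = size(A)
F = reinterpret(Float64, A, (2n, m))   # same memory viewed as a 2n-by-m Float64 matrix
RE = F[1:2:end, :]                     # real parts (odd rows) -- note this is a copy
DU = F[2:2:end, :]                     # epsilon parts (even rows) -- also a copy

The catch, as @mlubin notes further down, is that BLAS cannot consume the odd/even rows in place, so extracting RE and DU still allocates.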

@papamarkou (Contributor)

I don't know the answer to this question @c42f. Thanks for being willing to work on improving your Cholesky factorization code; I didn't realize this was not ready (I merged without having noticed the WIP). If you could amend your code, possibly with the help of @mlubin, that would be great, as I can't give it my immediate attention due to other ongoing coding work.

@mlubin (Contributor) commented Jun 8, 2014

@c42f, I've reverted the commit so that you can make a clean PR. What I was thinking about cholfact is that we should re-use the Cholesky parametrized type. So cholfact(Matrix{Dual{Float64}}) returns a Cholesky{Dual{Float64}}. Returning anything else might break code that explicitly stores the factorization object. As you mentioned, the matrix layout is an issue because we want to store two Matrix{Float64} instead of one Matrix{Dual{Float64}} with the factor.
@andreasnoackjensen, any suggestions here for what we should do to provide a custom cholesky factorization for the Dual type?

@c42f (Contributor, Author) commented Jun 8, 2014

Ok, thanks Miles. I can't reopen this PR since github knows it's merged (unless one of you guys can?) but we can keep discussing it here until I've got some new code to talk about.

It's a good question as to whether we want the factorization to be a pair of Matrix{Float64} vs a single Matrix{Dual{Float64}}. I was assuming we'd want the latter as output (if you look at my test chol() implementation you'll see that's what I produce at the moment). You're probably right that it's better to keep them split apart though: as soon as you want to solve an equation using the factorization, you'll likely want to call the blas again for efficiency.

Looking at the code in base/linalg/factorization.jl, I see the Cholesky stuff is rather tied to LAPACK at the moment, and testing shows it doesn't even work with a BigFloat array yet. That probably means we're a bit ahead of ourselves trying to get it all working for duals.

@andreasnoack (Member)

I wonder if it would work if the Dual immutable was loosened such that you could have a Dual{Matrix{Float64}} instead of Matrix{Dual{Float64}}. It would probably require some care in the definitions of the arithmetic operations on Duals, but the BLAS calls would come automatically I guess.
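
A rough sketch of what that loosening might look like (hypothetical; the package's actual definition restricts the parameter to real scalars):

import Base: real, *

immutable Dual{T}       # parameter no longer restricted to T<:Real
    re::T
    du::T
end

real(z::Dual) = z.re
epsilon(z::Dual) = z.du

# Product rule: (a + εb)(c + εd) = a*c + ε(a*d + b*c).
# With T = Matrix{Float64}, every * on the right-hand side is an ordinary
# real matrix product, so it dispatches straight to BLAS gemm.
*(z::Dual, w::Dual) = Dual(real(z)*real(w), real(z)*epsilon(w) + epsilon(z)*real(w))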

@c42f (Contributor, Author) commented Jun 9, 2014

Yes, allowing Dual{Matrix{Float64}} seems mathematically sensible, and would allow operations such as multiplication of dual matrices to be a lot faster for any matrices of reasonable size. My main worry would be whether the extra code complexity was worth it.

@mlubin (Contributor) commented Jun 9, 2014

@andreasnoackjensen, that doesn't quite resolve the issue because Cholesky{T} stores a Matrix{T}, so Cholesky{Dual{Matrix{Float64}}} would store the factor as a Matrix{Dual{Matrix{Float64}}}.

The code needed to implement this would also surely double the complexity of the DualNumbers package, and I'm not sure that's worth it just to be able to call BLAS.

@andreasnoack (Member)

@mlubin At first, I thought the same, but instead of Cholesky{Dual{Matrix{Float64}}} it should be a Dual{Cholesky{Matrix{Float64}}}. Then if we define

function \(z::Dual, w::Dual)
    # (a + εb) \ (c + εd) = a\c + ε(a\(d - b*(a\c)))
    r = real(z)\real(w)
    du = real(z)\(epsilon(w) - epsilon(z)*r)
    return Dual(r, du)
end

it almost works for a Dual{Cholesky} lhs. It requires a * method for Cholesky, but that is okay to have anyway.
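
For concreteness, such a * method might look like this (a hypothetical sketch, assuming the upper factor is stored so that the original matrix is U'U and can be applied via two triangular multiplies):

import Base: *
*(C::Cholesky, x) = C[:U]' * (C[:U] * x)    # A*x recovered from A = U'U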

However, it is still not obvious that we want this. I am a little worried about the number of temporaries in these operations.

@c42f (Contributor, Author) commented Jun 11, 2014

Calling BLAS can be a big deal performance-wise, at least vs the implementation of matrix multiplication which Matrix{Dual{Float64}} currently ends up using. Consider this fairly naive expansion in terms of the blas:

function dmul(A, B)
    # Product rule on matrices: (rA + ε dA)(rB + ε dB) = rA*rB + ε(rA*dB + dA*rB);
    # each real matrix product below dispatches to BLAS gemm.
    rA = real(A)
    rB = real(B)
    dual(rA*rB, rA*epsilon(B) + epsilon(A)*rB)
end

A = dual(rand(1000,1000), rand(1000,1000))
B = dual(rand(1000,1000), rand(1000,1000))

Now we have

julia> @time dmul(A,B)
elapsed time: 0.112314667 seconds (80001120 bytes allocated)

julia> @time A*B
elapsed time: 1.349246384 seconds (16000400 bytes allocated)

It's a lot of extra allocation, but I'd take that for a 10x performance boost!

The question of whether DualNumbers should take on the complexity of making Dual{Cholesky{Matrix{Float64}}} work is tricky. Mainly I don't know whether it's something people actually need, and whether it will generalize usefully into a common pattern for other Cholesky factorizations. Would it be better just to define a new factorization type internal to DualNumbers and return that instead?
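
Such an internal type could be quite small. A purely hypothetical sketch (DualCholesky and the solve below are illustrative only, not part of the PR; dual(x, y), real(b) and epsilon(b) are assumed to work on arrays as in this PR):

import Base: \

immutable DualCholesky{T}
    U::Matrix{T}   # Cholesky factor of real(A), so real(A) == U'U
    V::Matrix{T}   # epsilon part of the factor, satisfying epsilon(A) == U'V + V'U
end

# Solve (A + εB)(x + εy) = c + εd:
#   real part:    x = A \ c
#   epsilon part: y = A \ (d - B*x), with B = U'V + V'U
function \(F::DualCholesky, b)
    x = F.U \ (F.U' \ real(b))
    y = F.U \ (F.U' \ (epsilon(b) - (F.U'*F.V + F.V'*F.U)*x))
    dual(x, y)
end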

@andreasnoack (Member)

I think the best solution would be to relax Dual to allow for Dual{Matrix} and Dual{Factorization}. It wouldn't require that many changes to the code and it wouldn't change anything for Dual{Float64}. The / and \ methods are easily fixed along the lines of my example above, some methods restricted to Number or Real could just have the restriction removed, and some methods for Dual should be restricted to Dual{T<:Number}.

@mlubin (Contributor) commented Jun 11, 2014

It's too bad that BLAS/LAPACK don't accept strides for matrices, since then we could just reinterpret a Matrix{Dual} into two different matrices. The 10x performance boost is definitely compelling. My only doubt is how this works with Dual{Float64} as a drop-in replacement for Float64. This will break any code that tries to store a Cholesky{T}, and there's no way to write down the factorization type correctly for both Float64 and Dual{Float64}.

@mlubin (Contributor) commented Jun 11, 2014

To put it more clearly, the primary use of DualNumbers is by users who don't need to know what Dual is and shouldn't need to make any modifications to their code besides making it type generic.

@andreasnoack (Member)

This will break any code that tries to store a Cholesky{T},

That is right, but it is somewhat similar to the requirement that you cannot restrict your function arguments to Float64 when you want to use DualNumbers.

there's no way to write down the factorization type correctly for both Float64 and Dual{Float64}.

I don't get this part. Can you elaborate?

@mlubin (Contributor) commented Jun 11, 2014

@andreasnoackjensen There's no loss of exact typing when you write your function to take a number type T. Currently it's correct to assume that chol(::Matrix{T}) returns a Cholesky{T}, and it's reasonable to want to explicitly use this type in the code for storage or type assertions. However, if chol(::Matrix{Dual{T}}) returns a Dual{Cholesky{T}}, then there's no way to write down the return type that's correct for both T = Float64 and T = Dual{Float64}, unless I'm mistaken.
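
A small illustration of the concern (hypothetical user code, not from the thread):

# Natural generic code today: store the factorization with its concrete type.
type Solver{T}
    fact::Cholesky{T}    # fine if cholfact(::Matrix{T}) returns Cholesky{T}
end

# If cholfact(::Matrix{Dual{T}}) instead returned Dual{Cholesky{T}}, no single
# field type would be correct for both T = Float64 and T = Dual{Float64}.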

@andreasnoack (Member)

@mlubin I think that some of your message is missing.

@mlubin (Contributor) commented Jun 11, 2014

Should be edited now.

@andreasnoack (Member)

The idea was to have cholfact(Dual{Matrix{T}}) (not cholfact(Matrix{Dual{T}})) return a Dual{Cholesky{T}}. Then you would have the option to choose between cholfact(Dual{Matrix{T}}) and cholfact(Matrix{Dual{T}}), where the former can take advantage of BLAS, but sometimes your functions don't allow you to use it.

I don't know if it is feasible to differentiate through a Dual{Matrix{T}}, but I think that all the matrix algebra can work for the type with the modifications mentioned earlier.

@c42f (Contributor, Author) commented Jun 21, 2014

Interesting conversation guys, though it leaves me unsure about which way is best to proceed. I'm not entirely sure I want to take on the work of rearranging the whole package around allowing Dual{Matrix{T}} just to get Cholesky factorization in. (I don't personally have a use case for this remember, it was just an interesting problem on the mailing list. This makes me slightly worried I'll make the wrong tradeoffs!)

@andreasnoack (Member)

I agree. I also like to think about problems like this. I tried to compare the two approaches, and in my tests the triangular Lyapunov solver consumed the gain from BLAS, so I am not sure that this approach is worth it.

@c42f (Contributor, Author) commented Jun 21, 2014

@andreasnoackjensen - by "triangular Lyapunov solver" do you mean that the function tri_ss_solve() consumes the gain from using the BLAS on the real part? When I test the optimized version of chol() from the original PR vs the choldn!() posted on the mailing list, I get a factor of 6x speedup for 1000x1000 matrices.

@andreasnoack (Member)

I am at the seaside intensively marking exams, without a decent internet connection, so I cannot check, but could it be that the generic Cholesky in my pull request is faster than the one you have used?

@c42f (Contributor, Author) commented Jun 22, 2014

Sure, that could certainly be the case - I took choldn!() from the mailing list without looking much at it. A generic solution for Cholesky factorization will certainly be great to have. I think I found the pull request you refer to... JuliaLang/julia#7236

@mlubin (Contributor) commented Jun 23, 2014

I just did some timings on 1000x1000 matrices. cholfact from the generic cholesky PR is only about 20% faster than lufact, which is a bit surprising. @c42f's specialized cholesky factorization is almost 4x faster than the generic cholesky, and 2x for solves.

Unless the generic cholesky can be easily improved, it still seems valuable to have this specialized code. I'd vote for squeezing the specialized factorization result into a Cholesky{Dual{T}} instead of trying to do Dual{Matrix{T}} and Dual{Cholesky{T}}.

@c42f (Contributor, Author) commented Jun 23, 2014

Ok cool, thanks for the extra testing @mlubin. Along the same lines, it's probably worth having a specialized version of matrix multiplication for BLAS types, since that's so much faster. It's really too bad that gemm doesn't support striding since the BLAS versions imply a lot of extra memory allocation.
