Eig is slow compared to mathematica and octave #72
On my Mac, Julia is the faster of the two. A guess could be that he uses reference BLAS and LAPACK for Julia, but I haven't found a way to make Mathematica do the same. Regarding the keywords, that is my mistake; I am preparing a pull request. |
Thanks for explaining re the |
So I asked MathematicaSE and apparently Windows Mathematica uses the Intel MKL BLAS. I am using Julia 0.2.0 with the OpenBLAS architecture, which seems 3 to 6 times slower than Mathematica on my machine. Is this reasonable? If I change Julia to use the Intel MKL, will it speed up? Or are more complicated things at play? |
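A quick way to answer "which BLAS is my Julia actually using?" is to ask the runtime directly. This is a sketch for much newer Julia than the 0.2.0 in this thread: `BLAS.get_config()` only exists from Julia 1.7 on (where the MKL.jl package can also swap the backend at runtime), so treat the API name as an assumption about your Julia version.

```julia
# Sketch: report the BLAS/LAPACK backend Julia is linked against.
# BLAS.get_config() requires Julia 1.7+ (libblastrampoline era).
using LinearAlgebra

config = BLAS.get_config()
println(config)   # lists the loaded BLAS/LAPACK libraries, e.g. libopenblas
```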
The performance of OpenBLAS depends on the architecture, but it seems that something is not right on Windows. I have just tried your example on a Windows server and the result depends on the number of threads used. Unfortunately, in the wrong direction. I get
On the same server Mathematica spends 5-6 seconds. If you use MKL when compiling julia, I think that you should expect timings comparable to Mathematica. My info is
cc @xianyi |
@andreasnoack: That's a strange issue, and I can reproduce it on my machine:
Output:
Is this some kind of bug? Should it be faster when there are more threads dedicated? Also, the above timings are really strange! Before I changed the number of threads, it was routinely taking 7.5 seconds to compute! Let me play around a little more, and see what I can find... |
Okay, I restarted the kernel and once again ran the benchmark, and it took 7.4 seconds on average. I don't know much about Julia, but is there a way to determine the default number of BLAS threads used by the kernel? |
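For reference, a sketch of how the thread count can be queried and pinned from the REPL. The 0.2-era spelling was `blas_set_num_threads`; the `BLAS.get_num_threads`/`BLAS.set_num_threads` names below are from the modern LinearAlgebra stdlib, so they are an assumption about your Julia version.

```julia
# Sketch: query and pin the BLAS thread count, then compare timings.
using LinearAlgebra

println(BLAS.get_num_threads())  # the default chosen at startup
A = randn(1000, 1000)

BLAS.set_num_threads(1)          # pin BLAS to a single thread
@time eigvals(A)                 # single-threaded timing

BLAS.set_num_threads(4)
@time eigvals(A)                 # compare against four threads
```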
Would someone with Windows access please try and see if this is still an issue? |
worse, actually - now it segfaults:
|
The backtrace might be a bit wrong there, in gdb it looks like this:
|
Do you only get the segfault when running single threaded? |
No, segfaults with |
It's a bit surprising that no users have complained about this. |
And that the tests didn't show this as soon as we upgraded to 0.2.15. I also tried with an 0.4 rc that was using 0.2.14 and the performance looked similar to what was reported above. For @xianyi's benefit, can we come up with standalone test cases in C or Fortran that would call
|
Yes, we should do that, but it's a bit of a pain because it's an "expert driver routine" and therefore takes 23 arguments. |
We should totally write an automatic test-case-generator that converts a given ccall invocation into standalone C. https://xkcd.com/974 |
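Short of a standalone C reproducer, the problematic call can at least be isolated from the rest of `eig` by invoking the LAPACK wrapper directly from the REPL. A sketch; the four-argument `geevx!` spelling below is from the modern `LinearAlgebra.LAPACK` stdlib and is an assumption for 0.2/0.4-era Base:

```julia
# Sketch: time LAPACK's expert eigenvalue driver (dgeevx) in isolation.
using LinearAlgebra

A = randn(1000, 1000)
# 'B': balance (permute + scale); 'N','N': no left/right eigenvectors;
# 'N': no condition numbers. geevx! overwrites its input, so pass a copy.
@time out = LAPACK.geevx!('B', 'N', 'N', 'N', copy(A))
```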
Reported the segfault upstream at OpenMathLib/OpenBLAS#697 - the simplified |
On my mac, this takes 15 seconds on |
Removing the windows label since this is the case on mac as well. @andreasnoack Can you verify once more? |
I have a theory. This might not be an openblas-vs-mkl problem, it might be a gfortran-vs-ifort problem. Can anyone on Linux or OSX who has access to Intel compilers try building openblas using icc and ifort, and compare that to MKL/Accelerate? |
Cc @ranjanan |
I have been thinking the same. It's hard to believe that OpenBLAS' dgemm is good for a few LAPACK routines and not good for others. |
For the same
All those allocations are perhaps due to type instability, which I believe there are other issues on, but I don't think those are the reasons for the poorer performance with openblas. |
I think this is OpenBLAS. The Schur factorization is not GEMM heavy. It has a lot of GEMV, GER and some TRMM (as well as a lot of special bulge chasing operations that are not BLAS heavy). Furthermore, many of the operations are on smaller pieces of the matrix, and here the difference between vecLib and OpenBLAS can be significant. See e.g.:

julia> size(A)
(50,50)

# OpenBLAS
julia> @time for i = 1:10000; BLAS.gemv!('N', 1.0, A, v, 0.0, similar(v)); end
  0.190648 seconds (10.00 k allocations: 5.188 MB)

# VecLib
julia> @time for i = 1:10000; TestGees.gemv!('N', 1.0, A, v, 0.0, similar(v)); end
  0.006144 seconds (10.00 k allocations: 5.188 MB)

julia> 0.190648/0.006144
31.029947916666668

julia> norm(TestGees.gemv!('N', 1.0, A, v, 0.0, similar(v)) - BLAS.gemv!('N', 1.0, A, v, 0.0, similar(v)))
7.25785747276179e-15 |
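The thread never shows the `TestGees` module, but a wrapper like its `gemv!` could plausibly be a direct ccall into Apple's Accelerate (vecLib) framework. The sketch below is hypothetical: the framework path, the `dgemv_` symbol name, and the 32-bit integer convention of Accelerate are all assumptions, and it only runs on macOS.

```julia
# Hypothetical macOS-only binding of Accelerate's Fortran dgemv_,
# sketching what a TestGees.gemv!-style wrapper could look like.
const libaccelerate = "/System/Library/Frameworks/Accelerate.framework/Accelerate"

function veclib_gemv!(trans::Char, alpha::Float64, A::Matrix{Float64},
                      x::Vector{Float64}, beta::Float64, y::Vector{Float64})
    m, n = size(A)
    # Fortran dgemv: y = alpha*op(A)*x + beta*y; Accelerate uses 32-bit ints.
    ccall((:dgemv_, libaccelerate), Cvoid,
          (Ref{UInt8}, Ref{Int32}, Ref{Int32}, Ref{Float64},
           Ptr{Float64}, Ref{Int32}, Ptr{Float64}, Ref{Int32},
           Ref{Float64}, Ptr{Float64}, Ref{Int32}),
          UInt8(trans), m, n, alpha, A, max(1, m), x, 1, beta, y, 1)
    return y
end
```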
Your OpenBLAS seems awfully slow. I get:

julia> @time for i = 1:10000; BLAS.gemv!('N', 1.0, A, v, 0.0, similar(v)); end
  0.005647 seconds (10.00 k allocations: 5.188 MB) |
@KristofferC What is your
|
|
There is something weird happening on OS X (and maybe Windows; I don't have a machine handy). On a Linux Haswell I get timings similar to yours, but we have reproduced the slow timings on two different Macs in the office. I don't have MKL on the Linux Haswell machine, but I do on a Linux Westmere machine, and there I don't see a difference between OpenBLAS and MKL for |
The |
@xianyi Yes. It creates a new vector to avoid overwriting |
On Mac OS X, it should use .align 4 (equal to .align 16 on Linux). I didn't get the performance benefit from .align. Thus, I deleted it.
I think I fixed this performance bug on OpenBLAS develop branch. The |
@xianyi Great. I see a big improvement for OpenBLAS, but there is still a 3.5x difference between OpenBLAS and vecLib. Do you think it is possible to reduce that difference?

julia> @time for i = 1:1000000; BLAS.gemv!('N', 1.0, A, v, 0.0, similar(v)); end
  1.989242 seconds (1000.00 k allocations: 518.799 MB, 0.89% gc time)

julia> @time for i = 1:1000000; Tmp1.gemv!('N', 1.0, A, v, 0.0, similar(v)); end
  0.585461 seconds (1000.00 k allocations: 518.799 MB, 2.82% gc time) |
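Incidentally, the 10.00 k / 1000.00 k allocation counts in these timings come from calling `similar(v)` inside the loop on every iteration; hoisting the output buffer out of the loop gives a cleaner BLAS-only measurement. A minimal sketch:

```julia
# Sketch: allocation-free gemv! benchmark with a preallocated output buffer.
using LinearAlgebra

A = randn(50, 50)
v = randn(50)
y = similar(v)                           # allocate the output once
@time for i = 1:10_000
    BLAS.gemv!('N', 1.0, A, v, 0.0, y)   # in-place: y = A*v, no per-iteration allocation
end
```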
@andreasnoack Do we still have this issue with OpenBLAS 0.3.0? |
This is now faster for me than Octave from brew: Julia takes about 2.5 seconds and Octave about 3 seconds. |
On StackOverflow a user pointed out that eig seems slow compared to Mathematica: http://stackoverflow.com/questions/21641621/eigendecompositions-are-5-times-slower-in-julia-than-in-mathematica

I compared against Matlab (I don't have Mathematica), and it's about a factor of 2 slower. I profiled it, and it indicates that all the time is in linalg/lapack.jl; geevx!; line: 1209. I briefly read the docs for dgeevx, and I confess I don't understand why that ccall is being made twice.

Moreover, eig doesn't accept keywords, but ?eig suggests it should take a balance keyword.