LU and eigen routines slower than MKL #2795
Comments
Which Skylake? What input sizes? Which OpenBLAS version? Any virtualisation?
Skylake (Haswell refresh without AVX512) or SkylakeX (with AVX512)? DGEMM performance for the latter should be about on par with MKL if you use a very recent 0.3.x release or the current |
Thanks. It is SkylakeX with AVX-512. Tried input sizes from |
0.3.7 (from a year ago) had all parts of the initial AVX512 DGEMM implementation disabled as it turned out to be incorrect. AVX512 DGEMM reappeared in 0.3.8 and was further improved in 0.3.10, so ideally you should be trying that (or git |
Good to know, thanks! I'll try 0.3.10 and report back.
Well, it turns out that I was already using 0.3.10. But I have some more observations, as shown in the plots below.
I can run a profiler on |
Interesting, thanks. GETRF is one of the few LAPACK functions that are reimplemented (lapack/getrf/getrf_parallel.c, already present in the original GotoBLAS) rather than copied from the reference implementation. In February there were some fixes to my earlier heavy-handed approach to making it thread-safe; perhaps there is more wrong with it. Also, the DTRSM it calls is not optimized for SkylakeX, and neither is LASWP, another reimplemented function.
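To illustrate why DTRSM and LASWP matter for GETRF, here is a minimal pure-Python sketch of a right-looking blocked LU with partial pivoting. It is not the code in lapack/getrf/getrf_parallel.c; the names `blocked_lu`, `residual` and the block size `nb` are assumptions made for this illustration. It only shows the four stages a blocked GETRF typically goes through per panel: an unblocked panel factorization, a LASWP-like row interchange, a TRSM-like triangular solve, and a GEMM-like trailing update.

```python
def blocked_lu(a, nb=2):
    """Right-looking blocked LU with partial pivoting (illustrative sketch).

    `a` is a square list of lists. Returns (lu, piv): lu stores the unit
    lower factor L below the diagonal and U on and above it; piv[k] is the
    row that was swapped with row k at step k.
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]
    piv = list(range(n))
    for j0 in range(0, n, nb):
        j1 = min(j0 + nb, n)
        # 1) Unblocked panel factorization on columns j0 .. j1-1.
        for k in range(j0, j1):
            p = max(range(k, n), key=lambda i: abs(lu[i][k]))
            if lu[p][k] == 0.0:
                raise ValueError("matrix is singular")
            piv[k] = p
            if p != k:
                # Swap inside the panel only; the rest is done in step 2.
                for c in range(j0, j1):
                    lu[k][c], lu[p][c] = lu[p][c], lu[k][c]
            for i in range(k + 1, n):
                lu[i][k] /= lu[k][k]
                for c in range(k + 1, j1):
                    lu[i][c] -= lu[i][k] * lu[k][c]
        # 2) LASWP-like step: apply the panel's row swaps outside the panel.
        outside = list(range(0, j0)) + list(range(j1, n))
        for k in range(j0, j1):
            p = piv[k]
            if p != k:
                for c in outside:
                    lu[k][c], lu[p][c] = lu[p][c], lu[k][c]
        # 3) TRSM-like step: solve L11 * U12 = A12 with unit lower L11.
        for c in range(j1, n):
            for k in range(j0, j1):
                for i in range(k + 1, j1):
                    lu[i][c] -= lu[i][k] * lu[k][c]
        # 4) GEMM-like step: trailing update A22 -= L21 * U12.
        for i in range(j1, n):
            for c in range(j1, n):
                s = 0.0
                for k in range(j0, j1):
                    s += lu[i][k] * lu[k][c]
                lu[i][c] -= s
    return lu, piv


def residual(a, lu, piv):
    """Largest absolute entry of P*A - L*U, as a correctness check."""
    n = len(a)
    pa = [[float(x) for x in row] for row in a]
    for k, p in enumerate(piv):
        pa[k], pa[p] = pa[p], pa[k]
    worst = 0.0
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                s += l_ik * lu[k][j]
            worst = max(worst, abs(pa[i][j] - s))
    return worst
```

In an optimized library, step 4 is the part that maps onto DGEMM, so a fast DGEMM alone does not make the whole factorization fast: steps 1 to 3 run between every pair of GEMM calls, and with many threads their cost can become visible.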
Yes I noticed that OpenBLAS |
Hi, I'm running some performance comparisons between OpenBLAS and MKL for LU and eigen routines. I see that the OpenBLAS tests with, for example, dgetrf and dsyevd are about 3 times slower than MKL. These are multi-threaded tests run on a Skylake machine. I wonder if you have any benchmark results vs. MKL and, if so, what they look like?
Thanks.
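For anyone reproducing a comparison like this, a small timing helper makes the numbers from the two libraries easier to compare. The sketch below is an assumption about how such a harness could look, not the script used for this report; `best_time` and `lu_gflops` are hypothetical names. It takes the best of several runs and converts the time of an n x n LU factorization to GFLOP/s using the usual leading-order operation count of (2/3) n^3.

```python
import time


def best_time(fn, repeats=5):
    """Return the smallest wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


def lu_gflops(n, seconds):
    """GFLOP/s of an n x n LU, assuming the leading-order count (2/3) * n**3."""
    return (2.0 / 3.0) * n ** 3 / seconds / 1e9
```

Here `fn` would be a closure that calls dgetrf through whichever binding is in use, on a fresh copy of the input each time since the factorization overwrites its argument. Both libraries should be run with the same thread count, for example by setting OPENBLAS_NUM_THREADS and MKL_NUM_THREADS to the same value.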