-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Failure in DPOTRF() in SKYLAKEX AVX512 build but works fine in HASWELL build. #1643
Comments
AVX512 support is just SGEMM/DGEMM right now. (Wonder if the corresponding spotrf testcase would succeed, as DGEMM was an add-on to the original PR - unfortunately I do not have the hardware) |
I'll try to take a look. it's... curious how this can happen; if dgemm was completely broken a lot more things would be failing |
The testspotrf test case works correctly with this version of the library, so the problem is specific to double precision. |
I've rebuilt the libraries and tried to make everything as identical as possible except for the TARGET. The gcc-6 compiler used here is: gcc version 6.4.0 20180424 (Ubuntu 6.4.0-17ubuntu1~16.04)
The build completed with reported test failures. Both testspotrf and testdpotrf produce correct results.
All files were compiled with march=skylake-avx512. I did not have to modify COMMON_OPT Several tests failed:
With this version of the library, testspotrf produced correct results, but testdpotrf produced incorrect results as described before. ./testdpotrf 10000 Comments:
I've attached the test programs and output from the SKYAKEX build Please let me know if there's anything further that I can do to help isolate this bug. |
Setting OMP_NUM_THREADS leads to a slightly different failure:
|
I will spend my evening on tihs later today
if I can't find it we may need to disable dgemm/avx512 for now
…On Tue, Jun 26, 2018 at 2:35 PM Brian Borchers ***@***.***> wrote:
Setting OMP_NUM_THREADS leads to a slightly different failure:
./testdpotrf 10000
dpotrf info=19
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1643 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABPeFWUIo7q38PJSHVL3vo7R_DfD0oc8ks5uAqk4gaJpZM4U3NxM>
.
|
I noticed that DPOTRF calls DTRSM, which is one of the routines that failed its tests during the build. Given that DTRSM is failing its test at build time, this should probably be the priority in debugging the problem. I'd just assume that DPOTRF is failing because DTRSM is broken. Haven't there been other reported problems with DTRSM recently? This could be a bug with gcc-6 in combination with -march-skylake-avx512. Perhaps applying the march=skylake-avx512 flag only to routines needed by dgemm/sgemm would resolve the problem in dtrsm and point to a compiler bug. |
trsm.c calls gemm_thread_m and gemm_thread_n, so there could well be an issue with dgemm that is breaking dtrsm and then dpotrf. |
avx512 changed the shape of the kernel. there is a pre-existing haswell
version of that shape I used as base
my first test will be to swap to the hsw version
if that fails it's the shape not the actual avx code
…On Tue, Jun 26, 2018, 16:49 Brian Borchers ***@***.***> wrote:
trsm.c calls gemm_thread_m and gemm_thread_n, so there could well be an
issue with dgemm that is breaking dtrsm and then dpotrf.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1643 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABPeFYywxUlZ22RzFlBuB_UDULF_C0H8ks5uAsh4gaJpZM4U3NxM>
.
|
OK- let me know if you'd like me to run a test with this. |
Not all of the pre-existing kernel files may be usable without prior inspection unfortunately, if nothing uses them currently they may have been retired due to unexplained errors but never removed. Others including the generic trmmkernel_16x2 have no apparent usage history at all. I have now tried a Haswell build with the kernels used on Skylake (obviously using the pre-existing 16x2 dgemm kernel rather than the new avx512) and see DSYMM,DTRMM and DTRSM failing as well. |
It appears to be the dgemmkernel_16x2 itself that causes the failures. Looking through the git history I am not convinced it was ever used, although there was some work done on it in 2013. (Checking now if it is just the GEMM_DEFAULT_UNROLL_ parameters in param.h that must be adjusted to go with the change in kernel - no, this does not help) |
if the 16x2 will never work I might be able to get the piledriver 8x2 to
work for dgemm... but it's narrow and that will hurt performance a bit.
…On Wed, Jun 27, 2018 at 1:45 AM Martin Kroeker ***@***.***> wrote:
It appears to be the dgemmkernel_16x2 itself that causes the failures.
Looking through the git history I am not convinced it was ever used,
although there was some work done on it in 2013. (Checking now if it is
just the GEMM_DEFAULT_UNROLL_ parameters in param.h that must be adjusted
to go with the change in kernel)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1643 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABPeFZHNUPWC9MC1tgLHVEZbllJ0JQ-Dks5uA0YdgaJpZM4U3NxM>
.
|
Tried an earler version of dgemm_16x2_haswell (the one labeled "corrected and tested" that preceded the "optimized" commit), but that even fails on DGEMM itself (and DSYRK, while interestingly DTRMM passes) |
curious but...
now what?
I suspect the 16x2 just needs to be retired since clearly it never worked
at all.
I'm pondering writing a whole new dgemm kernel since I kind of want a 32x4
(or 16x8 but 32x4 is nicer) just it is a much steeper ramp than just
replacing the kernel of a supposedly-working wrapper framework.
…On Wed, Jun 27, 2018 at 7:38 AM Martin Kroeker ***@***.***> wrote:
Tried an earler version of dgemm_16x2_haswell (the one labeled "corrected
and tested" that preceded the "optimized" commit), but that even fails on
DGEMM itself (and DSYRK, while interestingly DTRMM passes)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1643 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABPeFdsjWVVfHVt_AnNClIJOfc68un8Wks5uA5jKgaJpZM4U3NxM>
.
|
I have disabled that DGEMM kernel for now (after fumbling aimlessly both with the 16x2 haswell kernel and my attempt of adapting its 8x2 piledriver counterpart) and moved the milestones on the two related tickets to 0.3.2. This will allow 0.3.1 to still get the initial SkylakeX support with the AVX512 SGEMM kernel |
The latest development version still fails on one of the tests at build time. With make TARGET=SKYLAKEX CC=gcc-6 FC=gfortran-6 USE_OPENMP=1 I get: DTRMM PASSED THE TESTS OF ERROR-EXITS ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE ******* This is exactly the same failure that we were seeing before the DGEMM kernel was removed. However, testdpotrf does work as expected (slow, but correct.) |
Sorry - I had not disabled the 16x2 TRMMKERNEL as well with the latest PR, thinking it was harmless. Thanks for catching this, I will create a new PR in a minute. |
Now the SKYLAKEX build works without errors. The SKYLAKEX version is about 10% faster on SGEMM (360 gigflops vs. 330) than the HASWELL build. So, no significant performance improvement yet, but everything does function. |
sgemm also still needs more performance tuning; avx512 really wants bigger
kernels to hit the 32 FMAs per cycle that the cpu can do.
…On Sat, Jun 30, 2018 at 9:34 AM Brian Borchers ***@***.***> wrote:
Now the SKYLAKEX build works without errors. The SKYLAKEX version is about
10% faster on SGEMM (360 gigflops vs. 330) than the HASWELL build. So, no
significant performance improvement yet, but everything does function.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1643 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABPeFemAmqVO6f19hoyDi-s22k0JO9dNks5uB6iVgaJpZM4U3NxM>
.
|
Still I guess it would be labeled significant if we ever saw a ten percent performance loss somewhere... |
absolutely. some companies (ahem) spend a LOT of effort making a cpu a few
percent faster in architecture.
performance comes by foot, but leaves by horse.
anyway current code tops out at 18 multiplies per cycle.... I have kernels
in my test bench that get over 29....
but they're bigger and I need to then change a lot more "glue" in openblas
to make them fit, and I know I need to learn a lot more of said "glue" first
…On Sat, Jun 30, 2018 at 9:59 AM Martin Kroeker ***@***.***> wrote:
Still I guess it would be labeled significant if we ever saw a ten percent
performance *loss* somewhere...
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1643 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABPeFYlhUPzImhfomqeLIiCdj9BnCB7Bks5uB65agaJpZM4U3NxM>
.
|
FYI, some SKL-X CPUs come with 1 AVX-512 execution unit per core (under 10 cores, so i7 7820X and under), but the 10 core (i9 7900X) and higher core count CPUs have 2 AVX-512 units per core |
Somehow I suspect fenrus75 knows this 😄 |
Assuming fixed by the new DGEMM kernel now in 0.3.4 |
I've built the development version (downloaded June 25) for TARGET=HASWELL and TARGET=SKYLAKEX. Unfortunately, the attached test program fails using the version of the library built with TARGET=SKYLAKEX and works correctly using the TARGET=HASWELL library.
Other flags used were CC=gcc-6, FC=gfortran-6, and USE_OPENMP=1. I had to add -march=skylake-avx512 to COMMON_OPT on SKYLAKEX build to get AVX512.
The results for HASWELL were:
The results for SKYLAKEX were:
This indicates that the Cholesky factorization failed at A(33,33). This particular matrix is known to be positive definite and the Cholesky factorization proceeds without error using MKL and the reference BLAS/LAPACK. Of course it also works correctly on the development version of OpenBLAS with TARGET=HASWELL, so this would seem to indicate a bug in the AVX512 support for SKYLAKEX.
testdpotrf.zip
The text was updated successfully, but these errors were encountered: