Failure in DPOTRF() in SKYLAKEX AVX512 build but works fine in HASWELL build. #1643

Closed
brianborchers opened this issue Jun 26, 2018 · 25 comments

@brianborchers

I've built the development version (downloaded June 25) for TARGET=HASWELL and TARGET=SKYLAKEX. Unfortunately, the attached test program fails using the version of the library built with TARGET=SKYLAKEX and works correctly using the TARGET=HASWELL library.

Other flags used were CC=gcc-6, FC=gfortran-6, and USE_OPENMP=1. I had to add -march=skylake-avx512 to COMMON_OPT in the SKYLAKEX build to get AVX512.

The results for HASWELL were:

./testdpotrf 10000
10000 2.07 161.1 6.74e-13
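
The columns of this output line are not labeled in the post. One plausible reading, and it is only an assumption, is matrix size, elapsed seconds, GFLOP/s, and a residual norm. The sketch below checks that reading against the usual n^3/3 flop count for a Cholesky factorization; the function name is hypothetical.

```python
def cholesky_gflops(n: int, seconds: float) -> float:
    """GFLOP/s for an n x n Cholesky factorization, using the n^3/3 flop count."""
    flops = n ** 3 / 3.0
    return flops / seconds / 1e9


# Values taken from the HASWELL run above: n = 10000, 2.07 seconds (assumed meaning).
rate = cholesky_gflops(10000, 2.07)
print(f"{rate:.1f} GFLOP/s")  # close to the 161.1 reported in the third column
```

The small difference from 161.1 is consistent with the elapsed time having been rounded to two decimals before printing.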

The results for SKYLAKEX were:

./testdpotrf 10000
dpotrf info=33

This indicates that the Cholesky factorization failed at A(33,33). This particular matrix is known to be positive definite and the Cholesky factorization proceeds without error using MKL and the reference BLAS/LAPACK. Of course it also works correctly on the development version of OpenBLAS with TARGET=HASWELL, so this would seem to indicate a bug in the AVX512 support for SKYLAKEX.
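
For readers unfamiliar with the return code: LAPACK documents a positive info=k from DPOTRF as "the leading minor of order k is not positive definite". A minimal pure-Python sketch of an unblocked lower Cholesky that reports info the same way is shown here. It is an illustration of the convention, not the code OpenBLAS runs, and the function name is hypothetical.

```python
import math


def potrf_lower(a):
    """Factor a (list of lists) in place as L * L^T, using the lower triangle.

    Returns 0 on success, or the 1-based index k of the first pivot that is
    not positive, mirroring the meaning of LAPACK's info > 0.
    """
    n = len(a)
    for j in range(n):
        # Pivot: diagonal entry minus the squares of the row already computed.
        d = a[j][j] - sum(a[j][k] ** 2 for k in range(j))
        if d <= 0.0 or math.isnan(d):
            return j + 1
        a[j][j] = math.sqrt(d)
        # Fill in the rest of column j below the diagonal.
        for i in range(j + 1, n):
            s = a[i][j] - sum(a[i][k] * a[j][k] for k in range(j))
            a[i][j] = s / a[j][j]
    return 0


print(potrf_lower([[4.0, 2.0], [2.0, 3.0]]))  # 0: positive definite
print(potrf_lower([[1.0, 2.0], [2.0, 1.0]]))  # 2: second pivot is 1 - 4 = -3
```

With a matrix that is known to be positive definite, a nonzero info therefore means that some earlier arithmetic produced a wrong value, which is why the report points at the library rather than the input.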

testdpotrf.zip

@martin-frbg
Collaborator

AVX512 support is just SGEMM/DGEMM right now. (Wonder if the corresponding spotrf testcase would succeed, as DGEMM was an add-on to the original PR - unfortunately I do not have the hardware)
@fenrus75 ?

@fenrus75
Contributor

I'll try to take a look.

it's... curious how this can happen; if dgemm was completely broken a lot more things would be failing

@brianborchers
Author

The testspotrf test case works correctly with this version of the library, so the problem is specific to double precision.

@brianborchers
Author

brianborchers commented Jun 26, 2018

I've rebuilt the libraries and tried to make everything as identical as possible except for the TARGET. The gcc-6 compiler used here is:

gcc version 6.4.0 20180424 (Ubuntu 6.4.0-17ubuntu1~16.04)

  1. make CC=gcc-6 FC=gfortran-6 USE_OPENMP=1 TARGET=HASWELL

The build completed with reported test failures. Both testspotrf and testdpotrf produce correct results.

  2. make CC=gcc-6 FC=gfortran-6 USE_OPENMP=1 TARGET=SKYLAKEX

All files were compiled with -march=skylake-avx512. I did not have to modify COMMON_OPT.

Several tests failed:

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
           EXPECTED RESULT   COMPUTED RESULT
       1      0.262339         -0.849231E-01
      THESE ARE THE RESULTS FOR COLUMN   1
 ******* DSYMM  FAILED ON CALL NUMBER:
    382: DSYMM ('R','U',  1,  7, 1.0, A,  8, B,  2, 0.0, C,  2)    .

 DTRMM  PASSED THE TESTS OF ERROR-EXITS

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
           EXPECTED RESULT   COMPUTED RESULT
       1      0.485095          0.186814    
      THESE ARE THE RESULTS FOR COLUMN   2
 ******* DTRMM  FAILED ON CALL NUMBER:
    758: DTRMM ('R','U','N','U',  1,  7, 1.0, A,  8, B,  2)        .

 DTRSM  PASSED THE TESTS OF ERROR-EXITS

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
           EXPECTED RESULT   COMPUTED RESULT
       1      0.186813         -0.166775E-01
      THESE ARE THE RESULTS FOR COLUMN   1
 ******* DTRSM  FAILED ON CALL NUMBER:
    764: DTRSM ('R','U','T','U',  1,  7, 1.0, A,  8, B,  2)        .
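
To make the failing DTRMM call easier to read: with SIDE='R', UPLO='U', TRANSA='N', DIAG='U' the routine computes B := alpha * B * A, where A is upper triangular with an implicit unit diagonal; here B is 1 x 7 and A is 7 x 7. The sketch below is a plain reference version of that one case in pure Python. The operand values used by the BLAS test are not shown in the thread, so the small example data here is made up, and the function name is hypothetical.

```python
def trmm_right_upper_unit(alpha, a, b):
    """Return alpha * B * A for upper-triangular A with an implicit unit diagonal.

    a is n x n (only entries above the diagonal are read), b is m x n.
    """
    n = len(a)
    result = []
    for row in b:
        new_row = []
        for j in range(n):
            # Unit diagonal: A[j][j] is treated as 1, the stored value is ignored.
            acc = row[j]
            for k in range(j):
                acc += row[k] * a[k][j]
            new_row.append(alpha * acc)
        result.append(new_row)
    return result


# Made-up 3 x 3 example; the 9.0 diagonal entries must not affect the result.
a = [[9.0, 2.0, 3.0],
     [0.0, 9.0, 4.0],
     [0.0, 0.0, 9.0]]
b = [[1.0, 1.0, 1.0]]
print(trmm_right_upper_unit(1.0, a, b))  # [[1.0, 3.0, 8.0]]
```

Comparing an optimized kernel against a loop like this on the reported shape (m=1, n=7) is one way to reproduce a single failing call outside the full test suite.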

With this version of the library, testspotrf produced correct results, but testdpotrf produced incorrect results as described before.

./testdpotrf 10000
dpotrf info=33

Comments:

  1. I suppose that there could be a compiler bug that shows up only with -march=skylake-avx512.

  2. The only calls to BLAS/LAPACK in the test program are to dgemv, dpotrf, dpotrs, and dnrm2.

I've attached the test programs and output from the SKYLAKEX build.
testproblems.zip

Please let me know if there's anything further that I can do to help isolate this bug.

@brianborchers
Author

Setting OMP_NUM_THREADS leads to a slightly different failure:

./testdpotrf 10000
dpotrf info=19

@fenrus75
Contributor

fenrus75 commented Jun 26, 2018 via email

@brianborchers
Author

I noticed that DPOTRF calls DTRSM, which is one of the routines that failed its tests during the build. Given that DTRSM is failing its test at build time, this should probably be the priority in debugging the problem. I'd just assume that DPOTRF is failing because DTRSM is broken.

Haven't there been other reported problems with DTRSM recently?

This could be a bug with gcc-6 in combination with -march=skylake-avx512. Perhaps applying the -march=skylake-avx512 flag only to routines needed by dgemm/sgemm would resolve the problem in dtrsm and point to a compiler bug.

@brianborchers
Author

trsm.c calls gemm_thread_m and gemm_thread_n, so there could well be an issue with dgemm that is breaking dtrsm and then dpotrf.
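
The dependency chain described here can be illustrated with the textbook right-looking blocked Cholesky: factor a diagonal block, do a triangular solve on the panel below it, then apply a GEMM/SYRK-style update to the trailing matrix. The sketch below is pure Python and is not OpenBLAS's actual implementation; all function names are hypothetical, and the "faulty" solve is an artificial stand-in for a broken kernel (it simply doubles correct results).

```python
import math


def potf2(a, lo, hi):
    """Unblocked Cholesky of the diagonal block a[lo:hi][lo:hi], in place."""
    for j in range(lo, hi):
        d = a[j][j] - sum(a[j][k] ** 2 for k in range(lo, j))
        if d <= 0.0 or math.isnan(d):
            return j + 1  # 1-based index, like LAPACK's info
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, hi):
            s = a[i][j] - sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = s / a[j][j]
    return 0


def trsm_panel(a, lo, hi):
    """Solve X * L11^T = A21 in place for the rows below the diagonal block."""
    for i in range(hi, len(a)):
        for j in range(lo, hi):
            s = a[i][j] - sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = s / a[j][j]


def faulty_trsm_panel(a, lo, hi):
    """Artificial broken kernel: a correct solve whose results are doubled."""
    trsm_panel(a, lo, hi)
    for i in range(hi, len(a)):
        for j in range(lo, hi):
            a[i][j] *= 2.0


def trailing_update(a, lo, hi):
    """A22 := A22 - L21 * L21^T on the lower triangle."""
    for i in range(hi, len(a)):
        for j in range(hi, i + 1):
            a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, hi))


def potrf_blocked(a, nb=2, trsm=trsm_panel):
    """Blocked lower Cholesky in place; returns 0 or a 1-based failing index."""
    n = len(a)
    for lo in range(0, n, nb):
        hi = min(lo + nb, n)
        info = potf2(a, lo, hi)
        if info:
            return info
        trsm(a, lo, hi)
        trailing_update(a, lo, hi)
    return 0


def example_matrix():
    """A = L * L^T for L with 2 on the diagonal and 1 below it (positive definite)."""
    return [[4.0, 2.0, 2.0, 2.0],
            [2.0, 5.0, 3.0, 3.0],
            [2.0, 3.0, 6.0, 4.0],
            [2.0, 3.0, 4.0, 7.0]]


print(potrf_blocked(example_matrix()))                          # 0
print(potrf_blocked(example_matrix(), trsm=faulty_trsm_panel))  # 3
```

The second call fails at the first pivot after the first block, even though the matrix is positive definite. That is the same shape of symptom as info=33 in the report: the failing index sits just past a block boundary, not at a genuinely bad pivot.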

@fenrus75
Contributor

fenrus75 commented Jun 27, 2018 via email

@brianborchers
Author

OK, let me know if you'd like me to run a test with this.

@martin-frbg
Collaborator

Unfortunately, not all of the pre-existing kernel files may be usable without prior inspection. If nothing currently uses them, they may have been retired due to unexplained errors but never removed. Others, including the generic trmmkernel_16x2, have no apparent usage history at all.

I have now tried a Haswell build with the kernels used on Skylake (obviously using the pre-existing 16x2 dgemm kernel rather than the new avx512 one) and see DSYMM, DTRMM and DTRSM failing as well.

@martin-frbg
Collaborator

martin-frbg commented Jun 27, 2018

It appears to be the dgemmkernel_16x2 itself that causes the failures. Looking through the git history I am not convinced it was ever used, although there was some work done on it in 2013. (Checking now if it is just the GEMM_DEFAULT_UNROLL_ parameters in param.h that must be adjusted to go with the change in kernel - no, this does not help)

@fenrus75
Contributor

fenrus75 commented Jun 27, 2018 via email

@martin-frbg
Collaborator

Tried an earlier version of dgemm_16x2_haswell (the one labeled "corrected and tested" that preceded the "optimized" commit), but that even fails on DGEMM itself (and DSYRK, while interestingly DTRMM passes).

@fenrus75
Contributor

fenrus75 commented Jun 27, 2018 via email

@martin-frbg
Collaborator

I have disabled that DGEMM kernel for now (after fumbling aimlessly both with the 16x2 haswell kernel and my attempt at adapting its 8x2 piledriver counterpart) and moved the milestones on the two related tickets to 0.3.2. This will allow 0.3.1 to still get the initial SkylakeX support with the AVX512 SGEMM kernel.

@brianborchers
Author

brianborchers commented Jun 30, 2018

The latest development version still fails on one of the tests at build time. With

make TARGET=SKYLAKEX CC=gcc-6 FC=gfortran-6 USE_OPENMP=1

I get:

 DTRMM  PASSED THE TESTS OF ERROR-EXITS

 ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
           EXPECTED RESULT   COMPUTED RESULT
       1      0.485095          0.186814
      THESE ARE THE RESULTS FOR COLUMN   2
 ******* DTRMM  FAILED ON CALL NUMBER:
    758: DTRMM ('R','U','N','U',  1,  7, 1.0, A,  8, B,  2)        .

This is exactly the same failure that we were seeing before the DGEMM kernel was removed.

However, testdpotrf does work as expected (slow, but correct.)

@martin-frbg
Collaborator

Sorry - I had not disabled the 16x2 TRMMKERNEL as well with the latest PR, thinking it was harmless. Thanks for catching this, I will create a new PR in a minute.

@brianborchers
Author

Now the SKYLAKEX build works without errors. The SKYLAKEX version is about 10% faster on SGEMM (360 gigaflops vs. 330) than the HASWELL build. So, no significant performance improvement yet, but everything does function.

@fenrus75
Contributor

fenrus75 commented Jun 30, 2018 via email

@martin-frbg
Collaborator

Still I guess it would be labeled significant if we ever saw a ten percent performance loss somewhere...

@fenrus75
Contributor

fenrus75 commented Jun 30, 2018 via email

@Diazonium
Contributor

FYI, some SKL-X CPUs come with 1 AVX-512 execution unit per core (under 10 cores, so i7 7820X and under), but the 10-core (i9 7900X) and higher core count CPUs have 2 AVX-512 units per core.
Not sure if this matters, just thought to put it down here, because it is not widely known.

@martin-frbg
Collaborator

Somehow I suspect fenrus75 knows this 😄

@martin-frbg
Collaborator

Assuming fixed by the new DGEMM kernel now in 0.3.4
