Optimizing for POWER8 on big-endian #2299

Closed
pkubaj opened this issue Oct 30, 2019 · 70 comments

@pkubaj
Contributor

pkubaj commented Oct 30, 2019

We're preparing an update of the OpenBLAS port in FreeBSD (to 0.3.7) and I'm doing tests on POWER. Previously, we optimized for PPC970 with an option to optimize for POWER6. Now I also tested newer CPUs (they didn't work before). POWER9 still fails, but it looks like OpenBLAS built for POWER8 passes all tests. This is all done on big-endian.

Can I conclude that it's safe to optimize for POWER8 on big-endian variant?

@quickwritereader
Contributor

quickwritereader commented Oct 31, 2019

From my latest tests, POWER9 passes BLAS-Tester on little-endian; I still need to look at the LAPACK failures.
I think it's a good idea to start with the POWER8 BLAS level 1/2 kernels, which are the same on POWER9.

@pkubaj
Contributor Author

pkubaj commented Oct 31, 2019

POWER9 probably passes on LE, since that is where most of the attention goes, but FreeBSD is currently BE-only. If POWER9 works as well, even better, but for now it fails.

@martin-frbg
Collaborator

@pkubaj, where do you get failures with big-endian POWER9? There were some recent (post-0.3.7) fixes (#2233, #2263, #2269) for getting the POWER9 code compiled correctly with both gcc 9.2 and various older versions of gcc and binutils, but I am not sure if anyone has looked for potential issues with endianness yet.

@pkubaj
Contributor Author

pkubaj commented Oct 31, 2019

This test fails when optimizing for POWER9 on BE:

TEST 21/23 potrf:smoketest_trivial [FAIL]
  ERR: test_potrs.c:513  L s(0,0) difference: 1.7

Besides this one, there may be other failures later in the run.

@martin-frbg
Collaborator

I suspect this may actually be SGEMM failing (the test calls SPOTRF followed by SGEMM, then compares against the initial input); there will probably be corresponding failures in the LAPACK-derived test/ctest as well. Unfortunately my current OpenPower host does not appear to provide big-endian POWER9; @quickwritereader, does yours?

@pkubaj
Contributor Author

pkubaj commented Oct 31, 2019

> I suspect this may actually be SGEMM failing (test calls SPOTRF followed by SGEMM then compares against initial input), there will probably be corresponding failures in the lapack-derived test/ctest as well. Unfortunately my current OpenPower host does not appear to provide big-endian POWER9, @quickwritereader does yours ?

If you have a bare-metal host, you can create VMs of either endianness on POWER, so you can create a BE VM on an LE host.

If that's not possible, I can give you SSH access to my host (just note that it runs FreeBSD).

@quickwritereader
Contributor

I did not consider endianness, as my goal was little-endian. For big-endian, one would have to change the swap masks and also check the save macros, especially for single precision.

@quickwritereader
Contributor

@martin-frbg I think we can request a big-endian one, but the current one is little-endian.

@martin-frbg
Collaborator

Right now I see my recent fixes with the gcc7-generated assembly backfiring on power8be - at the very least the assembler is not going to accept the .localentry and .size statements generated by the little-endian compiler...

@pkubaj
Contributor Author

pkubaj commented Oct 31, 2019

I think you don't actually mean endianness here, but the ABI version (ELFv2). These two concepts are often conflated, even by IBM.
They seem to treat big-endian as ELFv1-only and little-endian as ELFv2-only. In fact the ABI is independent of byte order, and nothing prevents you from creating a big-endian system with the ELFv2 ABI or a little-endian one with ELFv1.
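
Because byte order and ABI are independent properties, a program can check them separately. A minimal sketch, assuming GCC or Clang (which predefine `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__`); the function names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Runtime probe: store a known 32-bit value and inspect the byte at the
 * lowest address. On a big-endian host it is the most significant byte. */
static int host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Compile-time view of the same property via compiler-predefined macros.
 * Returns -1 if the compiler does not provide them. */
static int compiler_says_big_endian(void)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__)
    return __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__;
#else
    return -1;
#endif
}
```

Neither check says anything about ELFv1 versus ELFv2; the ABI has to be tested on its own (for example through `_CALL_ELF`).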

Creating little-endian OS with ELFv1 is pointless nowadays, but OpenSUSE had a LE ELFv1 variant a few years ago, when ELFv2 was only starting.

However, there are actively developed systems that are BE for POWER and use ELFv2 ABI, e.g. Adelie Linux (BE exclusively on PPC) or Void Linux (it has variants for both LE and BE, but all are ELFv2).
FreeBSD is currently ELFv1, but it's switching to ELFv2 soon and it will stay on BE. I tested both 0.3.7 and git from develop branch optimized for POWER8 and they seem to build and pass tests just fine on my experimental system with ELFv2.

After your message, I also tested develop on ELFv1 system and it fails to build, but 0.3.7 builds there just fine (for POWER8, I use GCC 9.2), but fails with tests:

TEST 23/23 kernel_regress:skx_avx [FAIL]
  ERR: test_kernel_regress.c:49  expected 0.000e+00, got 6.694e+02 (diff -6.694e+02, tol 1.000e-10)

If there are no plans to fix it I think we'll just offer an option to optimize for POWER8 only on ELFv2.

I also think that you shouldn't "fix" compiler problems with such workarounds. OpenBLAS users probably use the latest compilers anyway to squeeze more performance and GCC 9 works.

@pkubaj
Contributor Author

pkubaj commented Oct 31, 2019

What I wrote may be a little unclear, so a short summary (all variants are BE):
ELFv1, 0.3.7: works with POWER6; doesn't work with POWER8 (test failure) or POWER9.
ELFv1, develop: works with POWER6; doesn't work with POWER8 (the assembly failure you highlighted) or POWER9.
ELFv2, 0.3.7: works with POWER6 and POWER8; doesn't work with POWER9.
ELFv2, develop: works with POWER6 and POWER8; doesn't work with POWER9.

So POWER6 works everywhere, POWER9 doesn't work everywhere, POWER8 works on ELFv2.

@martin-frbg
Collaborator

Thanks for the explanation - ELFv1 it is, then (and I am checking whether the problem can be papered over with a few #if _CALL_ELF == 2). Not that I was confusing concepts here, I simply did not know any better - my only previous encounter with PPC was with the venerable RS/6000 "some" years ago.

@pkubaj
Contributor Author

pkubaj commented Oct 31, 2019

AFAIK the proper check is #if defined(_CALL_ELF) && (_CALL_ELF == 2), because a compiler might only support ELFv1 (and not define _CALL_ELF, because when this compiler was released, there was no ELFv2).
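
A small helper following that advice might look like the sketch below (the function name is hypothetical). With the `defined()` test in place the intent is explicit and `-Wundef` stays quiet; without it, the preprocessor silently treats a missing `_CALL_ELF` as 0, which is easy to get wrong in negated tests such as `#if _CALL_ELF != 2`.

```c
/* Hypothetical helper: which PowerPC ELF ABI was this file compiled for?
 * Returns 2 for ELFv2, 1 for ELFv1, and 0 if _CALL_ELF is not defined
 * (a compiler that predates ELFv2, or a target that does not use it). */
static int powerpc_elf_abi(void)
{
#if defined(_CALL_ELF) && (_CALL_ELF == 2)
    return 2;
#elif defined(_CALL_ELF) && (_CALL_ELF == 1)
    return 1;
#else
    return 0;
#endif
}
```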

@martin-frbg
Collaborator

martin-frbg commented Nov 1, 2019

Right, but I wanted to try first if my trivial fix was sufficient - unfortunately it is not (segfaults in all the min/max/axpy functions that I tampered with to work around miscompilation by older gcc). Guess I will either have to try and merge the differences seen in the respective gcc-generated assembly file for big-endian, or revert my kludgy fix for #2254 before a 0.3.8 release.
Unfortunately - and this is relevant to 0.3.7 as well - running make lapack-test on big-endian POWER8 shows 1199 cases of numerical error for double-precision real and 526 for double-precision complex (in addition to the "familiar" single error each in single-precision real and complex, which is most likely spurious). This is with an OpenSUSE "tumbleweed" rolling release containing gcc 9.2.1 and glibc 2.30.
@pkubaj have you checked the results of make lapack-test (i.e. running the testsuite from the netlib reference version of LAPACK) ? (The tests for complex try to put a huge array of test data on the stack, so you may need to increase the default stack size limit before running them)
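
The usual way to raise that limit is `ulimit -s` in the shell before running make lapack-test. For completeness, a minimal POSIX sketch of the same step from C, assuming a small launcher that raises its own soft limit and then starts the test programs (child processes inherit it); the function name is hypothetical, and whether the hard limit is large enough depends on the system.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_STACK of the calling process
 * to its hard limit. Programs started afterwards (e.g. via fork/exec)
 * inherit the new limit. Returns 0 on success, -1 on failure. */
static int raise_stack_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == rl.rlim_max)
        return 0;                  /* already at the hard limit */
    rl.rlim_cur = rl.rlim_max;     /* unprivileged code may go up to the hard cap */
    return setrlimit(RLIMIT_STACK, &rl) == 0 ? 0 : -1;
}
```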

@pkubaj
Contributor Author

pkubaj commented Nov 1, 2019

I have checked it now:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1284869         1       (0.000%)        5       (0.000%)
DOUBLE PRECISION        1088441         1182    (0.109%)        5       (0.000%)
COMPLEX                 751052          1       (0.000%)        10      (0.001%)
COMPLEX16               554591          558     (0.101%)        15      (0.003%)

--> ALL PRECISIONS      3678953         1742    (0.047%)        35      (0.001%)

I also did those tests for POWER6:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1284869         1       (0.000%)        1       (0.000%)
DOUBLE PRECISION        1293457         0       (0.000%)        1       (0.000%)
COMPLEX                 741512          7       (0.001%)        2       (0.000%)
COMPLEX16               748336          4       (0.001%)        2       (0.000%)

--> ALL PRECISIONS      4068174         12      (0.000%)        6       (0.000%)

So yeah, I think we'll just stay with PPC970 and option for POWER6.

Thanks for help!

@pkubaj pkubaj closed this as completed Nov 1, 2019
@martin-frbg
Collaborator

BLAS-Tester suggests the majority of the POWER8BE problems may actually be with DAMIN/DAMAX rather than the less readable (and old) DGEMM/ZGEMM kernels.

@quickwritereader
Contributor

I could help with handling the big-endian case.
For POWER8 it's mostly vectorized C code and should not be hard to convert. For the POWER9 GEMM-related kernels we should look at the permute masks and the save macros, besides the ABI-related things.
But I think we should first decide how important it is,
because the IBM team only asked for little-endian support for their Linux cloud. Maybe big-endian systems will be obsolete in the near future.

@pkubaj
Contributor Author

pkubaj commented Nov 1, 2019

They certainly don't deliver a consistent message. On the one hand, they say that BE is obsolete and everyone should migrate to LE. On the other hand, I have read announcements from them that they are committed to supporting both BE and LE, in Linux (there are no plans to remove ppc64 from Linux) as well as in i/AIX.

@martin-frbg
Collaborator

martin-frbg commented Nov 2, 2019

#1997 has/had someone from the big-endian branch of IBM trying to adapt our older POWER8 code, but it appeared to be a decidedly non-trivial task (though the main problem there may be the lack of support for recent PPC instructions in the native AIX toolchain).

@martin-frbg
Collaborator

> BLAS-Tester suggests the majority of the POWER8BE problems may actually be with DAMIN/DAMAX rather than the less readable (and old) DGEMM/ZGEMM kernels.

IDAMIN/MAX and IZAMIN/MAX to be precise - replacing them with their generic equivalents from ../arm makes the lapack tests pass again with the "usual" error count of 1/0/1/0.
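
For reference, the semantics these kernels must reproduce can be sketched in portable C. This is a hypothetical stand-alone illustration, not the generic code from ../arm: IDAMAX returns the 1-based index of the first element with the largest absolute value, and IDAMIN (an OpenBLAS extension beyond reference BLAS) the first one with the smallest.

```c
#include <math.h>
#include <stddef.h>

/* Reference-style IDAMAX: 1-based index of the first element with the
 * largest |x[i]|; returns 0 when n < 1 or incx < 1 (BLAS convention). */
static size_t idamax_ref(long n, const double *x, long incx)
{
    if (n < 1 || incx < 1)
        return 0;
    size_t best = 1;
    double maxval = fabs(x[0]);
    for (long i = 1; i < n; i++) {
        double v = fabs(x[i * incx]);
        if (v > maxval) {          /* strict '>' keeps the first index on ties */
            maxval = v;
            best = (size_t)i + 1;
        }
    }
    return best;
}

/* Same scheme for the smallest |x[i]|. */
static size_t idamin_ref(long n, const double *x, long incx)
{
    if (n < 1 || incx < 1)
        return 0;
    size_t best = 1;
    double minval = fabs(x[0]);
    for (long i = 1; i < n; i++) {
        double v = fabs(x[i * incx]);
        if (v < minval) {          /* strict '<' keeps the first index on ties */
            minval = v;
            best = (size_t)i + 1;
        }
    }
    return best;
}
```

The tie rule (first index wins) is one of the details an optimized kernel has to preserve when it reduces several vector lanes into a single result.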

@pkubaj
Contributor Author

pkubaj commented Feb 10, 2020

Since 0.3.8 supposedly has those errors fixed, I tried it again:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1300419         1       (0.000%)        4       (0.000%)
DOUBLE PRECISION        1096391         1182    (0.108%)        4       (0.000%)
COMPLEX                 766602          1       (0.000%)        8       (0.001%)
COMPLEX16               562541          558     (0.099%)        13      (0.002%)

--> ALL PRECISIONS      3725953         1742    (0.047%)        29      (0.001%)

This is OpenBLAS 0.3.8 on FreeBSD on ELFv2, optimizing for POWER8. Should I assume that the test cases that fail are actually wrong and it's ok to offer OpenBLAS builds for POWER8 BE now?

@martin-frbg
Collaborator

No, that does not look good - what compiler version is this ?

@pkubaj
Contributor Author

pkubaj commented Feb 10, 2020

GCC 9.2.

@pkubaj
Contributor Author

pkubaj commented Feb 10, 2020

For the record, this is a POWER issue, not a general FreeBSD issue, since amd64 seems OK:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1300419         1       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1309007         0       (0.000%)        0       (0.000%)
COMPLEX                 760590          1       (0.000%)        0       (0.000%)
COMPLEX16               769178          0       (0.000%)        0       (0.000%)

--> ALL PRECISIONS      4139194         2       (0.000%)        0       (0.000%)

@pkubaj
Contributor Author

pkubaj commented Feb 10, 2020

Also, the errors seem to depend on the ELF ABI version. 0.3.8/POWER8/ELFv1:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1300419         1       (0.000%)        4       (0.000%)
DOUBLE PRECISION        1309007         0       (0.000%)        4       (0.000%)
COMPLEX                 757350          1       (0.000%)        8108    (1.071%)
COMPLEX16               769178          0       (0.000%)        8       (0.001%)

--> ALL PRECISIONS      4135954         2       (0.000%)        8124    (0.196%)

8100 of those COMPLEX fails are in Mixed-Precision-linear-equation-routines-LIN/xlintstzc.

@pkubaj
Contributor Author

pkubaj commented Jun 15, 2020

Just did a test on 0.3.10 for POWER8 on ELFv2:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1300419         1       (0.000%)        4       (0.000%)
DOUBLE PRECISION        1309007         0       (0.000%)        4       (0.000%)
COMPLEX                 766602          1       (0.000%)        8       (0.001%)
COMPLEX16               769178          0       (0.000%)        8       (0.001%)

--> ALL PRECISIONS      4145206         2       (0.000%)        24      (0.001%)

So there are 24 other errors. Did you get the same results?

@martin-frbg
Collaborator

Don't remember seeing this, but will need to retest. ("Other error" usually is "on entry to XX parameter YY had an invalid value" so not a good thing to see)

@pkubaj
Contributor Author

pkubaj commented Jun 15, 2020

BTW, same happens when trying POWER9, so it's possible that this is a bug in some shared code and once it's fixed, POWER9 kernels will also work on BE.

@martin-frbg
Collaborator

POWER9 BE currently gets mapped to POWER8 BE, as the current POWER9 kernels were written for LE only.

@martin-frbg martin-frbg reopened this Jun 15, 2020
@martin-frbg
Collaborator

Curiously I am getting the same numbers in the "nb tests run" (and "numerical error") columns as you, but only zeroes for "other error" (and no hint of anything else wrong in testing-results.txt either).

@h-vetinari
Contributor

This issue is now part of a closed milestone; should probably be shifted to 0.3.11?

@martin-frbg
Collaborator

I think the actual breakage this issue was created for is fixed, and I failed to reproduce the "other errors" pkubaj saw (possibly related to the compiler version). So I am more inclined to close it again unless there is new input.

@pkubaj
Contributor Author

pkubaj commented Jun 25, 2020

testing_results.txt
Here are my test results, hoping that would help.

@pkubaj
Contributor Author

pkubaj commented Jun 25, 2020

pkubaj@talos:$~$ grep -B 3 failed testing_results.txt

 *** XERBLA was called from SSYEVD_2STAGE with INFO =      8 instead of 10 ***
 *** XERBLA was called from SSYEVR_2STAGE with INFO =     18 instead of 20 ***
 *** SST routines failed the tests of the error exits ***
--

 *** XERBLA was called from SSYEVD_2STAGE with INFO =      8 instead of 10 ***
 *** XERBLA was called from SSYEVR_2STAGE with INFO =     18 instead of 20 ***
 *** SST routines failed the tests of the error exits ***
--

  *** Error(s) or Failure(s) while testing STFSM               ***
      Failure in STFSM, CFORM='T', SIDE='L', UPLO='U', TRANS='T', DIAG='N', M= 50, N = 50, test=  36.119
  STFSM auxiliary routine:     1 out of  7776 tests failed to pass the threshold
--

 *** XERBLA was called from DSYEVD_2STAGE with INFO =      8 instead of 10 ***
 *** XERBLA was called from DSYEVR_2STAGE with INFO =     18 instead of 20 ***
 *** DST routines failed the tests of the error exits ***
--

 *** XERBLA was called from DSYEVD_2STAGE with INFO =      8 instead of 10 ***
 *** XERBLA was called from DSYEVR_2STAGE with INFO =     18 instead of 20 ***
 *** DST routines failed the tests of the error exits ***
--
  9 = | TR - RW | / ( |T| |R| ulp )      10 = | LT - WL | / ( |T| |L| ulp )
 11= |HX - XW| / (|H| |X| ulp)  (inv.it) 12= |YH - WY| / (|H| |Y| ulp)  (inv.it)
 Matrix order=   16, type=12, seed=2005,2639,2925,3777, result   8 is   28.74
 CHS:    1 out of  1764 tests failed to pass the threshold
--
 *** XERBLA was called from CHEEVD_2STAGE with INFO =      8 instead of 10 ***
 *** XERBLA was called from CHEEVR_2STAGE with INFO =     18 instead of 20 ***
 *** XERBLA was called from CHEEVR_2STAGE with INFO =     18 instead of 22 ***
 *** CST routines failed the tests of the error exits ***
--
 *** XERBLA was called from CHEEVD_2STAGE with INFO =      8 instead of 10 ***
 *** XERBLA was called from CHEEVR_2STAGE with INFO =     18 instead of 20 ***
 *** XERBLA was called from CHEEVR_2STAGE with INFO =     18 instead of 22 ***
 *** CST routines failed the tests of the error exits ***
--

 *** XERBLA was called from CHBEVD_2STAGE with INFO =     11 instead of 13 ***
 *** XERBLA was called from CHBEVD_2STAGE with INFO =     11 instead of 15 ***
 *** CHB routines failed the tests of the error exits ***
--
 *** XERBLA was called from ZHEEVD_2STAGE with INFO =      8 instead of 10 ***
 *** XERBLA was called from ZHEEVR_2STAGE with INFO =     18 instead of 20 ***
 *** XERBLA was called from ZHEEVR_2STAGE with INFO =     18 instead of 22 ***
 *** ZST routines failed the tests of the error exits ***
--
 *** XERBLA was called from ZHEEVD_2STAGE with INFO =      8 instead of 10 ***
 *** XERBLA was called from ZHEEVR_2STAGE with INFO =     18 instead of 20 ***
 *** XERBLA was called from ZHEEVR_2STAGE with INFO =     18 instead of 22 ***
 *** ZST routines failed the tests of the error exits ***
--

 *** XERBLA was called from ZHBEVD_2STAGE with INFO =     11 instead of 13 ***
 *** XERBLA was called from ZHBEVD_2STAGE with INFO =     11 instead of 15 ***
 *** ZHB routines failed the tests of the error exits ***

@martin-frbg
Collaborator

martin-frbg commented Jun 25, 2020

Seems to be tests for various error exits in some of the newer "2stage" algorithms that are failing (by setting a wrong error code)
EDIT: messages crossed...
Is that still with gcc 9.2 as mentioned earlier ?
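
For readers unfamiliar with these tests: in LAPACK, the INFO argument passed to XERBLA is the position of the invalid argument in the calling routine's parameter list, and the error-exit tests call each routine with one deliberately bad argument and check which position gets reported. The pattern can be sketched as below; the routine, its three parameters and the recording helper are hypothetical stand-ins, not LAPACK's test code.

```c
#include <stdio.h>

/* Hypothetical stand-in for the test suite's replacement XERBLA:
 * it only remembers who complained and about which argument. */
static const char *last_name = NULL;
static int last_info = 0;

static void xerbla_record(const char *name, int info)
{
    last_name = name;
    last_info = info;
}

/* Hypothetical routine with LAPACK-style argument checking
 * (argument 1 = uplo, 2 = n, 3 = lda). The first invalid argument is
 * reported by position; the routine then returns INFO = -position. */
static int toy_routine(char uplo, int n, int lda)
{
    int info = 0;

    if (uplo != 'U' && uplo != 'L')
        info = 1;
    else if (n < 0)
        info = 2;
    else if (lda < (n > 1 ? n : 1))
        info = 3;

    if (info != 0) {
        xerbla_record("TOY_ROUTINE", info);
        return -info;
    }
    return 0;
}

/* One error-exit check: the expected position must be the one reported.
 * On mismatch, print a message in the style of the LAPACK test log. */
static int check_error_exit(int returned_info, int expected_pos)
{
    if (last_info != expected_pos) {
        printf(" *** XERBLA was called from %s with INFO = %d instead of %d ***\n",
               last_name ? last_name : "(none)", last_info, expected_pos);
        return 0;
    }
    return returned_info == -expected_pos;
}
```

Read this way, "XERBLA was called from SSYEVD_2STAGE with INFO = 8 instead of 10" means the routine rejected its 8th argument where the test expected the 10th to be flagged.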

@martin-frbg
Collaborator

The code returned by xerbla in these cases (well, I only looked up zhbevd_2stage right now) is the number of off-diagonal matrix elements that did not converge to zero, so apparently we are just doing a little better than expected with some iterative algorithm. (Possibly an accuracy issue due to FMA operations?)

@pkubaj
Contributor Author

pkubaj commented Jun 26, 2020

I'm on GCC 9.3.0. Should I test 10.1.0?

@martin-frbg
Collaborator

Trying to install that right now - I just double-checked that my freebsd vm in the unicamp minicloud has 9.2.0 so that could explain differences in our results. (I am still not convinced that these are actually harmful)

@martin-frbg
Collaborator

martin-frbg commented Jun 26, 2020

10.1.0 gets me 8100 "other errors" in COMPLEX and around 20000 numerical errors in COMPLEX16 at our default optimization levels. Let's see if dropping gfortran optimization to O1 or O0 helps... I hate moving targets :(
Update: gfortran -O1 indeed reduces the "other errors" in COMPLEX to zero (though leaving 4 numeric errors there) but does nothing for/against the numeric errors in COMPLEX16
Update2: going from -Ofast to -O1 for gcc removes the COMPLEX16 failures. Now investigating which file could use another #pragma GCC optimize "O1"

@martin-frbg martin-frbg modified the milestones: 0.3.10, 0.3.11 Jun 27, 2020
@martin-frbg
Collaborator

Instead of reducing gcc optimization to O1 globally, it is sufficient to add a pragma to caxpy.c and disable the assembly microkernel in zdot.c.

@pkubaj
Copy link
Contributor Author

pkubaj commented Aug 23, 2020

Sorry for the long lag, I was busy with other things.

What do you mean by disabling assembly microkernel in zdot.c? https://github.com/xianyi/OpenBLAS/blob/develop/kernel/power/zdot.c has no assembly and https://github.com/xianyi/OpenBLAS/blob/develop/kernel/power/KERNEL.POWER8 doesn't list zdot.S.

@martin-frbg
Collaborator

zdot.c starts with an ifdef POWER8 (etc). ... include zdot_microk_power8.c - and that microk(ernel) file is where the inline assembly resides. BTW this is a pretty common motif in the kernels written by Werner Saar.
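
The motif can be sketched as follows: an optional architecture-specific microkernel is included under a target `#if` and, if present, supplies the kernel; otherwise a generic C fallback is compiled. The guard macro `USE_POWER8_MICROKERNEL`, the feature macro `HAVE_ZDOT_MICROKERNEL` and the function names are hypothetical, chosen for illustration rather than copied from zdot.c; with the guard undefined, only the portable path is built.

```c
#include <stddef.h>

/* Target check (hypothetical macro, never defined here): when true, the
 * inline-assembly microkernel source would be pulled in and would define
 * HAVE_ZDOT_MICROKERNEL plus its own zdot_kernel. */
#if defined(USE_POWER8_MICROKERNEL)
#include "zdot_microk_power8.c"
#endif

#ifndef HAVE_ZDOT_MICROKERNEL
/* Generic fallback: four real partial sums of an interleaved (re, im)
 * complex dot product: d[0] = sum xr*yr, d[1] = sum xi*yi,
 * d[2] = sum xr*yi, d[3] = sum xi*yr. */
static void zdot_kernel(size_t n, const double *x, const double *y, double d[4])
{
    d[0] = d[1] = d[2] = d[3] = 0.0;
    for (size_t i = 0; i < n; i++) {
        double xr = x[2 * i], xi = x[2 * i + 1];
        double yr = y[2 * i], yi = y[2 * i + 1];
        d[0] += xr * yr;
        d[1] += xi * yi;
        d[2] += xr * yi;
        d[3] += xi * yr;
    }
}
#endif

/* Combine the partial sums: x^T y (conjugate == 0) or x^H y (conjugate != 0). */
static void zdot_combine(const double d[4], int conjugate, double *re, double *im)
{
    if (!conjugate) {
        *re = d[0] - d[1];
        *im = d[2] + d[3];
    } else {
        *re = d[0] + d[1];
        *im = d[2] - d[3];
    }
}
```

Disabling the microkernel, as suggested in this thread, amounts to making the target `#if` false so that the C loop is compiled instead.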

@pkubaj
Contributor Author

pkubaj commented Aug 23, 2020

BTW, regarding those XERBLA test errors. By "apparently we are just doing a little better than expected with some iterative algorithm", do you mean those errors are safe to ignore? It would be nice to get this issue closed :)

@martin-frbg
Collaborator

I am guessing that they are safe to ignore, unfortunately there are no clear guidelines for how to assess testsuite failures (and the testsuite is not free from bugs itself). And if what I wrote above is still correct, reducing the gcc optimization level for caxpy and disabling the zdot microkernel took care of these anyway. Unfortunately my minicloud account seems to have expired in the meantime, and the gcc compile farm does not offer this combination of HW and OS, so it may take me a few days to get back to this. If you can test, adding

#if defined(__FreeBSD__)
#if defined(_CALL_ELF) && (_CALL_ELF == 2)
#pragma GCC optimize "O1"
#endif
#endif

at the start of caxpy.c and adding something like
#if _CALL_ELF != 2 or #if !defined(__FreeBSD__) to the mix of ifdefs around the microkernel include in zdot.c was what I had in mind.
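
Put together, the suggestion amounts to a per-file optimization override at the very top of caxpy.c, before any function definitions (GCC applies `#pragma GCC optimize` to the functions defined after it). The sketch below pairs that guard with a plain scalar complex AXPY, y := alpha * x + y on interleaved single-precision data; the kernel body and the name `caxpy_ref` are illustrative assumptions, not the vectorized code in kernel/power/caxpy.c.

```c
/* Per-file override, active only for FreeBSD builds using the ELFv2 ABI. */
#if defined(__FreeBSD__)
#if defined(_CALL_ELF) && (_CALL_ELF == 2)
#pragma GCC optimize "O1"
#endif
#endif

#include <stddef.h>

/* Illustrative scalar CAXPY: y[i] += alpha * x[i] for n complex floats
 * stored as (re, im) pairs, with element strides incx and incy (> 0). */
static void caxpy_ref(size_t n, float alpha_r, float alpha_i,
                      const float *x, size_t incx, float *y, size_t incy)
{
    for (size_t i = 0; i < n; i++) {
        const float *xp = x + 2 * i * incx;
        float *yp = y + 2 * i * incy;
        /* (ar + i*ai) * (xr + i*xi) = (ar*xr - ai*xi) + i*(ar*xi + ai*xr) */
        yp[0] += alpha_r * xp[0] - alpha_i * xp[1];
        yp[1] += alpha_r * xp[1] + alpha_i * xp[0];
    }
}
```

On any other platform the guard is inactive and the file compiles at the project's normal optimization level.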

@pkubaj
Copy link
Contributor Author

pkubaj commented Aug 25, 2020

I patched those files:

--- kernel/power/caxpy.c.orig   2020-06-14 20:03:04 UTC
+++ kernel/power/caxpy.c
@@ -24,6 +24,13 @@ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONT
 OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
 USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *****************************************************************************/
+
+#if defined(__FreeBSD__)
+#if defined(_CALL_ELF) && (_CALL_ELF == 2)
+#pragma GCC optimize "O1"
+#endif
+#endif
+
 #include "common.h"
 #ifndef HAVE_ASM_KERNEL
 #include <altivec.h>
--- kernel/power/zdot.c.orig    2020-08-24 14:36:15 UTC
+++ kernel/power/zdot.c
@@ -36,7 +36,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILI
 #include "common.h"


-#if defined(POWER8) || defined(POWER9)
+#if (defined(POWER8) || defined(POWER9)) && !defined(__FreeBSD__)
 #include "zdot_microk_power8.c"
 #endif

But still experience the same issue (that's on ELFv2):

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1300419         1       (0.000%)        4       (0.000%)
DOUBLE PRECISION        1309007         0       (0.000%)        4       (0.000%)
COMPLEX                 768366          0       (0.000%)        8       (0.001%)
COMPLEX16               769178          0       (0.000%)        8       (0.001%)

--> ALL PRECISIONS      4146970         1       (0.000%)        24      (0.001%)

Since those failures are probably not an issue, I guess it's ok to close this.

@pkubaj pkubaj closed this as completed Aug 25, 2020
@martin-frbg
Collaborator

Which gcc ? ISTR that patch was needed to work around a ton of "other error"s seen with 10.1

@pkubaj
Contributor Author

pkubaj commented Aug 25, 2020

Ah, ok. I was still on 9.3.0. I'll test with 10.1.0.

@pkubaj
Contributor Author

pkubaj commented Aug 26, 2020

Hm, same issue after applying that patch:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1300419         1       (0.000%)        4       (0.000%)
DOUBLE PRECISION        1309007         0       (0.000%)        4       (0.000%)
COMPLEX                 768366          0       (0.000%)        8       (0.001%)
COMPLEX16               769178          0       (0.000%)        8       (0.001%)

--> ALL PRECISIONS      4146970         1       (0.000%)        24      (0.001%)

@martin-frbg
Collaborator

Probably. What I meant was that it looked a lot worse with 10.1 before that patch.

@martin-frbg
Collaborator

martin-frbg commented Sep 3, 2020

Regained minicloud access, the result I get for the current develop branch (i.e. without patching caxpy and zdot) on POWER8 with FreeBSD13-CURRENT and now gcc-10.2 is

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1300419         1       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1309007         0       (0.000%)        0       (0.000%)
COMPLEX                 766602          1       (0.000%)        0       (0.000%)
COMPLEX16               769178          0       (0.000%)        0       (0.000%)

--> ALL PRECISIONS      4145206         2       (0.000%)        0       (0.000%)

(so interestingly no "other errors" counted but still about 1700 fewer COMPLEX tests completed compared to your latest results - incidentally exactly the same number as you saw on Jun 15 with gcc9.3 - and later with 10.1)
Adding the pragma GCC optimize "O1" to caxpy.c brought the COMPLEX count back to 768366 and removed the single numerical error, still without "other" errors there. (The zdot part of the patch had zero impact with gcc-10.2)

@pkubaj
Contributor Author

pkubaj commented Sep 4, 2020

Thanks, I guess it's something about my environment then. I'll soon post an update to 0.3.10 for FreeBSD and add an option to use POWER8 kernels. Thanks!


4 participants