-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
R result error when with skylake-avx512 build #2168
Comments
R is single-threaded multi-processing thing, you can just type "make" in openblas build to get native pthread version, optimal as Rblas.so replacement. Though I am almost confident it is compiler problem -c heck #2154 for compiler flags to work with GCC9 - can you checl if older release of your system with not that radical compiler version does work. Can you do some more tests: |
@martin-frbg R is using only double precision, the default Rblas(.so|.dll) does not include single precision codes. |
@brada4 Build with 'Target=Haswell' does not report this issue. And, regarding your comment to martin's suggestion, a bit confused that whether martin's suspect is still valid? |
Depends on how often you reinstall your R If you want to keep R for year(s) you might try to build reduced version without risky codes: It might help if you share benchmarks between this and Microsoft R (that includes MKL), so we understand better how much worse is haswell code on skyx |
So you mean AVX512 version of openblas is not quite stable as of today, and recommend to use Target=Haswell ? |
The AVX512 support in OpenBLAS is practically just SGEMM and DGEMM. There are no known problems with the SGEMM, but the DGEMM code is now known to produce wrong results. Unfortunately I had disabled only parts of it in 0.3.6 which still allowed errors for some inputs. (Sadly I have failed to find the errors in the original code contribution, and its author appears to have been busy elsewhere during the last three or four months.) |
In R context only DGEMM matters, so yup, it is not stable, though there is a good hope it is disabled well enough |
I can confirm that this bug in OpenBLAS on AVX-512 is "in the wild" with practical implications - I just spent a great deal of time trying to figure out why a particular colorspace conversion was failing, and it turns out to have been due to this. The following R code gives ridiculous results (different values for the same color) on an Intel i9-7900X CPU running Fedora FC30 workstation with R 3.6.0 and openblas 0.3.6-2.fc30 converts red from RGB to Lab colorspacewhite <- c(x = 0.953205712549377, 1, y = 1.08538438164692) but correct answers if R is started with AVX-512 disabled: env OPENBLAS_CORETYPE=Haswell R --vanilla |
You can compile with DYNAMIC_ARCH=1 NO_AVX512=1 to avoid any suspect strains. |
Fixed now by wjc404's new AVX512 DGEMM kernel from #2286. |
With 0.3.6 build targeting skylake-avx512, one of Rbenchmark will fail as result wrong (detail attached below). This remains the same under both USE_OPENMP=1 or 0. The openblas is built with GCC v9.1.1. R version is 3.6.0.
The R test code is attached:
R-benchmark-25_issue_test.R.txt
Output of the test is as below:
Loading required package: Matrix
Error in chol.default(a) :
the leading minor of order 2 is not positive definite
Calls: run -> system.time -> chol -> chol -> chol.default
Timing stopped at: 0.014 0.017 0.031
Execution halted
The text was updated successfully, but these errors were encountered: