Segmentation fault when using cblas_sgemm. Occasionally this core dump appears; most of the time it works normally. #3788
Comments
This could be related to the thread safety issue just reported as #3787, but it is hard to tell without more information. What is your hardware, and which version of OpenBLAS are you using, please?
Thanks @martin-frbg. I just want it to run normally and ignore the occasional error. If I add a try/catch around cblas_sgemm, will that work?
I really cannot answer this with the little information you gave. Maybe you are using an outdated version of OpenBLAS with an error that has long since been fixed; or maybe you can get around the problem by compiling OpenBLAS with USE_SIMPLE_THREADED_LEVEL3=1
@martin-frbg OpenBLAS version 0.3.20
Thank you very much @martin-frbg. I use version 0.3.20. Has 0.3.21 fixed this problem?
The first thing to try is threads vs. no threads: set OPENBLAS_NUM_THREADS=1 before running the sample. If the crash disappears with a single thread, the problem is almost certainly in the threading; if it still fails sporadically, the cause lies elsewhere.
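A small C sketch that may help with the single-threaded test run: it reads OPENBLAS_NUM_THREADS at program start and logs what was requested, so a crash log records whether the run was meant to be single-threaded. The helper names `parse_thread_count` and `log_openblas_thread_setting` are hypothetical and not part of OpenBLAS, the upper bound of 1024 is an arbitrary sanity limit, and the code only reports what the environment asks for, not what the library actually uses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse a thread-count string such as "1".
 * Returns the count, or 0 if the string is missing or not a plausible
 * positive integer (the 1024 limit is an arbitrary assumption). */
int parse_thread_count(const char *value)
{
    if (value == NULL || *value == '\0')
        return 0;

    char *end = NULL;
    long parsed = strtol(value, &end, 10);
    if (*end != '\0' || parsed <= 0 || parsed > 1024)
        return 0;

    return (int)parsed;
}

/* Hypothetical helper: log the requested setting to stderr so that it
 * ends up next to any crash output. */
void log_openblas_thread_setting(void)
{
    int requested = parse_thread_count(getenv("OPENBLAS_NUM_THREADS"));

    if (requested == 1)
        fprintf(stderr, "OPENBLAS_NUM_THREADS=1: single-threaded test run\n");
    else if (requested > 1)
        fprintf(stderr, "OPENBLAS_NUM_THREADS=%d: multi-threaded run\n", requested);
    else
        fprintf(stderr, "OPENBLAS_NUM_THREADS unset or invalid: library default applies\n");
}
```

Calling `log_openblas_thread_setting()` once at the top of `main`, before the first cblas_sgemm call, is enough. The variable itself still has to be set in the shell that launches the program.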
It does not look like anything relevant was fixed between 0.3.20 and 0.3.21, unfortunately. What is your hardware, x86_64 or something like arm64?
Maybe AMD EPYC Rome
So a big system. How big are the matrices in the SGEMM? (It could be that you sometimes exceed the size of the internal buffer, a static array used to communicate subsets of the data between threads; see the BUFFERSIZE setting in Makefile.rule.)
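To answer the matrix-size question with concrete numbers, a sketch like the following could be dropped in next to the failing call. It assumes dense, unpadded single-precision operands (leading dimensions equal to the matrix extents) and does no overflow checking; the function names are hypothetical, and nothing here models OpenBLAS's internal buffer or the BUFFERSIZE setting.

```c
#include <stddef.h>
#include <stdio.h>

/* Bytes occupied by one dense float matrix of rows x cols,
 * assuming no padding in the leading dimension. */
size_t sgemm_matrix_bytes(size_t rows, size_t cols)
{
    return rows * cols * sizeof(float);
}

/* Bytes occupied by all three operands of C = alpha*op(A)*op(B) + beta*C,
 * where op(A) is m x k, op(B) is k x n and C is m x n. */
size_t sgemm_total_bytes(size_t m, size_t n, size_t k)
{
    return sgemm_matrix_bytes(m, k)
         + sgemm_matrix_bytes(k, n)
         + sgemm_matrix_bytes(m, n);
}

/* Print a one-line summary that can be pasted into a bug report. */
void sgemm_print_report(size_t m, size_t n, size_t k)
{
    printf("SGEMM m=%zu n=%zu k=%zu: A=%zu B=%zu C=%zu total=%zu bytes\n",
           m, n, k,
           sgemm_matrix_bytes(m, k),
           sgemm_matrix_bytes(k, n),
           sgemm_matrix_bytes(m, n),
           sgemm_total_bytes(m, n, k));
}
```

Calling `sgemm_print_report(M, N, K)` with the same M, N and K that are passed to cblas_sgemm, right before the call, records the sizes of the run that crashes.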
Closing as there was not much to go on
Program terminated with signal 11, Segmentation fault.
#0 0x0000000002e4297d in sgemm_incopy ()
#1 0x0000000002e3bc66 in sgemm_tn ()
#2 0x0000000002e3afaa in cblas_sgemm ()