Segmentation fault when using cblas_sgemm. The core dump appears occasionally; most of the time it works normally. #3788

Closed
johnson-xu01 opened this issue Oct 15, 2022 · 10 comments

Comments

@johnson-xu01

Program terminated with signal 11, Segmentation fault.
#0 0x0000000002e4297d in sgemm_incopy ()
#1 0x0000000002e3bc66 in sgemm_tn ()
#2 0x0000000002e3afaa in cblas_sgemm ()

@martin-frbg
Collaborator

This could be related to the thread safety issue just reported as #3787, but it is hard to tell without more information. What is your hardware, and which version of OpenBLAS, please?

@johnson-xu01
Author

> This could be related to the thread safety issue just reported as #3787, but it is hard to tell without more information. What is your hardware, and which version of OpenBLAS, please?

Thanks, @martin-frbg. I just want the program to run normally and ignore the occasional error. Would wrapping cblas_sgemm in a try/catch work?
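
On the try/catch question: the backtrace above reports signal 11, and a signal is delivered by the operating system rather than thrown as a C++ exception, so a catch block never sees it. One possible workaround, sketched below under stated assumptions, is to run the risky call in a child process and inspect how the child ended. The names `run_isolated`, `crashing_work` and `healthy_work` are hypothetical, and `raise(SIGSEGV)` only stands in for a crashing cblas_sgemm call; this is not something the OpenBLAS maintainers suggested in this thread.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type: the work that might crash. */
typedef void (*work_fn)(void);

/* Run `work` in a child process (POSIX only).
 * Returns 0 if the child exited normally with status 0,
 * the signal number if the child was killed by a signal,
 * and -1 for any other failure. */
static int run_isolated(work_fn work)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        /* Child: make sure a crash is reported as a signal, not handled. */
        signal(SIGSEGV, SIG_DFL);
        work();
        _exit(0);                       /* skip atexit handlers of the parent */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);        /* e.g. SIGSEGV */
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return -1;
}

/* Stand-in for a call that segfaults (assumption: illustration only). */
static void crashing_work(void) { raise(SIGSEGV); }

/* Stand-in for a call that completes normally. */
static void healthy_work(void) { }
```

Calling `run_isolated(crashing_work)` lets the parent survive and learn that the child died from SIGSEGV. The child's results are not visible to the parent, so a real program would need a pipe or shared memory to return the output matrix, and forking a process that already has running threads needs care. This only hides the symptom; it does not fix whatever causes the crash.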

@martin-frbg
Collaborator

I really cannot answer this with the little information you gave. Maybe you are using an outdated version of OpenBLAS with an error that has long since been fixed, or maybe you can get around the problem by compiling OpenBLAS with USE_SIMPLE_THREADED_LEVEL3=1.

@johnson-xu01
Author

> I really cannot answer this with the little information you gave. Maybe you are using an outdated version of OpenBLAS with an error that has long since been fixed, or maybe you can get around the problem by compiling OpenBLAS with USE_SIMPLE_THREADED_LEVEL3=1.

@martin-frbg OpenBLAS version 0.3.20.

@johnson-xu01
Author

> I really cannot answer this with the little information you gave. Maybe you are using an outdated version of OpenBLAS with an error that has long since been fixed, or maybe you can get around the problem by compiling OpenBLAS with USE_SIMPLE_THREADED_LEVEL3=1.

Thank you very much, @martin-frbg. I use version 0.3.20; does 0.3.21 fix this problem?

@brada4
Contributor

brada4 commented Oct 15, 2022

The first thing to try is threads versus no threads: set OPENBLAS_NUM_THREADS=1 before running the sample. If that fails sporadically too, then it is certain the problem is threading.
It could also be related to the vectorizer dilemma, in case threads are cleared.

@martin-frbg
Collaborator

It does not look like anything relevant was fixed between 0.3.20 and 0.3.21, unfortunately. What is your hardware: x86_64, or something like arm64?

@johnson-xu01
Author

> It does not look like anything relevant was fixed between 0.3.20 and 0.3.21, unfortunately. What is your hardware: x86_64, or something like arm64?

Maybe AMD EPYC Rome.

@martin-frbg
Collaborator

martin-frbg commented Oct 16, 2022

So a big system. How large are the matrices in the SGEMM call? (It could be that you sometimes exceed the size of the internal buffer, a static array used to communicate subsets of the data between threads; see the BUFFERSIZE setting in Makefile.rule.)
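
The matrix-size question was never answered in this thread. As a rough sketch of the information that would help, the helper below checks the leading dimensions of one row-major SGEMM call against the declared shape and reports the operand footprint in bytes. The struct `sgemm_shape` and both function names are hypothetical, and row-major storage is an assumption; the check reflects the general BLAS rule that a leading dimension must be at least the stored row length, and it says nothing about OpenBLAS's internal BUFFERSIZE.

```c
#include <stddef.h>

/* Hypothetical description of one row-major SGEMM call:
 * C (m x n) = op(A) (m x k) * op(B) (k x n). */
struct sgemm_shape {
    int m, n, k;
    int trans_a, trans_b;   /* nonzero if the operand is passed transposed */
    int lda, ldb, ldc;      /* leading dimensions, in elements */
};

static int at_least_one(int x) { return x > 1 ? x : 1; }

/* Returns 1 if every leading dimension is large enough for row-major
 * storage, 0 otherwise. A too-small leading dimension makes the library
 * read or write past the end of the caller's buffer. */
static int sgemm_shape_ok(const struct sgemm_shape *s)
{
    if (s->m < 0 || s->n < 0 || s->k < 0)
        return 0;
    /* Stored row length: A is m x k (or k x m if transposed), and so on. */
    int a_cols = s->trans_a ? s->m : s->k;
    int b_cols = s->trans_b ? s->k : s->n;
    return s->lda >= at_least_one(a_cols)
        && s->ldb >= at_least_one(b_cols)
        && s->ldc >= at_least_one(s->n);
}

/* Bytes the caller must have allocated for A, B and C together,
 * counting full rows of lda, ldb and ldc elements. Returns 0 if the
 * shape is invalid. */
static size_t sgemm_total_bytes(const struct sgemm_shape *s)
{
    if (!sgemm_shape_ok(s))
        return 0;
    size_t a_rows = (size_t)(s->trans_a ? s->k : s->m);
    size_t b_rows = (size_t)(s->trans_b ? s->n : s->k);
    size_t floats = a_rows * (size_t)s->lda
                  + b_rows * (size_t)s->ldb
                  + (size_t)s->m * (size_t)s->ldc;
    return floats * sizeof(float);
}
```

Logging these values just before each cblas_sgemm call would show whether the sporadic crashes coincide with unusually large or mismatched shapes, which is the kind of detail a bug report like this one needs.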

@martin-frbg
Collaborator

Closing, as there was not much to go on.
