dscal is slow with multiple threads #375
Comments
+1 This is incredibly bad.
On 04.06.2014 17:37, Jeff Bezanson wrote:
I know that we need a better algorithm for multithreading. But after a modification, we have to run a lot of tests on a lot of
If you want to contribute to OpenBLAS, please fork the repository,
Best regards
Werner
I reopened the feature request. Regards, Werner
I am aware that manpower is a scarce resource. Unfortunately, I am not capable of writing pull requests to OpenBLAS. However, my point with these timings is that the fix might be very easy. Single-threaded execution is faster at every size, even for vectors with more than a million elements, so it might be reasonable to turn off multithreading here, given its significant cost for small vectors.
On 04.06.2014 21:10, Andreas Noack Jensen wrote:
Hi,
I will try to optimize the multithreading code, but not
I think that we can solve such problems in the next 2 months.
Best regards
Werner
Ref #375: added workaround for small sizes to scal.c and zscal.c. Werner
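The contents of that workaround are not shown in this thread. As a rough sketch of the general idea only (stay single-threaded below a size cutoff, split the vector into chunks above it), something like the following could apply. The names scal_num_threads, scal_chunk, scal_dispatch and the cutoff SCAL_MT_MIN_N are hypothetical and are not taken from OpenBLAS; the chunks run one after another here, where a real implementation would hand them to worker threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff: at or below this many elements, stay single-threaded.
   The value is an illustration only, not the one used in OpenBLAS. */
#define SCAL_MT_MIN_N 1048576L

/* Decide how many threads a scal of length n should use. */
int scal_num_threads(long n, int max_threads)
{
    assert(max_threads >= 1);
    if (n <= SCAL_MT_MIN_N)
        return 1;              /* threading overhead would dominate */
    return max_threads;
}

/* Stand-in for the real kernel: x[i * incx] *= alpha for i in [0, n). */
static void scal_chunk(long n, double alpha, double *x, long incx)
{
    for (long i = 0; i < n; i++)
        x[i * incx] *= alpha;
}

/* Split the vector into one chunk per thread and scale each chunk.
   Chunks are processed sequentially to keep the sketch self-contained. */
void scal_dispatch(long n, double alpha, double *x, long incx, int max_threads)
{
    assert(incx >= 1);
    if (n <= 0)
        return;

    int nthreads = scal_num_threads(n, max_threads);
    long chunk = (n + nthreads - 1) / nthreads;   /* ceiling division */

    for (int t = 0; t < nthreads; t++) {
        long start = (long)t * chunk;
        if (start >= n)
            break;
        long len = (n - start < chunk) ? (n - start) : chunk;
        scal_chunk(len, alpha, x + start * incx, incx);
    }
}
```

With this shape, small vectors never pay for thread startup or synchronization, and the cutoff can be tuned per platform from measured timings.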
Please consider the following program, which prints the speed of single-threaded and multithreaded OpenBLAS relative to a simple implementation. I have compiled with -O2.
On a machine with 80 Xeon(R) E7-8850, I get the results below, but I get similar results on my MacBook mid 2009.
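The program and its output did not survive in this copy of the issue. For readers who want to run a comparable measurement, here is a minimal sketch of such a timing harness. It is not the original program: naive_dscal, time_scal, scal_fn and now_seconds are hypothetical names, and the harness assumes a POSIX system for clock_gettime.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Signature shared by the simple loop below and, assuming a default
   OpenBLAS build where blasint is int, by cblas_dscal. */
typedef void (*scal_fn)(int n, double alpha, double *x, int incx);

/* Keeps the compiler from discarding the timed work as dead stores. */
static volatile double sink;

/* Simple reference implementation: x[i * incx] *= alpha. */
void naive_dscal(int n, double alpha, double *x, int incx)
{
    for (int i = 0; i < n; i++)
        x[(size_t)i * (size_t)incx] *= alpha;
}

/* Wall-clock seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Average seconds per call of f on a unit-stride vector of length n,
   measured over reps calls. Returns -1.0 if allocation fails. */
double time_scal(scal_fn f, int n, int reps)
{
    assert(n > 0 && reps > 0);

    double *x = malloc((size_t)n * sizeof *x);
    if (x == NULL)
        return -1.0;
    for (int i = 0; i < n; i++)
        x[i] = 1.0;

    double t0 = now_seconds();
    for (int r = 0; r < reps; r++)
        f(n, 1.0000001, x, 1);   /* alpha close to 1 keeps values finite */
    double t1 = now_seconds();

    sink = x[0] + x[n - 1];
    free(x);
    return (t1 - t0) / reps;
}
```

To compare against OpenBLAS, link with the library, include cblas.h, and pass cblas_dscal to time_scal. Calling openblas_set_num_threads(1) beforehand gives the single-threaded figure, and dividing each OpenBLAS time by time_scal(naive_dscal, n, reps) for the same n gives the relative speed at that size.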