dscal is slow with multiple threads #375

Closed
andreasnoack opened this issue May 25, 2014 · 6 comments

Comments

@andreasnoack
Contributor

Please consider the following program, which prints the run time of multi-threaded and single-threaded OpenBLAS dscal relative to a simple reference implementation (so values greater than one mean OpenBLAS is slower). I compiled with -O2.

! Reference implementation: scale x by a with a plain loop.
subroutine myscal(n, a, x)

    implicit none

    integer*8 :: n, i
    double precision :: a, x(n)

    do i = 1, n
        x(i) = x(i)*a
    end do
end subroutine myscal

! Benchmark driver. Note: dscal is called with integer*8 arguments,
! so this assumes an OpenBLAS build with the 64-bit integer
! interface (INTERFACE64=1).
program testscal

    implicit none

    integer*8 :: n, i
    real :: t0, t1, tloop
    double precision :: x(2**20)

    do n = 1, 20

        write(*,*) "Problems size:", 2**n

        x = 1.0d0

        ! Baseline: time the simple loop, repeated 2**20/n times so
        ! the measurement stays well above timer resolution.
        call cpu_time(t0)
        do i = 1, 2**20/n
            call myscal(2_8**n, 0.5d0, x)
        end do
        call cpu_time(t1)
        tloop = t1 - t0
!        write(*,*) "Simple loop:", t1-t0

        ! Multi-threaded dscal, reported as a ratio to the baseline.
        call openblas_set_num_threads(2)
        call cpu_time(t0)
        do i = 1, 2**20/n
            call dscal(2_8**n, 0.5d0, x, 1_8)
        end do
        call cpu_time(t1)
        write(*,*) "OpenBLAS, multi:", (t1-t0)/tloop

        ! Single-threaded dscal, same measurement.
        call openblas_set_num_threads(1)
        call cpu_time(t0)
        do i = 1, 2**20/n
            call dscal(2_8**n, 0.5d0, x, 1_8)
        end do
        call cpu_time(t1)
        write(*,*) "OpenBLAS, single:", (t1-t0)/tloop
        write(*,*)

    end do

end program

On a machine with 80 Xeon(R) E7-8850 cores, I get the results below; I get similar results on my mid-2009 MacBook.

 Problems size:                    2
 OpenBLAS, multi:   52.6295242    
 OpenBLAS, single:  0.444445431    

 Problems size:                    4
 OpenBLAS, multi:   262.472839    
 OpenBLAS, single:   2.49964237    

 Problems size:                    8
 OpenBLAS, multi:   316.967346    
 OpenBLAS, single:   3.99952316    

 Problems size:                   16
 OpenBLAS, multi:   83.6646500    
 OpenBLAS, single:   1.00000000    

 Problems size:                   32
 OpenBLAS, multi:   41.4989662    
 OpenBLAS, single:  0.500000000    

 Problems size:                   64
 OpenBLAS, multi:   22.9994164    
 OpenBLAS, single:  0.333333343    

 Problems size:                  128
 OpenBLAS, multi:   11.1873531    
 OpenBLAS, single:  0.249992549    

 Problems size:                  256
 OpenBLAS, multi:   6.81489086    
 OpenBLAS, single:  0.222194746    

 Problems size:                  512
 OpenBLAS, multi:   4.23400927    
 OpenBLAS, single:  0.212766171    

 Problems size:                 1024
 OpenBLAS, multi:   2.59999323    
 OpenBLAS, single:  0.242858112    

 Problems size:                 2048
 OpenBLAS, multi:   1.72999442    
 OpenBLAS, single:  0.319998652    

 Problems size:                 4096
 OpenBLAS, multi:   1.13461566    
 OpenBLAS, single:  0.416665643    

 Problems size:                 8192
 OpenBLAS, multi:  0.808165669    
 OpenBLAS, single:  0.363265187    

 Problems size:                16384
 OpenBLAS, multi:  0.614035845    
 OpenBLAS, single:  0.350876421    

 Problems size:                32768
 OpenBLAS, multi:  0.452782184    
 OpenBLAS, single:  0.369986653    

 Problems size:                65536
 OpenBLAS, multi:  0.402858377    
 OpenBLAS, single:  0.430149466    

 Problems size:               131072
 OpenBLAS, multi:  0.525120020    
 OpenBLAS, single:  0.417756647    

 Problems size:               262144
 OpenBLAS, multi:  0.508465350    
 OpenBLAS, single:  0.411614686    

 Problems size:               524288
 OpenBLAS, multi:  0.515307605    
 OpenBLAS, single:  0.421975642    

 Problems size:              1048576
 OpenBLAS, multi:  0.475159228    
 OpenBLAS, single:  0.421504289  
@JeffBezanson

+1

This is incredibly bad.

@wernsaar
Contributor

wernsaar commented Jun 4, 2014

Hi,

I know that we need a better algorithm for multithreading.
Any input is welcome.

But after a modification we have to run a lot of tests on many
platforms, and we don't have the manpower to do this for
every feature request.

If you want to contribute to OpenBLAS, please fork the repository,
write and test the new code, and then make a pull request.

Best regards

Werner

@wernsaar wernsaar closed this as completed Jun 4, 2014
@wernsaar
Contributor

wernsaar commented Jun 4, 2014

I have reopened the feature request.

regards

Werner

@wernsaar wernsaar reopened this Jun 4, 2014
@andreasnoack
Contributor Author

I am aware that manpower is a scarce resource, and unfortunately I am not able to write pull requests for OpenBLAS myself. However, the point of these timings is that the fix might be very easy: single-threaded execution is faster for all the sizes, even with more than a million elements, so it might be reasonable to turn off multithreading here, given the significant threading cost for small vectors.
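In the meantime, a caller can approximate this from the outside by pinning OpenBLAS to a single thread around small dscal calls. A minimal C sketch, assuming an OpenBLAS build that exposes openblas_set_num_threads and the CBLAS interface; the cutoff value and the helper name are illustrative, not part of any API:

#include <cblas.h>

/* Exported by OpenBLAS; declared here in case the installed cblas.h
 * does not carry the prototype. */
extern void openblas_set_num_threads(int num_threads);

/* Illustrative cutoff; the right value is platform-dependent. */
#define SCAL_SINGLE_THREAD_CUTOFF 100000

/* Scale x by alpha, forcing single-threaded dscal for small vectors.
 * nthreads is the thread count to restore afterwards, since
 * openblas_set_num_threads changes a global setting. */
void scale_vector(int n, double alpha, double *x, int nthreads)
{
    if (n < SCAL_SINGLE_THREAD_CUTOFF) {
        openblas_set_num_threads(1);
        cblas_dscal(n, alpha, x, 1);
        openblas_set_num_threads(nthreads);
    } else {
        cblas_dscal(n, alpha, x, 1);
    }
}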

@wernsaar
Copy link
Contributor

wernsaar commented Jun 4, 2014

Hi,

I will try to optimize the multithreading code, but not
for the v0.2.9 release.

I think we can solve such problems in the next two months.

Best regards

Werner

@wernsaar
Contributor

wernsaar commented Jun 8, 2014

Ref #375: added a workaround for small sizes to scal.c and zscal.c.
This workaround will be included in the next release after v0.2.9.

Werner
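For readers wondering what shape such a workaround takes: the idea is a size cutoff in front of the threaded path. Below is a hypothetical sketch, not the actual scal.c change; the names, the cutoff, and the use of OpenMP are all illustrative (OpenBLAS uses its own threading layer and tuned kernels):

#include <stddef.h>

/* Hypothetical sketch of a size cutoff in a scal routine. */

#define SCAL_MT_THRESHOLD (1 << 20)  /* assumed cutoff; needs tuning */

/* Serial kernel: x := alpha * x. */
static void dscal_serial(size_t n, double alpha, double *x, size_t incx)
{
    for (size_t i = 0; i < n; i++)
        x[i * incx] *= alpha;
}

/* Threaded kernel, shown with OpenMP purely for illustration. */
static void dscal_threaded(size_t n, double alpha, double *x, size_t incx)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        x[i * incx] *= alpha;
}

/* Dispatch: stay serial below the cutoff, where thread start-up and
 * synchronization cost more than the scaling work itself. */
void dscal_dispatch(size_t n, double alpha, double *x, size_t incx)
{
    if (n < SCAL_MT_THRESHOLD)
        dscal_serial(n, alpha, x, incx);
    else
        dscal_threaded(n, alpha, x, incx);
}

The design point matches the timings above: the serial kernel must be the default until n is large enough that the per-thread work amortizes thread wake-up and synchronization.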
