openblas_set_num_threads not changing the number of threads #803

Closed
jainanshul opened this issue Mar 15, 2016 · 2 comments

Comments

@jainanshul

I compiled OpenBLAS for Android for use with Caffe. I call openblas_set_num_threads before loading a Caffe model and have tried various values of num_threads, but I see no difference in run time. I then printed the values returned by openblas_get_num_threads() and openblas_get_num_procs(), and both always return 8 (the maximum number of cores available on my Android device). I confirmed that I have the fix for #762 by building from the develop branch instead of deep_learning.

I compiled OpenBLAS with the following configuration:

NO_LAPACK=1 TARGET=ARMV7 USE_THREAD=1 NUM_THREADS=16 USE_OPENMP=1

Another question: how many threads does OpenBLAS use by default if openblas_set_num_threads is never called and the environment variable OMP_NUM_THREADS is not set?

How can I experiment with the number of threads in OpenBLAS, given that the method above isn't working for me? Am I using the wrong APIs?

@xianyi
Collaborator

xianyi commented Mar 15, 2016

@jainanshul, could you try testing gemm directly instead of Caffe?

@brada4
Contributor

brada4 commented Mar 19, 2016

If you look in interface/gemm.c:

#ifdef SMP
  mode |= (transa << BLAS_TRANSA_SHIFT);
  mode |= (transb << BLAS_TRANSB_SHIFT);

  nthreads_max = num_cpu_avail(3);
  nthreads_avail = nthreads_max;

#ifndef COMPLEX
  MNK = (double) args.m * (double) args.n * (double) args.k;
  if ( MNK <= (65536.0  * (double) GEMM_MULTITHREAD_THRESHOLD)  )
        nthreads_max = 1;
#else
  MNK = (double) args.m * (double) args.n * (double) args.k;
  if ( MNK <= (8192.0  * (double) GEMM_MULTITHREAD_THRESHOLD)  )
        nthreads_max = 1;
#endif
  args.common = NULL;

  if ( nthreads_max > nthreads_avail )
        args.nthreads = nthreads_avail;
  else
        args.nthreads = nthreads_max;

You can set

args.nthreads=num_cpu_avail(3);

right after this code and then control the number of threads in sgemm using variables.
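To make the thresholding in the excerpt easier to experiment with, here is a minimal standalone sketch of the same decision. The function name gemm_thread_count is hypothetical and not part of OpenBLAS; its threshold parameter stands in for GEMM_MULTITHREAD_THRESHOLD and nthreads_avail for the value returned by num_cpu_avail(3).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenBLAS function: restates the thread-count
 * decision from the interface/gemm.c excerpt above.
 *   threshold      stands in for GEMM_MULTITHREAD_THRESHOLD
 *   nthreads_avail stands in for the result of num_cpu_avail(3)        */
static int gemm_thread_count(size_t m, size_t n, size_t k,
                             int is_complex, int threshold,
                             int nthreads_avail)
{
    assert(nthreads_avail >= 1);

    /* Problem size as used in the excerpt: product of the three dimensions. */
    double mnk = (double)m * (double)n * (double)k;

    /* The excerpt uses 65536.0 for real types and 8192.0 for complex types. */
    double scale = is_complex ? 8192.0 : 65536.0;

    int nthreads_max = nthreads_avail;
    if (mnk <= scale * (double)threshold)
        nthreads_max = 1;            /* small problem: stay single-threaded */

    /* Never report more threads than are available. */
    return nthreads_max > nthreads_avail ? nthreads_avail : nthreads_max;
}
```

Assuming a threshold of 4 (an assumption about the build, check your own configuration), gemm_thread_count(150, 250, 150, 0, 4, 8) returns 8 because 150 * 250 * 150 = 5,625,000 exceeds 65536 * 4 = 262,144, while gemm_thread_count(32, 32, 32, 0, 4, 8) returns 1.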

The 64k (and related) thresholds are tuned for amd64 CPUs with 128/256k of core-exclusive cache.

A typical Caffe model runs sgemm on matrices of, say, (100-200)x(200-300) elements, i.e. 80-240k bytes each in single precision. On x86 it would be suboptimal to move that much data between multiple CPU caches, but on an ARM SoC you may get a measurable gain from splitting the work across per-CPU caches (or caches shared by two cores, as in some SoCs).

It would be interesting to see your /proc/cpuinfo and to know whether you gain anything from two or more threads.
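The 80-240k figure is consistent with the size in bytes of one single-precision operand. A small sketch of that arithmetic, using a hypothetical helper name (sgemm_operand_bytes) and assuming a 4-byte float:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes occupied by one dense rows x cols
 * single-precision matrix, assuming sizeof(float) == 4. */
static size_t sgemm_operand_bytes(size_t rows, size_t cols)
{
    assert(sizeof(float) == 4);   /* assumption stated in the lead-in */
    return rows * cols * sizeof(float);
}
```

With these assumptions, a 100x200 matrix takes 80,000 bytes and a 200x300 matrix takes 240,000 bytes, which can be compared against the per-core cache size reported for your device.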


4 participants