I compiled OpenBLAS for Android for use with Caffe. I call openblas_set_num_threads before loading a Caffe model and have tried various values of num_threads, but I see no difference in run time. I then printed the values returned by openblas_get_num_threads() and openblas_get_num_procs(), and both always return 8 (the maximum number of cores available on my Android device). I confirmed that I have the fix for #762 by using the develop branch instead of deep_learning.
I compiled openblas with the following configuration:
Another question: how many threads does OpenBLAS use by default if openblas_set_num_threads is not called and the environment variable OMP_NUM_THREADS is not set?
How can I experiment with the number of threads in OpenBLAS, given that the method above isn't working for me? Am I using the wrong APIs?
right after this code and control number of threads in sgemm using variables.
The 64k & co settings are optimized for amd64 CPUs with 128/256k of core-exclusive cache.
A typical Caffe model runs sgemm on matrices of, say, (100-200)x(200-300), i.e. 80-240k. On x86 it would be suboptimal to drag that around between multiple CPU caches, but you may get a measurable gain from splitting it across the per-CPU caches of an ARM SoC (or a cache shared by 2 cores, as in some SoCs).
It would be interesting to see your /proc/cpuinfo and to know whether you gain anything from 2 or more threads.
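The 80-240k figure can be checked with a little arithmetic. This is a rough sketch that assumes the figure means bytes per operand matrix, with sgemm working on 4-byte single-precision floats. The helper names and the cache sizes in the loop are illustrative only and are not taken from OpenBLAS.

```python
FLOAT32_BYTES = 4  # sgemm operates on single-precision (32-bit) floats


def matrix_bytes(rows, cols, elem_bytes=FLOAT32_BYTES):
    """Size in bytes of one dense rows x cols matrix."""
    return rows * cols * elem_bytes


def fits_in_cache(rows, cols, cache_bytes):
    """True if one operand matrix fits entirely in a cache of the given size."""
    return matrix_bytes(rows, cols) <= cache_bytes


# Smallest and largest operand shapes from the range quoted above.
smallest = matrix_bytes(100, 200)
largest = matrix_bytes(200, 300)
print(smallest, largest)

# Hypothetical per-core cache sizes (assumption: 128 KiB and 256 KiB).
for cache_kib in (128, 256):
    cache = cache_kib * 1024
    print(cache_kib,
          fits_in_cache(100, 200, cache),
          fits_in_cache(200, 300, cache))
```

Under these assumptions the smallest operand is 80,000 bytes and the largest is 240,000 bytes, so the largest one fits a 256 KiB cache but not a 128 KiB one.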