Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matrix multiplication bug #114

Closed
lendle opened this issue May 23, 2014 · 9 comments
Closed

Matrix multiplication bug #114

lendle opened this issue May 23, 2014 · 9 comments
Labels
bug Something isn't working priority This should be addressed urgently upstream The issue is with an upstream dependency, e.g. LLVM

Comments

@lendle
Copy link

lendle commented May 23, 2014

I'm seeing a bug where when the size of a matrix gets big enough, matrix multiplication is incorrect. The issue occurs on a build of cde17b6a5ede76974b4dd16e66371228b8e23308 on OS X with all deps built from scratch and on Arch linux with some* system installed deps and some deps built from scratch.

Strangely, the issue does not occur when I use the julia package provided by pacman on arch, which is based on 3985890, but it does when I build julia from source based on the same commit. So I think this means the bug must be related to the openblas version that is being built.

*My Make.user on arch:

USE_SYSTEM_LLVM=1
USE_SYSTEM_LIBUNWIND=1
USE_SYSTEM_PCRE=1
USE_SYSTEM_LIBM=0
USE_SYSTEM_FFTW=1
USE_SYSTEM_GMP=1
USE_SYSTEM_MPFR=1
USE_SYSTEM_ARPACK=0
USE_SYSTEM_ZLIB=1

All of the output below is from julia that I built from source, commit 3985890, on arch.

In the weird function below, I'm computing the row sums of an n \times 4 matrix with sum and by multiplying it by a vector of ones, and returning the maximum difference, which should always be zero. When n is large enough, I get a big deviation which tends to increase with n. I think it might be linear in n, but I haven't checked much.

  | | |_| | | | (_| |  |  Version 0.3.0-prerelease+2954 (2014-05-08 04:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 3985890* (15 days old master)
|__/                   |  x86_64-unknown-linux-gnu

julia> versioninfo()
Julia Version 0.3.0-prerelease+2954
Commit 3985890* (2014-05-08 04:14 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

julia> function weird(n, p=4)
           W = rand(n, p)
           maximum(abs(sum(W, 2) .- (W * ones(p))))
       end
weird (generic function with 2 methods)

julia> weird(1_000_000)
0.0

julia> weird(10_000_000)
3.92979188171024

I'm pretty sure the issue is with A_mul_B and not sum, because the also_weird function below should be approximately constant in n, but blows up when n gets large enough.

julia> also_weird(n) = var(rand(n, 4) * ones(4))
also_weird (generic function with 1 method)

julia> map(also_weird, [1000, 100_000, 1_000_000, 10_000_000])
4-element Array{Float64,1}:
 0.353453
 0.334094
 0.333347
 1.62178 
@lindahua
Copy link

Yes, the matrix multiplication really has a bug. I have a simpler example to show this:

julia> x = ones(10^7, 4);

julia> y = x * ones(4);

julia> y[1:5]
5-element Array{Float64,1}:
 8.0
 8.0
 8.0
 8.0
 8.0

The correct values of y should be [4.0, 4.0, ....].

Version info:

Julia Version 0.3.0-prerelease+3175
Commit 159eaa6 (2014-05-23 15:30 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.2.0)
  CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

@jiahao
Copy link
Member

jiahao commented May 23, 2014

The threshold for this bug appears to be 4194305:

julia> y=ones(4_194_304, 4) *ones(4)
4194304-element Array{Float64,1}:
 4.0
 4.0
 4.0
   
 4.0

julia> y=ones(4_194_305, 4) *ones(4)
4194305-element Array{Float64,1}:
 8.0
 4.0
 4.0
   
 4.0

@jiahao
Copy link
Member

jiahao commented May 23, 2014

Threshold doesn't change for y=ones(N,1)*ones(1). Also, 4194304==2^22.

@simonster
Copy link
Member

The threshold appears to be 2^21 * OPENBLAS_NUM_THREADS + 1.

@andreasnoack
Copy link
Member

I think this is essentially JuliaLang/julia#5601, which I thought was fixed, but apparently only for some architectures. It must be a joy to maintain a BLAS. I have reopened OpenMathLib/OpenBLAS#340

@ViralBShah
Copy link
Member

Fails for me but with a different result. I am on a mac with core-i5 Haswell. Until openblas fixes this, we should probably avoid calling BLAS and use the generic Julia implementation. This is certainly a serious bug.

julia> x = ones(10^7, 4);

julia> y = x*ones(4);

julia> y[1:5]
5-element Array{Float64,1}:
 12.0
 12.0
 12.0
 12.0
 12.0

@andreasnoack
Copy link
Member

This is exactly JuliaLang/julia#5601, which I of course shouldn't have closed before we changed OpenBLAS version even though the issue has been fixed upstream.

@ViralBShah
Copy link
Member

Can we close this issue then? It is fixed in base now, and will be automatically fixed when we update to 0.2.9.

@andreasnoack
Copy link
Member

Ah. You have changed the gemv calls. Yes, then I think this one can be closed.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority This should be addressed urgently upstream The issue is with an upstream dependency, e.g. LLVM
Projects
None yet
Development

No branches or pull requests

6 participants