Matrix multiplication bug #114

lendle · 2014-05-23T19:46:04Z

I'm seeing a bug where matrix multiplication gives incorrect results once a matrix gets big enough. The issue occurs on a build of cde17b6a5ede76974b4dd16e66371228b8e23308 on OS X with all deps built from scratch, and on Arch Linux with some* system-installed deps and some deps built from scratch.

Strangely, the issue does not occur when I use the julia package provided by pacman on Arch, which is based on 3985890, but it does occur when I build julia from source at the same commit. So I think the bug must be related to the OpenBLAS version that is being built.

*My Make.user on arch:

USE_SYSTEM_LLVM=1
USE_SYSTEM_LIBUNWIND=1
USE_SYSTEM_PCRE=1
USE_SYSTEM_LIBM=0
USE_SYSTEM_FFTW=1
USE_SYSTEM_GMP=1
USE_SYSTEM_MPFR=1
USE_SYSTEM_ARPACK=0
USE_SYSTEM_ZLIB=1

All of the output below is from julia that I built from source, commit 3985890, on arch.

In the weird function below, I compute the row sums of an n × 4 matrix two ways, with sum and by multiplying the matrix by a vector of ones, and return the maximum absolute difference, which should always be zero. When n is large enough, I get a big deviation which tends to increase with n. I think it might be linear in n, but I haven't checked much.

  | | |_| | | | (_| |  |  Version 0.3.0-prerelease+2954 (2014-05-08 04:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 3985890* (15 days old master)
|__/                   |  x86_64-unknown-linux-gnu

julia> versioninfo()
Julia Version 0.3.0-prerelease+2954
Commit 3985890* (2014-05-08 04:14 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

julia> function weird(n, p=4)
           W = rand(n, p)
           maximum(abs(sum(W, 2) .- (W * ones(p))))
       end
weird (generic function with 2 methods)

julia> weird(1_000_000)
0.0

julia> weird(10_000_000)
3.92979188171024

I'm pretty sure the issue is with A_mul_B and not sum, because the result of the also_weird function below should be approximately constant in n (the variance of a sum of four independent uniform(0, 1) draws is 4/12 ≈ 0.333), but it blows up when n gets large enough.

julia> also_weird(n) = var(rand(n, 4) * ones(4))
also_weird (generic function with 1 method)

julia> map(also_weird, [1000, 100_000, 1_000_000, 10_000_000])
4-element Array{Float64,1}:
 0.353453
 0.334094
 0.333347
 1.62178
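For anyone re-running these checks on a current Julia (1.x), the calls above need small syntax changes: `sum(W, 2)` became `sum(W, dims=2)`, `abs` must be broadcast as `abs.`, and `var` lives in the Statistics standard library. The following is a minimal sketch of the same two functions under those assumptions, not the code that was run for this report:

```julia
using Statistics  # provides var on Julia 1.x

# Row sums computed two ways: with sum, and as a matrix-vector product.
# The maximum absolute difference should be zero up to floating-point rounding.
function weird(n, p=4)
    W = rand(n, p)
    maximum(abs.(vec(sum(W, dims=2)) .- W * ones(p)))
end

# Sample variance of the sum of four independent uniform(0, 1) draws.
# This should stay near 4/12 regardless of n.
also_weird(n) = var(rand(n, 4) * ones(4))
```

On a build that is not affected by the bug, `weird(10_000_000)` should stay at rounding level and `also_weird(10_000_000)` should stay close to 0.333.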


lindahua · 2014-05-23T20:00:19Z

Yes, the matrix multiplication really has a bug. I have a simpler example to show this:

julia> x = ones(10^7, 4);

julia> y = x * ones(4);

julia> y[1:5]
5-element Array{Float64,1}:
 8.0
 8.0
 8.0
 8.0
 8.0

The correct values of y should be [4.0, 4.0, ...].

Version info:

Julia Version 0.3.0-prerelease+3175
Commit 159eaa6 (2014-05-23 15:30 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.2.0)
  CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

jiahao · 2014-05-23T20:15:12Z

The threshold for this bug appears to be 4194305:

julia> y=ones(4_194_304, 4) *ones(4)
4194304-element Array{Float64,1}:
 4.0
 4.0
 4.0
 ⋮  
 4.0

julia> y=ones(4_194_305, 4) *ones(4)
4194305-element Array{Float64,1}:
 8.0
 4.0
 4.0
 ⋮  
 4.0

jiahao · 2014-05-23T20:19:54Z

Threshold doesn't change for y=ones(N,1)*ones(1). Also, 4194304==2^22.

simonster · 2014-05-23T20:29:15Z

The threshold appears to be 2^21 * OPENBLAS_NUM_THREADS + 1.
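As a quick arithmetic check, this formula reproduces the 4194305 figure reported earlier in the thread if OpenBLAS was running with two threads (an inference, the thread count is not stated there). The sketch below assumes a current Julia (1.6 or later), where `BLAS.get_num_threads()` is available; `threshold` and `probe_threshold` are hypothetical helper names.

```julia
using LinearAlgebra  # provides the BLAS submodule

# Hypothetical helper: the smallest row count at which the wrong results
# were reported, according to the formula above.
threshold(nthreads) = 2^21 * nthreads + 1

# Multiply just below and exactly at the predicted threshold and report
# whether every entry of the product has the expected value 4.0.
function probe_threshold(nthreads = BLAS.get_num_threads())
    for n in (threshold(nthreads) - 1, threshold(nthreads))
        y = ones(n, 4) * ones(4)
        println(n, " rows: all entries equal 4.0? ", all(==(4.0), y))
    end
end
```

`threshold(2)` evaluates to 4194305. Calling `probe_threshold()` allocates a matrix of roughly 64 MB per BLAS thread, and on a fixed OpenBLAS both lines should report `true`.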

andreasnoack · 2014-05-24T06:41:57Z

I think this is essentially JuliaLang/julia#5601, which I thought was fixed, but apparently only for some architectures. It must be a joy to maintain a BLAS. I have reopened OpenMathLib/OpenBLAS#340.

ViralBShah · 2014-05-25T07:49:40Z

Fails for me but with a different result. I am on a mac with core-i5 Haswell. Until openblas fixes this, we should probably avoid calling BLAS and use the generic Julia implementation. This is certainly a serious bug.

julia> x = ones(10^7, 4);

julia> y = x*ones(4);

julia> y[1:5]
5-element Array{Float64,1}:
 12.0
 12.0
 12.0
 12.0
 12.0
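The workaround suggested above, bypassing BLAS in favour of a generic implementation, can be illustrated with a plain loop. This is only a sketch for current Julia: `naive_matvec` is a hypothetical name, and it is not the routine that Base actually dispatches to.

```julia
# Hypothetical pure-Julia matrix-vector product that never calls BLAS.
function naive_matvec(A::AbstractMatrix, x::AbstractVector)
    m, n = size(A)
    length(x) == n ||
        throw(DimensionMismatch("A has $n columns but x has length $(length(x))"))
    y = zeros(promote_type(eltype(A), eltype(x)), m)
    # Julia arrays are column-major, so the inner loop walks down one column of A.
    for j in 1:n
        xj = x[j]
        for i in 1:m
            y[i] += A[i, j] * xj
        end
    end
    return y
end
```

With this, `naive_matvec(ones(10^7, 4), ones(4))` returns a vector whose entries are all 4.0, since each entry is just 1 + 1 + 1 + 1 accumulated in order. The trade-off is speed: a loop like this gives up the threading and blocking that a tuned gemv provides.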

andreasnoack · 2014-05-25T17:56:47Z

This is exactly JuliaLang/julia#5601, which I of course shouldn't have closed before we changed the OpenBLAS version, even though the issue had been fixed upstream.

ViralBShah · 2014-05-25T18:03:51Z

Can we close this issue then? It is fixed in base now, and will be automatically fixed when we update to OpenBLAS 0.2.9.

andreasnoack · 2014-05-25T18:12:38Z

Ah. You have changed the gemv calls. Yes, then I think this one can be closed.

lindahua mentioned this issue May 26, 2014

Update to OpenBLAS 0.2.9 #116

Closed

ViralBShah mentioned this issue Jun 11, 2014

WIP: Bump OpenBLAS to 0.2.9 and LAPACK to 3.5.0 JuliaLang/julia#7213

Merged

KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024

This issue was closed.
