Wrong result in gemv for large n #340

Closed
andreasnoack opened this issue Jan 29, 2014 · 23 comments

@andreasnoack
Contributor

When n > 2^23, gemv gives wrong results. The following program is an example:

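program test

    implicit none

    ! 2**23+1 rows: the smallest size for which the wrong result is reported
    double precision :: a(2**23+1,4), b(4), c(2**23+1)
    integer*8 :: n, idamax

    a = 1.0d0
    b = 1.0d0
    ! beta = 1 in the call below, so c must be initialized first
    c = 0.0d0

    ! c := 1.0*a*b + 1.0*c, so every element of c should equal 4
    call dgemv('N',2**23+1_8,4_8,1.0d0,a,2**23+1_8,b,1_8,1.0d0,c,1_8)
    ! index of the element of c with the largest absolute value
    n = idamax(2**23+1_8,c,1_8)

    write(*,*) c(n)

end program test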

OpenBLAS is compiled with 64-bit integer support. My computer is a mid-2009 MacBook Pro, but the problem is also present on a Dell desktop with a Core i5 running Ubuntu 14.04.
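For anyone who wants an answer that does not depend on the BLAS under test, here is a minimal sketch of a naive reference for the non-transposed case, written in C. The names `ref_dgemv_n`, `all_equal` and `check_all_ones` are hypothetical helpers made up for this illustration; they are not part of OpenBLAS or netlib. The sketch assumes column-major storage and unit strides for the vectors, as in the program above, and it always scales the result vector by beta rather than special-casing beta = 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference helper (not an OpenBLAS function):
 * y := alpha*A*x + beta*y for a column-major m-by-n matrix A with
 * leading dimension lda and unit strides for x and y. */
static void ref_dgemv_n(int64_t m, int64_t n, double alpha,
                        const double *a, int64_t lda,
                        const double *x, double beta, double *y)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    for (int64_t i = 0; i < m; i++)
        y[i] *= beta;
    for (int64_t j = 0; j < n; j++) {
        /* 64-bit offset so that large lda*j products do not overflow int */
        const double *col = a + (size_t)j * (size_t)lda;
        const double xj = alpha * x[j];
        for (int64_t i = 0; i < m; i++)
            y[i] += xj * col[i];
    }
}

/* Returns 1 if every element of y equals expected exactly, else 0. */
static int all_equal(const double *y, int64_t m, double expected)
{
    for (int64_t i = 0; i < m; i++)
        if (y[i] != expected)
            return 0;
    return 1;
}

/* Builds the all-ones m-by-4 problem from the report with y starting at
 * zero and checks that every result element is 4.
 * Returns 1 on success, 0 on mismatch, -1 if allocation fails. */
static int check_all_ones(int64_t m)
{
    const int64_t n = 4;
    const size_t count = (size_t)m * (size_t)n;
    double x[4] = {1.0, 1.0, 1.0, 1.0};
    double *a = malloc(count * sizeof *a);
    double *y = calloc((size_t)m, sizeof *y);
    int ok;

    if (a == NULL || y == NULL) {
        free(a);
        free(y);
        return -1;
    }
    for (size_t k = 0; k < count; k++)
        a[k] = 1.0;
    ref_dgemv_n(m, n, 1.0, a, m, x, 1.0, y);
    ok = all_equal(y, m, 4.0);
    free(a);
    free(y);
    return ok;
}
```

Calling `check_all_ones(((int64_t)1 << 23) + 1)` reproduces the size used in the report (roughly 340 MB of memory). Comparing the same inputs against the library's dgemv should then show whether the library deviates from 4.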

@JeffBezanson

This is a really serious bug. Would love to see this fixed in the next release.

@wlbksy
Contributor

wlbksy commented Jan 31, 2014

It's Chinese Lunar New Year now, so I guess it won't be fixed very soon.

@jiahao

jiahao commented Jan 31, 2014

Happy New Year to you too (恭喜发财, "wishing you prosperity") :)

@martin-frbg
Collaborator

Whatever it is, it seems to have happened between 0.2.6 and 0.2.7: the former says "4" like netlib, while the latter (and all versions after it) say "20", at least on my Nehalem-based laptop. I could play again with my newly acquired "git bisect" knowledge, but I will probably not be able to fix what I find :-/

@martin-frbg
Collaborator

23965f1 is the first bad commit
commit 23965f1
Author: wangqian [email protected]
Date: Wed May 29 19:48:31 2013 +0800

Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86_64.

@xianyi
Collaborator

xianyi commented Feb 3, 2014

Sorry for the delay. I am on Chinese New Year holiday.

@wangqian, please fix this bug. :)

@jiahao, thank you, and best of luck in the Year of the Horse.

@wangqian
Contributor

wangqian commented Feb 4, 2014

Sorry for the delay. I have fixed this bug; please test it.

@andreasnoack
Contributor Author

@wangqian Just tried and now I get a segfault. The program above gives me

andreass-mbp:Desktop andreasnoackjensen$ ./a.out 

Program received signal SIGBUS: Access to an undefined portion of a memory object.

Backtrace for this error:
#0  0x11a899f72
#1  0x11a89a73e
#2  0x7fff8a4fb5a9
Bus error: 10

@martin-frbg
Collaborator

Hmm. The new version works for me (i7 Nehalem, openSUSE, OpenBLAS built with USE_THREAD=0, USE_OPENMP=1 for other reasons).

@andreasnoack
Contributor Author

@xianyi I can confirm your commit has fixed the problem on my Mac.

@andreasnoack
Contributor Author

Well, not completely. The original problem is still there for cgemv.

@JeffBezanson

bump

@xianyi
Collaborator

xianyi commented May 6, 2014

@JeffBezanson, I will debug this error this week.

@xianyi
Collaborator

xianyi commented May 14, 2014

@andreasnoackjensen
I cannot reproduce the cgemv bug. Could you provide your test code?

Thank you

@andreasnoack
Contributor Author

@xianyi I have just tried again and I cannot reproduce it either, so I guess we can close this one. Thanks.

@andreasnoack
Contributor Author

@xianyi This is still a problem, but not on all architectures. On my Intel Core 2 Duo the problem appears to be solved, but it persists on at least the i5-4670, i7-3770 and Xeon(R) E7-8850. As far as I can see, only double precision is affected.

@ViralBShah
Contributor

I see the issue on my Intel Core i5 MacBook Pro. I believe it is Haswell.

@wernsaar
Contributor

On 25.05.2014 10:09, Viral B. Shah wrote:

I see the issue on my Intel core i5 macbook pro. I believe it is Haswell.

Hi,

I cannot reproduce this error.
Please give me some information:

  • which source code do you use
  • the Makefile.rule
  • the command line to compile openblas
  • the source of your test program
  • the expected result of your test program
  • the command line to compile the test program

Best regards

Werner

@ViralBShah
Contributor

I am using version 0.2.8. The Makefile.rule is what ships.

I am doing these tests from Julia, so I do not have a standalone program. The flags are:

USE_THREAD=1 NUM_THREADS=8 NO_AFFINITY=1 DYNAMIC_ARCH=1 INTERFACE64=1 BINARY=64

@ViralBShah
Contributor

Please see the Julia issue above referenced by @andreasnoackjensen

@ViralBShah
Contributor

The test program provided above also provides the wrong answer on v0.2.8.

@andreasnoack
Contributor Author

@wernsaar Sorry for the noise. This has been solved. I thought I had recompiled with the develop branch, but I hadn't.

@ViralBShah
Contributor

Ok - that explains it. Thanks. I thought this was fixed for v0.2.8.
