OpenBLAS build fails due to segmentation fault on Power8 platform #2166
Comments
This issue is observed in the develop branch but appears to be fixed in the master branch. The segmentation fault is due to stack corruption in kernel/power/dtrmm_kernel_16x4_power8.S. Closing this issue, as it is fixed in the master branch.
master branch is actually stale (
Thanks for the information. The issue is not specific to endianness but to the application mode: when compiled in 64-bit mode, this bug will definitely corrupt the stack, even on little-endian systems. So I will reopen this.
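As a rough illustration of the kind of stack corruption discussed here, the following toy model shows how a single stray store can damage a neighbouring saved value. It is not the actual kernel code: the slot size, the offsets, the stored values, and all function names are hypothetical and chosen only for the demonstration.

```python
import struct

SLOT = 8  # bytes per saved 64-bit register (hypothetical layout)

def save_registers(frame, values):
    """Store each value big-endian into consecutive 8-byte slots."""
    for i, v in enumerate(values):
        struct.pack_into(">Q", frame, i * SLOT, v)

def load_register(frame, index):
    """Read back the 64-bit value held in slot `index`."""
    return struct.unpack_from(">Q", frame, index * SLOT)[0]

def stray_store32(frame, offset, value):
    """A leftover 4-byte store (hypothetical), written at a byte offset."""
    struct.pack_into(">I", frame, offset, value)

frame = bytearray(4 * SLOT)                  # toy save area with four slots
save_registers(frame, [0x11, 0x22, 0x33, 0x44])
stray_store32(frame, 1 * SLOT, 0xDEADBEEF)   # overwrites the high half of slot 1

for i in range(4):
    print(i, hex(load_register(frame, i)))
```

In this model only slot 1 changes: its low word survives while its high word is replaced, so the restored value is neither the original nor the stray one.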
Then I assume this must have been caused by (or at least be related to) my PR #1317 almost two years ago. I do wonder why it went unnoticed for so long, as I would think 64-bit is what is typically used on that platform? @quickwritereader
Thanks for the fix. So it was essentially a stray line, somehow duplicated from the code I was editing.
Fix DTRMMKERNEL register save for power8 64-bit mode (Fix for #2166)
As far as I know, @martin-frbg, your fixes worked and I did not have a problem on little endian. Still, fixes are welcome. For POWER9 I was almost rewriting the entry, but only for little endian.
The OpenBLAS build fails for the POWER8 target in 64-bit mode (on both Red Hat big endian and AIX). Details are below:
#make DEBUG=1 BINARY=64 TARGET=POWER8
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x3FFF95A692C3
#1 0x3FFF95A69E23
#2 0x3FFF95CB0477
#3 0x10078920 in dtrmm_ounucopy at trmm_uncopy_4.c:93
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat2 < ./cblat2.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat2 < ./zblat2.dat
/bin/sh: line 1: 54413 Segmentation fault (core dumped) OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat3 < ./dblat3.dat
make[1]: *** [level3] Error 139
make[1]: *** Waiting for unfinished jobs....
rm -f ?BLAT2.SUMM
OMP_NUM_THREADS=2 ./sblat2 < ./sblat2.dat
OMP_NUM_THREADS=2 ./dblat2 < ./dblat2.dat
OMP_NUM_THREADS=2 ./cblat2 < ./cblat2.dat
OMP_NUM_THREADS=2 ./zblat2 < ./zblat2.dat
make[1]: Leaving directory `/home/kavana/OpenBLAS/test'
make: *** [tests] Error 2
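For readers unfamiliar with the "Error 139" that make reports above: a shell returns 128 plus the signal number when a command is killed by a signal, and 139 - 128 = 11 is SIGSEGV on Linux. A small sketch of that decoding follows; the function name is made up for this example.

```python
import signal

def signal_from_exit_status(status):
    """Return the signal name for a shell-style 128+N exit status, else None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None

print(signal_from_exit_status(139))
```

This matches the "Segmentation fault (core dumped)" message printed for ./dblat3 just before make stops.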
[root@pokndd8 OpenBLAS]# cd test
[root@pokndd8 test]# gdb ./dblat3
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /home/kavana/OpenBLAS/test/dblat3...done.
(gdb) run < ./dblat3.dat
Starting program: /home/kavana/OpenBLAS/test/./dblat3 < ./dblat3.dat
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x3ffeb391f140 (LWP 54472)]
[New Thread 0x3ffeb311f140 (LWP 54473)]
.......
Program received signal SIGSEGV, Segmentation fault.
dtrmm_ounucopy (m=4294967297, n=12, a=0x3fffffff5ff0, lda=2, posX=0, posY=1, b=) at generic/trmm_uncopy_4.c:93
93 b[ 0] = data01;
(gdb) where
#0 dtrmm_ounucopy (m=4294967297, n=12, a=0x3fffffff5ff0, lda=2, posX=0, posY=1, b=) at generic/trmm_uncopy_4.c:93
#1 0x0000000010014f14 in dtrmm_RNUU (args=, range_m=, range_n=, sa=0x3ffeb3920000,
sb=0x3ffeb3cc0000, dummy=) at trmm_R.c:254
#2 0x000000001000fa64 in dtrmm_ (SIDE=, UPLO=, TRANS=, DIAG=,
M=, N=, alpha=, a=, ldA=0x3ffffffb3cd8, b=0x3ffffffe57e0, ldB=0x3ffffffb3cd4)
at trsm.c:381
#3 0x0000000010005bd8 in dchk3 (sname='DTRMM ', eps=2.2204460492503131e-16, thresh=16, nout=6, ntra=-1, trace=.FALSE., rewi=.FALSE.,
fatal=.FALSE., nidim=6, idim=..., nalf=3, alf=..., nmax=65, a=..., aa=..., as=..., b=..., bb=..., bs=..., ct=..., g=..., c=..., _sname=6)
at dblat3.f:1059
#4 0x000000001000e9a4 in dblat3 () at dblat3.f:292
#5 0x000000001000180c in main (argc=, argv=) at dblat3.f:355
#6 0x00003fffb7a06bec in generic_start_main (main=@0x100a0190: 0x100017e0
auxvec=0x3ffffffff5e8, init=, rtld_fini=, stack_end=, fini=)
at ../csu/libc-start.c:274
#7 0x00003fffb7a06e14 in __libc_start_main (argc=, ubp_av=, ubp_ev=, auxvec=,
rtld_fini=, stinfo=, stack_on_entry=) at ../sysdeps/unix/sysv/linux/powerpc/libc-start.c:91
#8 0x0000000000000000 in ?? ()
(gdb)
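A note on the backtrace above: the value m=4294967297 in frame #0 is far larger than anything the test should pass (frame #3 shows nmax=65). Decoding it shows that it is 2^32 + 1, i.e. a 64-bit value whose high and low 32-bit words are both 1. The sketch below only does that arithmetic; the helper name is made up, and it makes no claim about which store produced the value.

```python
def split_words(value):
    """Split a 64-bit value into its (high, low) 32-bit words."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF

m = 4294967297  # value of `m` shown in frame #0 of the backtrace
high, low = split_words(m)
print(hex(m), high, low)
```

A dimension argument with a non-zero high word like this is consistent with the stack corruption described in the comments, since the copy routine then indexes far beyond the buffer.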