-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DTRMM doesn't give correct results #951
Comments
Which version comes with Ubuntu 16.04 then - 0.2.18 from what I found on the 'net ? As far as I can see, the only non-trivial change in trmm_R.c happened just before the 0.2.18 release but should have affected "only" loop unrolling for the GEMM function(s). (Downgrading to 0.2.17 might tell) |
The version I built by myself used the latest version from git. If I build with the standard makefile and TARGET=SANDYBRIDGE everything works, because apparently the reference implementation is used? At least that is what my gdb trace looked like. The CMake build has never worked since it was introduced. I did a git bisect to confirm. So I think in the latest Ubuntu, they also used CMake to build OpenBLAS, hence the error. |
Not sure if that means that the cmake setup is to blame, or does building with plain "make" without specifying a target (which I guess would end up with picking HASWELL) get you a working version as well ? |
Just using the makefile and no arguments also works (and gives the correct results). I used SANDYBRIDGE earlier because while debugging I found that earlier versions require the target option. So it's not that. |
That would (probably) mean that you get a different config.h and Makefile.conf when you use cmake ? (If so, the implications might go beyond just the Skylake architecture) |
Could be something like the changes to the default x86_64 KERNEL file choices for gemv functions from acdff55 would need propagating to cmake/kernel.cmake (but I am not sure I even understand |
Can you count residuals with all zeroes and all ones (i.e 1+1 cuts sight better than rand()+rand() ) |
I'm quite busy for at least 10 more days. After that I will further investigate this bug. |
It is hard to guess what is behind 0x7fff ... pointers (you can eXamine the address to get value) |
It's cmake, so I don't think these files are generated since it does out-of-source builds.
Edit: Sorry this last part was with threading enabled. However, with 1 thread the results are the same, except that nthreads=1 and the arguments argument is named args instead of arg. |
Note that the tests also fail for me:
|
Can you produce a call tree of OpenBLAS functions up to (or used by) the failing function ? If the problem occurs only with cmake builds and cmake does not use any of the config.h mechanism of the pure "make" build system, I can only guess that some difference in default function selection through the defines in kernel.cmake must be the culprit. (Though all the obvious candidates from the relatively recent changes to x86_64/KERNEL appear to be overriden by optimized functions in KERNEL.HASWELL). |
valgrind doesn't give anything. Checking what functions are called gives something interesting. The working version calls this:
and the non-working version calls this
Note that they are different. For instance when the working version calls itcopy8, the version that doesn't work calls otcopy8. |
Never mind that last comment. That difference was because I wasn't using haswell in the working version. Now they are both the same. I will look a bit further... |
So the actual difference between the working and non-working call is this: Extra in working:
Extra in non-working:
At least that is what I got when doing a diff between two calls with gdb and |
Not sure what to make of the supposedly extra calls to get_num_threads etc - perhaps you want to check with ldd that both binaries link to the same system libraries (to exclude that one was built with OPENMP support and the other without or similar). |
They link to the same libraries. Maybe it actually builds the haswell implementations (because they are there in my cmake build directory) but does not actually use them? What I did for the diff was first sort by file name, then by function name, and then take out all the addresses, breakpoint numbers and line numbers. So the lines in dtrmm_kernel_4x8_haswell.c are simply not there in the non-working case. It could also be that the cmake build and standard build forget to attach debug symbols for some files when built with debugging enabled. |
I do not see dtrmm_kernel_4x8_haswell.c (or DTRMM as such) defined anywhere except in KERNEL.HASWELL. What i did hit upon was a USE_TRMM variable in kernel/CMakeLists.txt |
Oh right, fixed it. Thanks a lot! Anyhow, doesn't this still mean it's broken with the standard implementation? Or was it just broken because it expected this implementation and not the standard one? |
That's one for the boss to answer - I am just a self-appointed janitor around here :-) |
Since it doesn't work with the precompiled version in Ubuntu 16.04 as well, and that version does not use cmake, there's still another bug out there. I'll try to fix it tomorrow. It should be easier to find now that I know that I probably need a different build target to be able to bisect it. |
What's the precompiled version in Ubuntu 16 based on then ? If it works in your copy of git develop, chances would seem to be that it is something that got fixed since whatever release they picked ? |
@martin-frbg Thank you for the fix. I am not CMake expert. Thus, the codes are not elegant. |
Some debugging so far: In the debian changelog, it say that
Which happened somewhere after 14.04 (Ubuntu version where it worked). This compilation failure is caused by a85c278 as far as I can tell. The reason is that the TARGET is GENERIC, but it also tries to build other targets which don't have a trmm kernel. Anyhow, trying to reproduce whatever they did when building the Ubuntu version did not give me any wrong answers yet. Maybe I need to build it on a different system... |
Not sure what you are comparing here - wouldn't the OpenBLAS libraries in Ubuntu 14&16 be based on entirely different releases (or even snapshots of the develop branch) ? Or are you comparing the |
They are different releases but they are built in a different way as well. For me if I build things in the standard way everything works, so I'm trying to find out what the ubuntu maintainer does different from me when building the packages. For the 14.04 version, this was apparently at least that he was building with TARGET=GENERIC. But I'm trying to find more differences. Anyhow, from this compilation failure, I found that in Makefile.L3, TARGET is never changed during a DYNAMIC_ARCH build even though it says TARGET = $(TARGET_CORE) at the top of the file. So now I'm wondering if that might be a problem. For now I just changed it to CORE since that always contains the right thing. I'm also trying to build without USE_TRMM to see if that causes trouble. Edit: Bingo! Not using USE_TRMM gives me this:
|
Hmm. I am fairly certain that I saw the DYNAMIC_ARCH build looping through ...PRESCOTT, ...HASWELL etc. variants in the Ubuntu 16.04 build log on their server. Possibly the TARGET as such is |
DYNAMIC_ARCH loops just fine, but in the makefile itself where checks are performed I found that CORE LIBCORE and TARGET_CORE were set correctly (by Makefile_kernel.conf I believe) but TARGET was not. Maybe I did something wrong there. But since CORE is used in Makefile.L3 as well, maybe it's safer to use CORE (like in some other if statements). Yesterday, we found that USE_TRMM was not set correctly for HASWELL. By changing the cmake file I made sure the TRMM kernel was used. So the problem seems to occur whenever this is not used. I now reproduced that with a normal makefile by unsetting it. In ubuntu 14.04 (v0.2.8), the USE_TRMM related if statements are not there, so at that point apparently it worked. So I'm bisecting at the moment to find where it went wrong (while removing all USE_TRMM=1 statements from the makefiles). I expect it to be done in a few hours. Edit: Oh yeah, and apparently there's also something wrong with the identification of Haswell by the package that is provided by Ubuntu. Or else I wouldn't have encountered this problem. Maybe that's another bug? There's really too much going on here... |
So it looks as if USE_TRMM=1 is not set correctly during the Haswell part of a DYNAMIC_ARCH build ? |
The first bad commit is 9bd962f but only in the case that USE_TRMM is undefined. So then I guess it uses the kernel/x86_64/dgemm_kernel_4x8_haswell.S file that was added there? Does that mean that there is a bug in that file? Or is this just wrong in general? For the Ubuntu build: The logs contain this line
So it seems to build the right file. |
As I see it, USE_TRMM must be set for Haswell so that the correct function is compiled. The bug would have to be either that USE_TRMM is not set automatically in that part of the DYNAMIC_ARCH build process, or (less likely I guess) that the library still manages to call a/the wrong implementation at runtime. If you have the log file from a TARGET=HASWELL build still available, you could check if |
can you please post an example reproducing the issue so we can stop guessing what the issue might be. That there is a working code path with USE_TRMM or haswell is irrelevant, all code path must work. |
Since Haswell has its own dtrmm implementation, would it be an idea to just remove all #if defined(TRMMKERNEL) stuff from the dgemm kernel? This seems to be the only thing that is bugged. After this we still have to make sure that the right code path is used in Ubuntu, but I think that's a different issue. |
Most likely not - even if the file is called dgemm_something_haswell it may be used for sufficiently similar architectures as well. What appears to be broken is the behaviour on Haswell with USE_TRMM=false - which could be a bug or a feature depending on the intentions of the authors, |
Yes, exactly. |
And it seems the Ubuntu maintainer for the OpenBLAS package has joined this thread. :-) |
1st as a workaround: 2nd nm -D libopenblas.so should yield same symbol list and their sizes should be identical (but list is not necessarily in same order) |
@brada4 the cmake problem is fixed already (though the fix was applied on the master branch, not develop, as that appears to be what Sbte's PR was based on). |
I can't reproduce it on a haswell, nor any of the other x86 core types openblas supports (using the ubuntu package and master). So its maybe skylake dependent, though that does not really make sense as skylake just uses haswell code... |
@juliantaylor Huh, it also seems to work on my system as well now. Must have been something else that caused this then. But this bug certainly existed before or else I would have never encountered it. Note that I never install anything globally on my system manually, so something in the Ubuntu repositories caused this to happen, but apparently not the openblas package. This just keeps getting weirder... |
Actually that is not possible because the wrong implementation was never built (as far as I can tell from the build logs and the Makefile logic). So there is no way I could have been using it. Maybe I ran into another bug then that looks very similar to this one, but then found this one while trying to debug it (because I started building my own openblas using cmake). That's the only thing I can think of. |
Weird. How about closing this issue (as you found and fixed the bug in CMakeLists.txt that this one was originally about) and opening a new one if and when the DYNAMIC_ARCH gremlin reappears ? |
Well, the implementation in kernel/x86_64/dgemm_kernel_4x8_haswell.S is still broken when compiled with -DTRMMKERNEL as far as I can tell. So this bug still exists in that file. I can open a new issue about that so at least it is documented that it is broken? I guess there's no high priority in fixing it since it is currently not used. |
But is there any combination of build options that actually leads to this getting built with -DTRMMKERNEL (and with wrong results, assuming this might be used by some non-Haswell KERNEL file as well) ? The whole file including the "ifdef TRMMKERNEL" branches appears to have come from a single commit by wernsaar in mid-2015, maybe it is a leftover from experiments that he did not think useful enough to upload, or in contrary a fallback path that made sense back then only. |
You can break about any kernel by pre-defining internal defines like TRMMKERNEL or CFLAGS. |
What I meant to say is that Makefile.L3 is able to do that if USE_TRMM is undefined. But for me this can be closed, if there is no need to fix the dgemm kernel. Probably if someone ever wants to actually use the TRMM part it will be well tested anyway. |
Only you as the submitter, or xianyi as the project owner can close this anyway. |
DTRMM doesn't give correct results for me. I built with CMake on intel skylake, and managed to chase the calls down to
I tested it on a random 10x10 matrix. I called it like this from a C program
The residual when comparing to the correct result looks like.
Edit: Since the thread is quite long, I put a short summary here.
This seems to be caused by the kernel/x86_64/dgemm_kernel_4x8_haswell.S file that was committed in 9bd962f . This file is only used for the TRMM implementation when on Haswell USE_TRMM is not set (see Makefile.L3). In that same commit, USE_TRMM was set to 1 on Haswell in Makefile.L3, but the same was not done for the cmake build. This was fixed in bcfc298 . Now what is left is that the TRMM implementation in kernel/x86_64/dgemm_kernel_4x8_haswell.S is still broken. This implementation is currently not used if everything is built correctly.
The text was updated successfully, but these errors were encountered: