-
-
Notifications
You must be signed in to change notification settings - Fork 567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zn_poly-0.9.p9 fails at least one its tests on power7 #14098
Comments
comment:1
Note this is the quick test always run with zn_poly. It passes in 5.7beta3 without debug and it fails in beta4 with SAGE_DEBUG=yes. |
comment:2
the Paul |
comment:3
Hi Paul, I suspect that the problem is triggered when enabling the debugging code, furthermore zn_poly itself is built with -DNDEBUG regardless of SAGE_DEBUG=yes. I am wondering if it could cause the problem. Francois |
comment:4
Very odd. The main code is always compiled with -DNDEBUG - no option to turn it of. But the code for the test which fails is all compiled with -DDEBUG - no turning it off either. So it must happening when SAGE_DEBUG is turned on for some other component of sage. Since no one else seem to have seen it before it has to be a power7 specific problem. |
comment:5
To continue on what you started Paul in
n is supposed to be a size_t so I think we have a gross overflow somewhere earlier. The value originates from here:
|
comment:6
On power7 it appears that ZNP_mpn_smp_kara_thresh is equal to SIZE_MAX which according to /usr/include/stdint.h is
random_ulong is defined by
so n needs to be size_t which is at most SIZE_MAX but the test generate a random number between 0 and 3 * SIZE_MAX + 2. I guess it is potentially fine if ZNP_mpn_smp_kara_thresh is not SIZE_MAX, I don't know how it is on other systems. |
comment:7
Francois, can you see how Paul |
comment:8
I am certainly poking at that. The value of ZNP_mpn_smp_kara_thresh is computed by the tuning code and it is clearly allowed to be equal to SIZE_MAX
So someone potentially set themselves for trouble in the test. However after inserting a few printf in the code the mystery deepens
I didn't have the assertion before and after putting these we Abort rather than segfault. |
comment:9
I guess there is a bug in the tuning code, which should not give for Paul |
comment:10
I am the author.... thanks Paul for drawing my attention to this. I haven't looked at this code for years so it's almost as mysterious to me as to everyone else here! My guess is that the bug is in the test code rather than in the tuning code. I suspect that the threshold is allowed to be SIZE_MAX, but that the line
should be replaced by e.g.
It could also be a bug in the tuning code, but that would be much harder to fix. If I remember correctly what this threshold means, it is very surprising to me that its optimal value is SIZE_MAX on any real system. |
comment:11
Thanks for the code. My last error was due to me trying to do something similar and failing to read the original code properly (putting the +2 inside the bracket). |
comment:12
Not sure what happened I wanted to do another run to post tuning.c but the value of ZNP_mpn_smp_kara_thresh is now 133. I swear it was SIZE_MAX before. There is still plenty of SIZE_MAX value in the file:
|
comment:13
Francois, anyway it does not hurt to implement what David suggests in comment [comment:10]. Paul |
comment:14
I can say it worked nicely, so I'll prepare a new spkg with it so this kind of thing cannot happen again. I think I found out what happened and made thing different. In the original build I used gcc-4.7.1, this build the compiler was gcc shipped with the distro gcc-4.3.4. There could be some subtle bugs lurking in gcc itself or the standard used to compile the tuning code.
|
Attachment: mpn_mulmid-test.c.patch.gz patch added to zn_poly for review purposes |
comment:15
OK new spkg ready for review. I also attached the patch for review but it is just David's code. |
This comment has been minimized.
This comment has been minimized.
Author: Francois Bissey, David Harvey |
comment:16
the patch looks fine to me, however since I have no access to a power7 I can only check the patch and new package on another computer. Jeroen, how should we proceed in that case, assuming Francois (the author of the patch and new package) is the only person to have access to a power7? Paul |
comment:17
I don't mind giving positive_review in this case. We can reasonably expect that the author has tested the package on the failing machine. |
Reviewer: Paul Zimmermann |
comment:18
however I'd like to check first the new spkg works on my machine. Paul |
comment:19
This doesn't even build:
|
comment:21
Sorry made a big mistake when preparing the final spkg (source not pristine clean, that's rather unforgiving). It should be ok now (I double checked). |
comment:22
all tests now pass on my computer (on top of Sage 5.6). Paul |
Changed reviewer from Paul Zimmermann to Paul Zimmermann, Jeroen Demeyer |
Merged: sage-5.8.beta1 |
comment:24
zn_poly's tuning (and apparently due to that its test suite, too) is flaky on other systems as well: #13947 It would be nice if some of you could also take a look at that... :P |
comment:25
Yes it looks similar. Turning debugging on was somewhat helpful here. |
Changed author from Francois Bissey, David Harvey to François Bissey, David Harvey |
On the login node of our power7 cluster (beatrice) zn_poly fails make check
Here is a detailed backtrace
It seems to point the finger at mpir.
New spkg:
CC: @jdemeyer
Component: porting
Author: François Bissey, David Harvey
Reviewer: Paul Zimmermann, Jeroen Demeyer
Merged: sage-5.8.beta1
Issue created by migration from https://trac.sagemath.org/ticket/14098
The text was updated successfully, but these errors were encountered: