zn_poly segfaults during tuning and tests on OS X and Cygwin when built on a busy system #13947
Comments
comment:1
Does it really segfault, and especially does the tuning segfault? I thought zn_poly would just occasionally generate "unexpected"™ values during tuning on MacOS X and Cygwin (presumably only under heavy system load), such that afterwards some tests (with zn_poly rebuilt, or more precisely, relinked with these parameters) would deterministically fail. (This might still depend on the compiler as well, at least the way it fails.) |
comment:2
P.S.: I was actually going to create a zn_poly spkg which simply saves the tuning parameters in case the tests fail, asking the user to submit them to sage-devel or sage-release... (and probably making a few more attempts to get working tuning parameters, and/or informing the user that he/she should reinstall the spkg when the system load is lower). :-) |
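The "few more attempts" idea from the comment above could look roughly like the following. This is only a minimal sketch under stated assumptions: `tune_fn`, `selftest_fn`, `tune_with_retries`, and the two `demo_*` stand-ins are hypothetical names invented for illustration, not part of zn_poly or of the spkg's install script.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks; neither is part of zn_poly's API. */
typedef void (*tune_fn) (void *params);            /* (re)computes tuning parameters */
typedef int (*selftest_fn) (const void *params);   /* nonzero if the tests pass */

/*
   Re-runs tuning until the self-test accepts the parameters or the
   attempt budget is used up.  Returns the 1-based number of the attempt
   that succeeded, or 0 if every attempt failed (the caller could then
   save the last parameters for a bug report).
*/
int
tune_with_retries (tune_fn tune, selftest_fn selftest, void *params,
                   int max_attempts)
{
   assert (tune != NULL && selftest != NULL);

   for (int attempt = 1; attempt <= max_attempts; attempt++)
   {
      tune (params);
      if (selftest (params))
         return attempt;
   }
   return 0;
}

/* Demo stand-ins: "tuning" bumps a counter, the "test" passes from 3 on. */
void
demo_tune (void *params)
{
   ++*(int *) params;
}

int
demo_selftest (const void *params)
{
   return *(const int *) params >= 3;
}
```

With the demo stand-ins, a budget of five attempts succeeds on the third one, while a budget of two returns 0; that is the case in which the parameters would be saved and the user asked to report them.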
comment:3
P.P.S.: John, is it always (just) |
comment:4
I also reproduced it on bsd.math. |
comment:5
Also reproduced on hawk (OpenSolaris i386). |
comment:6
Replying to @jdemeyer:
How? I tried hard yesterday, but didn't manage. Even with John's tuning parameters that made the test(s) fail for him, still all tests (quick as well as extensive) pass for me on bsd.math (with Sage 5.6.beta3 [with GCC 4.6.3 built], and the included MPIR 2.4.0, FWIW). I was actually hoping we could reproduce the test failures on e.g. Linux as well with such "failing" parameters, although probably depending on the GCC version, too. |
comment:7
Replying to @nexttime:
Hmmm, I probably forgot to run But still, I cannot reproduce the failure with John's parameters. |
comment:8
Replying to @nexttime:
Ooops, not true. In the last attempt, I missed that
Presumably because those tests take a pretty long time... ;-) |
comment:9
The (extensive) tests that didn't get run because
(This is with John's "failing" tuning parameters, bsd.math.) |
comment:10
Replying to @nexttime:
That's my recollection. In my experiment yesterday, I only ran that test (using |
comment:11
Replying to @jhpalmieri:
With your tuning parameters, I also only get the extensive test of |
comment:12
IIRC (and I'm quite sure I am) I got a segfault as well during the tuning itself on Cygwin (64-bit Windows 7), mostly when issuing make with MAKE="make -j4", so the system must have been busy as well, but IIRC (less sure) it also happened when building zn_poly alone. The segfaults happened while tuning KS/FFT things, mostly the last one, which is mulmid, but I seem to remember it also sometimes happened during the previous KS/FFT things. Of course I tried to reproduce that this morning and could not (I let ATLAS build in parallel to keep the system busy, but that did not seem to do the trick). I'll give it another shot in the next couple of days. |
comment:13
Replying to @nexttime:
Could you give it a shot by only testing the MPIR part and disabling the comparison in the test code? |
comment:14
Replying to @jpflori:
??? It's just the comparison that fails (or, more precisely, the tests make the "success" depend on the comparison only); no segfaults, no failed assertions. I removed the "exit on first failure" and got 5 failures (from the "extensive"
Well, since the failure depends on zn_poly's thresholds (for zn_poly's functions), it's IMHO clearly in zn_poly, not MPIR. (Unless zn_poly was right only with the "failing" tuning parameters, and incidentally MPIR [2.4.0 and 2.6.0] and zn_poly would give the same wrong results otherwise. Or am I missing something?) There are still random numbers involved though, so the tests may pass or fail under different circumstances. |
comment:15
The offending parameter in John's (There are others, but those aren't relevant for the tests, apparently.) |
comment:16
Replying to @nexttime:
When I set all More interestingly, the failures only happen when squaring. If I use separate "buffers" for both operands (in |
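One way to make the squaring observation checkable is to run the same product once with aliased operands and once with a private copy of the input, then compare the results. The sketch below does this with a toy schoolbook multiplication; `toy_mul` and `square_agrees` are hypothetical helpers written for illustration and are not zn_poly functions, and the toy code assumes a modulus below 2^32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
   Toy schoolbook product of two length-n polynomials modulo m.
   A stand-in for illustration only, not a zn_poly routine.
   res needs room for 2*n - 1 coefficients and must not overlap the inputs.
*/
void
toy_mul (uint64_t *res, const uint64_t *op1, const uint64_t *op2,
         size_t n, uint64_t m)
{
   /* m <= UINT32_MAX keeps every intermediate product below 2^64 */
   assert (n > 0 && m > 0 && m <= UINT32_MAX);

   memset (res, 0, (2 * n - 1) * sizeof *res);
   for (size_t i = 0; i < n; i++)
      for (size_t j = 0; j < n; j++)
         res[i + j] = (res[i + j] + (op1[i] % m) * (op2[j] % m) % m) % m;
}

/*
   Squares op twice: once with both operands pointing at the same buffer,
   once with the second operand in a separate copy.  Returns 1 if the two
   results agree, 0 otherwise.
*/
int
square_agrees (const uint64_t *op, size_t n, uint64_t m)
{
   enum { MAX_N = 16 };
   uint64_t copy[MAX_N];
   uint64_t aliased[2 * MAX_N - 1], separate[2 * MAX_N - 1];

   assert (n > 0 && n <= MAX_N);
   memcpy (copy, op, n * sizeof *op);

   toy_mul (aliased, op, op, n, m);      /* squaring: op1 == op2 */
   toy_mul (separate, op, copy, n, m);   /* same values, distinct buffers */

   return memcmp (aliased, separate, (2 * n - 1) * sizeof *aliased) == 0;
}
```

For the toy routine the two results always agree. A disagreement in a real implementation would point at the code path taken for aliased operands rather than at the arithmetic itself.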
comment:17
Replying to @nexttime:
I also get the quick test to fail with all (2...64 bits) And I meanwhile managed to get "invalid" tuning parameters on Linux x86_64, too (although just once, but unintentionally). I don't think the bug (or test failure) is in any way related to the compiler / GCC version or compilation options, as I've so far been able to force it with every GCC version I tried (4.4.3, 4.6.3, 4.7.0, 4.7.2), regardless of whether I used e.g. Still don't know whether (just) [As mentioned, all failures vanish when |
comment:19
Ok, leif, can you put your recipe to trigger the failure in the summary? |
comment:20
Replying to @kiwifb:
Oh, I don't recall right now (searching logs ...), but I think I just faked the values in After running Then you can play with it, i.e., modify the code (or tuning values), and run (More to come if I find the logs, otherwise also see the comments above for more info.) |
comment:21
Replying to @nexttime:
Hmmm, sorry, cannot find any. I vaguely remember I had a power outage before I saved anything... 8-/ |
comment:22
Replying to @nexttime:
Yep:
--- zn_poly-0.9.p5/src/test/test.c.orig 2008-09-19 17:37:47.000000000 +0200
+++ zn_poly-0.9.p5/src/test/test.c 2013-01-13 20:33:11.919633442 +0100
@@ -209,6 +209,11 @@
int all_success = 1, any_targets = 0, quick = 0, success, i, j;
+#if 1 || defined(FAKE_THRESHOLDS)
+ for(i=2;i<=64;i++)
+ tuning_info[i].mul_fft_thresh=1; // always (I think)
+#endif
+
for (j = 1; j < argc; j++)
{
if (!strcmp (argv[j], "-quick"))
I've also found
--- zn_poly-0.9.p5/src/test/nuss-test.c.orig 2008-09-19 17:37:47.000000000 +0200
+++ zn_poly-0.9.p5/src/test/nuss-test.c 2013-01-13 20:25:40.629633300 +0100
@@ -59,6 +59,16 @@
ref_zn_array_scalar_mul (res, res, n, x, mod);
int success = !zn_array_cmp (ref, res, n);
+#if 1 || defined(TEST_VERBOSE)
+ if(!success)
+ {
+ fprintf(stderr,
+ "testcase_nuss_mul(): comparison FAILED: lgL=%u (n=%lu) sqr=%d mod.m=%lu mod.bits=%d\n",
+ lgL, n, sqr,
+ mod->m, mod->bits);
+ }
+#endif
+
pmfvec_clear (vec2);
pmfvec_clear (vec1);
@@ -67,7 +77,7 @@
if (!sqr)
free (buf2);
free (buf1);
-
+
return success;
}
@@ -84,6 +94,7 @@
zn_mod_t mod;
for (i = 0; i < num_test_bitsizes; i++)
+#if 0
for (lgL = 2; lgL <= (quick ? 11 : 13) && success; lgL++)
for (trial = 0; trial < (quick ? 1 : 5) && success; trial++)
{
@@ -92,6 +103,16 @@
success = success && testcase_nuss_mul (lgL, 1, mod);
zn_mod_clear (mod);
}
+#else /* don't stop upon first failure: */
+ for (lgL = 2; lgL <= (quick ? 11 : 13) /* && success */; lgL++)
+ for (trial = 0; trial < (quick ? 1 : 5) /* && success */; trial++)
+ {
+ zn_mod_init (mod, random_modulus (test_bitsizes[i], 1));
+ success &= testcase_nuss_mul (lgL, 0, mod);
+ success &= testcase_nuss_mul (lgL, 1, mod);
+ zn_mod_clear (mod);
+ }
+#endif
return success;
}
to not stop at the first test failure in |
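The second patch hunk above swaps `success = success && ...` for `success &= ...` and drops `&& success` from the loop conditions. The difference is that `&&` short-circuits while `&=` always evaluates its right-hand side. A minimal sketch of just that difference, with hypothetical names (`fake_testcase`, `run_stop_on_failure`, `run_all`, `calls_made`) that do not appear in zn_poly:

```c
#include <assert.h>

int calls_made;   /* how many test cases actually ran in the last run */

/* Hypothetical test case: records that it ran and reports the given result. */
int
fake_testcase (int result)
{
   assert (result == 0 || result == 1);
   calls_made++;
   return result;
}

/* Original style: the loop condition and && stop at the first failure. */
int
run_stop_on_failure (const int *results, int n)
{
   int success = 1;
   calls_made = 0;
   for (int i = 0; i < n && success; i++)
      success = success && fake_testcase (results[i]);
   return success;
}

/* Patched style: &= has no short-circuit, so every test case runs. */
int
run_all (const int *results, int n)
{
   int success = 1;
   calls_made = 0;
   for (int i = 0; i < n; i++)
      success &= fake_testcase (results[i]);
   return success;
}
```

Note that the bitwise form is only safe when each test returns exactly 0 or 1, as a result computed with `!` (like `!zn_array_cmp (ref, res, n)` in the patch) does; a test returning, say, 2 for success would be wrongly turned into a failure by `1 & 2`.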
attached by request (my dual quad-core 10.7.5 mac is having this problem) |
comment:23
Attachment: tuning.c.gz Replying to @jpflori:
Just as a data point, I can confirm this, even with |
comment:24
Replying to @kcrisman:
Confirm what exactly? Tuning fails if the box is too busy? And if so, how? Building itself (before and/or after tuning) can also fail? Or does just the quick test after "successfully" building zn_poly fail (due to failing comparisons, as intended, or with a segfault or whatever)? |
comment:25
Correct; with one other spkg being built it was too much. Segfault during tuning in KS/FFT mul, repeatable. No problems during short self-test, though of course I couldn't try that without using just one thread in any case.
I guess not. |
comment:40
(I could leave it as is and just update |
comment:41
I would just include the patch and see if people still report problems. |
Diff between the |
comment:42
Attachment: zn_poly-0.9.p10-p11.diff.gz |
comment:43
Replying to @jdemeyer:
Ok, did so, see attached diff. The |
comment:44
Somebody should take a look at tuning on MacOS X and Cygwin though, as the failures were apparently triggered by "random" tuning parameters... (which the patch obviously doesn't affect). |
comment:45
Agreed. I may be able to do this on OS X today, but not Cygwin until later. |
comment:46
I can't reproduce it on my Mac box, but I don't think I ever did. Maybe John can try it on bsd again... |
comment:47
I agree that tuning on heavily loaded OS X systems has not been addressed. I also don't know about any segfaults. Anyway, I tried building the old and new spkgs on a loaded OS X system. I could not reproduce any failures in the quick test suite, but the old spkg reliably failed its full test-suite, while the new spkg reliably passed its full test suite. klee, can you test it out, too? |
comment:48
Interim report:
I will report the final result as soon as the machine finishes building! By the way, thank you all so much. |
comment:49
Report on making Sage 5.10.beta2: Built successfully. Tested successfully except one failure, which seems unrelated to the current issue.
|
comment:50
Yes, this is a very occasional OS X error that I haven't been able to track down, and that has nothing to do with this ticket. |
comment:51
Report on making Sage 5.10.beta4: Well... Building failed with "Error installing package sage-5.10.beta4". So I tried "./sage -i spkg/standard/zn_poly-0.9.p11.spkg", and it was installed successfully. So my overall impression is that the patch corrects the issue, and the issue seems unrelated to how heavily loaded my machine is (Mac Pro, quad-core Intel Xeon, Mac OS X 10.7.5). |
comment:52
But what was the failure in installing beta4? If it was still To really test this, assuming the failure was |
comment:53
Replying to @kcrisman:
package On the other hand, (re)installing zn_poly afterwards (with just (But you said you copied the |
comment:54
... where "you" addresses Kwankyu, in case that wasn't clear. |
comment:55
Replying to @nexttime:
Yes, I copied Sorry that I don't remember the reason for the failure of beta4. The message was somewhat unclear to me, but seemed unrelated to zn_poly. Now I am building beta4 to reproduce the failure. I used "sage -i" rather than "sage -f", and I remember that the installation of the spkg started as if it had not been done before. On this point, I am not so confident of my own memory though. Anyway, the installation was successful. |
comment:56
Rebuilding beta4 now succeeded, but when I started the just-built Sage, I got
Still "./sage -f spkg/standard/zn_poly-0.9.p11.spkg" succeeds. |
comment:57
Replying to @kwankyu:
This is both unrelated to zn_poly and hardly related to Sage 5.10.beta4. Outdated |
comment:58
Replying to @nexttime:
P.S.: The relevant "layout" change was announced (or suggested) on sage-devel a while ago. |
comment:59
At least this spkg fixes some bug, so it's good to have. |
Merged: sage-5.10.beta5 |
Reviewer: Jeroen Demeyer |
comment:60
True! But did you open a new ticket for the original bug, which is probably not resolved by this? (JP, I assume that on a loaded Cygwin system we still get the original issue.) |
comment:61
I successfully installed Sage-5.10.rc0 without the zn_poly failure issue. (The error after starting Sage, as reported in a previous comment, was just because of my own outdated scripts and is irrelevant to this ticket. Sorry for the noise.) Thanks a lot! |
See #13137 for more info.
This is true with different versions of MPIR, so the cause seems to be zn_poly and not MPIR.
No problems were spotted on Linuces.
New spkg: http://boxen.math.washington.edu/home/leif/Sage/spkgs/zn_poly-0.9.p11.spkg
md5sum:
012e63d181151c19ddc71bdfaeb14e03 zn_poly-0.9.p11.spkg
zn_poly-0.9.p11 (Leif Leonhardy, May 24th, 2013)
nuss_mul() test failing especially if tuning happened under "heavy" load (at least on MacOS X and Cygwin)
Add fix_fudge_factor_in_nuss-test.c.patch; fix suggested by David Harvey.
CC: @nexttime @jhpalmieri @jdemeyer @kcrisman @kwankyu
Component: packages: standard
Keywords: zn_poly spkg cygwin osx nuss_mul fail
Author: Leif Leonhardy
Reviewer: Jeroen Demeyer
Merged: sage-5.10.beta5
Issue created by migration from https://trac.sagemath.org/ticket/13947