Make stats doctests ready for random seeds #29972
Comments
comment:1
At least the following need fixing:
comment:3
Setting new milestone based on a cursory review of ticket status, priority, and last modification date.
Branch: public/29972
Commit:
New commits:
Changed dependencies from #29962 to none
Author: Jonathan Kliem |
comment:5
I have tried 6 different seeds and got one failure:
Branch pushed to git repo; I updated commit sha1. New commits:
comment:7
OK, I made it a bit more flexible. These doctests are really difficult to fix. As you noticed, it is far from stable.
comment:8
You replaced:

```diff
- sage: x=0; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
- (13355, 13298)
- sage: x=4; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
- (5479, 5467)
- sage: x=-10; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
- (53, 51)
+ sage: x=0; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ 13298
+ sage: l.count(x) # rel tol 5e-2
+ 13298
+ sage: x=4; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ 5467
+ sage: l.count(x) # rel tol 5e-2
+ 5467
+ sage: x=-10; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ 51
+ sage: l.count(x) # rel tol 5e-1
+ 51
```

I would further rewrite that as:

```diff
+ sage: expected = lambda x: ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ sage: observed = lambda x: l.count(x)
+ sage: expected(0)
+ 13298
+ sage: observed(0) # rel tol 5e-2
+ 13298
+ sage: expected(4)
+ 5467
+ sage: observed(4) # rel tol 5e-2
+ 5467
+ sage: expected(-10)
+ 51
+ sage: observed(-10) # rel tol 5e-1
+ 51
```

You replaced:

```diff
- sage: x=0; y=1; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time
- (1.0, 1.00...)
- sage: x=0; y=-100; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time
- (1.32..., 1.36...)
+ sage: x=0; y=1; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time # abs tol 2e-1
+ (1.0, 1.0)
+ sage: x=0; y=-100; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time # abs tol 2e-1
+ (1.36, 1.36)
```

I would further rewrite that as:

```diff
+ sage: expected = lambda x, y: (
+ ....:     exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n())
+ sage: observed = lambda x, y: float(l.count(x))/l.count(y)
+ sage: expected(0, 1), observed(0, 1) # long time # abs tol 2e-1
+ (1.0, 1.0)
+ sage: expected(0, -100), observed(0, -100) # long time # abs tol 2e-1
+ (1.36, 1.36)
```

Is there an explanation somewhere of when to do that?
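As background (an editorial note summarizing the Sage developer guide, not part of the original comment): the doctest framework accepts `# rel tol e` and `# abs tol e` annotations, under which each floating-point number in the actual output must match the corresponding expected number up to relative error `e`, respectively absolute error `e`. A small illustration with made-up values:

```
sage: 355/113.0  # rel tol 1e-5
3.14159265358979
sage: 22/7.0     # abs tol 2e-3
3.14159265358979
```

The first test passes even though `355/113.0` actually prints `3.14159292035398`: its relative error against the expected value is about `8.5e-8`, well within `1e-5`.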
comment:10
Replying to @slel:
Thanks for the above suggestions. Yes, we agreed to avoid
comment:11
This still fails too frequently (8 times out of 50). I understand the difficulty in making this work and I do not have a good solution either. Personally, I have the impression that we should not write tests that are guaranteed to fail sporadically, especially considering the vast number of doctests in Sage. Every false positive adds noise to the development process, which can make real issues go unnoticed. Besides that, including a test in the documentation that makes false claims about a probability distribution is a bit confusing. Setting the seed to 0 may be the only reliable solution, but I welcome other ideas and opinions. Or we just increase the tolerances enough for the tests to (almost) always pass. When I was running a patchbot client, I fixed several flaky doctests just to get the patchbot to report usable results, which is a time-consuming process. That is why I am wary about introducing new such issues.
comment:12
I agree. See a similar discussion at
comment:13
Yes, 8 out of 50 is definitely too much. My idea was to have meaningful tests that fail about 1 in 10,000 times or less. That is how I did the tests in #29976: run the corresponding function a couple thousand times and take the maximum and minimum. E.g. there is a doctest in matrices that tests that random
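As a hypothetical sketch of that calibration procedure (the sampled quantity and the sample size are invented here, not taken from #29976):

```
sage: def nonzero_count():                   # illustrative random quantity
....:     A = matrix(GF(2), 5, 5, 0)
....:     A.randomize()
....:     return len(A.nonzero_positions())
sage: observations = [nonzero_count() for _ in range(2000)]
sage: min(observations), max(observations)   # random; suggests doctest bounds
```

The observed extremes over a couple thousand runs give a first idea of safe bounds, which one would then widen further before hard-coding them into a doctest.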
comment:14
Here is an example of something I feel comfortable with:
Then I decided that a tolerance of

```diff
  sage: A = matrix(GF(2), 5, 5, 0)
- sage: A.randomize(0.5); A
- [0 0 0 1 1]
- [0 1 0 0 1]
- [1 0 0 0 0]
- [0 1 0 0 0]
- [0 0 0 1 0]
- sage: A.randomize(); A
- [0 0 1 1 0]
- [1 1 0 0 1]
- [1 1 1 1 0]
- [1 1 1 1 1]
- [0 0 1 1 0]
+ sage: len(A.nonzero_positions())
+ 0
+ sage: A.randomize(0.5)
+ sage: 0 <= len(A.nonzero_positions()) < 12
+ True
+ sage: A.randomize()
+ sage: 1 <= len(A.nonzero_positions()) < 24
+ True
```

Here the last one apparently fails something like 1 in a million times. Let's say there are a thousand tests like this in Sage and it takes 2 hours for a patchbot to run all tests. Then apparently every 83 days your patchbot will report such an error. Is this OK? Maybe tests failing 1 in 100,000 times are still acceptable, but anything failing significantly more often than that would be annoying. You don't want to restart your patchbot every 50 days because of stupid failures. And of course, if that kind of tolerance makes the test meaningless, we might want to go with
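As a back-of-the-envelope check of the "1 in a million" figure (my own computation, assuming `A.randomize()` fills all 25 entries with independent uniform elements of GF(2)): the number of nonzero entries is then binomial with n = 25 and p = 1/2, and the last test fails exactly when that number is 0 or at least 24.

```
sage: n = 25                                  # entries of a 5x5 matrix
sage: p_fail = (binomial(n, 0) + binomial(n, 24) + binomial(n, 25)) / 2^n
sage: p_fail.n()                              # probability the last test fails
8.04662704467773e-7
sage: ((1 / (1000 * p_fail)) * 2 / 24).n()    # days between spurious failures
103.563061728395
```

So the failure rate is roughly 1 in 1.2 million, and a patchbot running a thousand such tests every 2 hours would hit a spurious failure about every 100 days, in line with the estimate above.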
comment:15
Maybe the far easier and more reasonable solution is to have the patchbots run at random seed 0 when they verify that they are not broken. I mean, this is our reference seed. We can live with bots reporting incorrect failures once in a while (it happens anyway). So we could add this to the patchbot scripts?
comment:16
Sorry for all my comments. Just forget what I said earlier. I still think that tests with
However, doctests should be mathematically correct. So rethinking:
is NOT a good test. Not because it is likely to fail, but because it boils down to something like:
It probably won't fail, but it is terrible. There are two reasons for doctests that I think should be kept in mind:
Here is my proposition: I think the way to combine those things is to use while statements in those tests for "randomness". Those tests succeed almost certainly, in a mathematically correct way:
Am I making any sense? IMO this would actually be a valuable doctest and mathematically correct, because the probability of it not terminating is indeed zero (at least if things are implemented correctly). One can maybe speed this up by increasing the initial sample and by adding a number of samples at a time instead of just one.
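The concrete examples were lost in the migration, but here is a hypothetical sketch of the proposed while-loop pattern (the distribution, tolerance, and sample sizes are invented here). The loop keeps drawing samples until the empirical frequency is within tolerance of the true probability; by the law of large numbers it terminates with probability 1, so the doctest asserts something mathematically true while still exercising the sampler:

```
sage: D = GeneralDiscreteDistribution([0.3, 0.4, 0.3])
sage: counts = [0, 0, 0]
sage: for _ in range(1000):                   # initial sample
....:     counts[D.get_random_element()] += 1
sage: while abs(counts[1] / sum(counts) - 0.4) > 0.05:  # terminates almost surely
....:     counts[D.get_random_element()] += 1
sage: abs(counts[1] / sum(counts) - 0.4) <= 0.05
True
```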
comment:17
Yes, these are very good suggestions. Thanks for the detailed proposal. |
Branch pushed to git repo; I updated commit sha1. New commits:
Branch pushed to git repo; I updated commit sha1. New commits:
comment:21
I could only localize such a doctest in one other closed ticket and decided to fix it here as well. |
Branch pushed to git repo; I updated commit sha1. New commits:
Reviewer: Samuel Lelièvre, Markus Wageringel
comment:23
Very nice. This works well. I have tried the while loops manually to make sure they do not take too much time. Only this test in
As this does not happen often, I think we can set this ticket to positive review, but you can change the bound of this test if you prefer. |
Changed branch from public/29972 to b0c02b3
Part of #29935.

This ticket makes the stats doctests pass for random seed `n` more general than just `0`.

CC: @slel
Component: doctest framework
Author: Jonathan Kliem
Branch/Commit: b0c02b3
Reviewer: Samuel Lelièvre, Markus Wageringel
Issue created by migration from https://trac.sagemath.org/ticket/29972