
Make stats doctests ready for random seeds #29972

Closed
kliem opened this issue Jun 24, 2020 · 33 comments
@kliem
Contributor

kliem commented Jun 24, 2020

Part of #29935.

This ticket makes

sage -t --long --random-seed=n src/sage/stats/

pass for n more general than just 0.

CC: @slel

Component: doctest framework

Author: Jonathan Kliem

Branch/Commit: b0c02b3

Reviewer: Samuel Lelièvre, Markus Wageringel

Issue created by migration from https://trac.sagemath.org/ticket/29972

@kliem kliem added this to the sage-9.2 milestone Jun 24, 2020
@kliem
Contributor Author

kliem commented Jun 24, 2020

comment:1

At least the following need fixing:

sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/distributions/discrete_gaussian_integer.pyx  # 7 doctests failed
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/distributions/discrete_gaussian_lattice.py  # 9 doctests failed
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/distributions/discrete_gaussian_polynomial.py  # 6 doctests failed
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/hmm/chmm.pyx  # 8 doctests failed
sage -t --long --random-seed=151058820726654196682836430928254760259 src/sage/stats/hmm/distributions.pyx  # 3 doctests failed

@mkoeppe mkoeppe modified the milestones: sage-9.2, sage-9.3 Sep 5, 2020
@mkoeppe
Contributor

mkoeppe commented Mar 15, 2021

comment:3

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

@mkoeppe mkoeppe modified the milestones: sage-9.3, sage-9.4 Mar 15, 2021
@kliem
Contributor Author

kliem commented Apr 29, 2021

Branch: public/29972

@kliem
Contributor Author

kliem commented Apr 29, 2021

Commit: d96e5c3

@kliem
Contributor Author

kliem commented Apr 29, 2021

New commits:

d96e5c3  make stats ready for random seeds

@kliem
Contributor Author

kliem commented Apr 29, 2021

Changed dependencies from #29962 to none

@kliem
Contributor Author

kliem commented Apr 29, 2021

Author: Jonathan Kliem

@mwageringel
Contributor

comment:5

I have tried 6 different seeds and got one failure:

sage -t --long --warn-long 134.7 --random-seed=2002 src/sage/stats/distributions/discrete_gaussian_lattice.py
**********************************************************************
File "src/sage/stats/distributions/discrete_gaussian_lattice.py", line 210, in sage.stats.distributions.discrete_gaussian_lattice.DiscreteGaussianDistributionLatticeSampler._normalisation_factor_zz
Failed example:
    l.count(v)  # abs tol 20
Expected:
    64
Got:
    87
Tolerance exceeded:
    64 vs 87, tolerance 3e1 > 2e1
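For readers unfamiliar with Sage's tolerance markers: `# abs tol 20` accepts an output when its absolute difference from the expected value is at most 20. A simplified plain-Python sketch of that comparison (the real doctest parser extracts the numbers from the printed output; this shows only the check itself):

```python
# Simplified sketch of Sage's "# abs tol" comparison: a doctest with
# "# abs tol T" passes when |expected - got| <= T. The failure above
# is |64 - 87| = 23, which exceeds the stated tolerance of 20.
def within_abs_tol(expected, got, tol):
    return abs(expected - got) <= tol

print(within_abs_tol(64, 87, 20))  # False: the failure reported above
print(within_abs_tol(64, 70, 20))  # True: within tolerance
```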

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Apr 29, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

a15025d  more tolerance

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Apr 29, 2021

Changed commit from d96e5c3 to a15025d

@kliem
Contributor Author

kliem commented Apr 29, 2021

comment:7

OK, I made it a bit more flexible.

These doctests are really difficult to fix. As you noticed, they are far from stable.

@slel
Member

slel commented Apr 29, 2021

comment:8

You replaced:

-    sage: x=0; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
-    (13355, 13298)
-    sage: x=4; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
-    (5479, 5467)
-    sage: x=-10; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
-    (53, 51)
+    sage: x=0; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+    13298
+    sage: l.count(x)  # rel tol 5e-2
+    13298
+    sage: x=4; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+    5467
+    sage: l.count(x)  # rel tol 5e-2
+    5467
+    sage: x=-10; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+    51
+    sage: l.count(x)  # rel tol 5e-1
+    51

I would further rewrite that as:

+    sage: expected = lambda x: ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+    sage: observed = lambda x: l.count(x)
+    sage: expected(0)
+    13298
+    sage: observed(0)  # rel tol 5e-2
+    13298
+    sage: expected(4)
+    5467
+    sage: observed(4)  # rel tol 5e-2
+    5467
+    sage: expected(-10)
+    51
+    sage: observed(-10)  # rel tol 5e-1
+    51

You replaced:

-    sage: x=0;   y=1; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time
-    (1.0, 1.00...)
-    sage: x=0; y=-100; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time
-    (1.32..., 1.36...)
+    sage: x=0;   y=1; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n()  # long time  # abs tol 2e-1
+    (1.0, 1.0)
+    sage: x=0; y=-100; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n()  # long time  # abs tol 2e-1
+    (1.36, 1.36)

I would further rewrite that as:

+    sage: expected = lambda x, y: (
+    ....:     exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n())
+    sage: observed = lambda x, y: float(l.count(x))/l.count(y)
+    sage: expected(0, 1), observed(0, 1)  # long time  # abs tol 2e-1
+    (1.0, 1.0)
+    sage: expected(0, -100), observed(0, -100)  # long time  # abs tol 2e-1
+    (1.36, 1.36)

In src/sage/stats/hmm/chmm.pyx you add set_random_seed(0)
in front of some tests.

Is there an explanation somewhere of when to do that?
I did not find one at #29935.


@sagetrac-git
Mannequin

sagetrac-git mannequin commented Apr 30, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

fb3fab0  some changes suggested by reviewer
f06cc7c  remove set_random_seed(0)

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Apr 30, 2021

Changed commit from a15025d to f06cc7c

@kliem
Contributor Author

kliem commented Apr 30, 2021

comment:10

Replying to @slel:


Thanks for the above suggestions.

Yes, we agreed to avoid set_random_seed(0) if at all possible.
It is harder to fix this for doctests that you don't understand.
I removed this now.

@mwageringel
Contributor

comment:11

This still fails too frequently (8 times out of 50). I understand the difficulty in making this work and I do not have a good solution either. Personally, I have the impression that we should not write tests that are guaranteed to fail sporadically, especially considering the vast number of doctests in Sage. Every spurious failure adds noise to the development process, which can make real issues go unnoticed. Besides that, including a test in the documentation that makes false claims about a probability distribution is a bit confusing.

Setting the seed to 0 may be the only reliable solution, but I welcome other ideas and opinions. Or we just increase the tolerances enough for the tests to pass (almost) always.

When I was running a patchbot client, I fixed several flaky doctests just to get the patchbot to report usable results, which is a time consuming process. That is why I am wary about introducing new such issues.

sage -t --long --warn-long 134.9 --random-seed=13008 src/sage/stats/distributions/discrete_gaussian_integer.pyx  # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13008 src/sage/stats/hmm/chmm.pyx  # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13016 src/sage/stats/hmm/chmm.pyx  # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13032 src/sage/stats/distributions/discrete_gaussian_integer.pyx  # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13034 src/sage/stats/distributions/discrete_gaussian_integer.pyx  # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13042 src/sage/stats/distributions/discrete_gaussian_integer.pyx  # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13047 src/sage/stats/hmm/chmm.pyx  # 1 doctest failed
sage -t --long --warn-long 134.9 --random-seed=13049 src/sage/stats/distributions/discrete_gaussian_integer.pyx  # 1 doctest failed

@slel
Member

slel commented Apr 30, 2021

comment:12

I agree. See a similar discussion at

@kliem
Contributor Author

kliem commented May 1, 2021

comment:13

Yes, 8 out of 50 is definitely too much.

My idea was to have meaningful tests that fail about 1 in 10,000 times or less. That is how I did the tests in #29976: run the corresponding function a couple thousand times and take the maximum and minimum.

E.g. there is a doctest in matrices that tests that random 1000 x 1000 matrices over GF(2) have high rank (because low rank was an issue that was fixed at some point). Theoretically, such a matrix could still have very low rank, but I don't think this will ever happen. The new doctest is rather meaningful, I think, because it documents that such a matrix will very probably (at least 99.9%) have rank 995 or higher. Also, the doctest now actually tests something: it doesn't just tell us that things work out for random_seed(0), but that they usually work out.
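The tolerance-picking approach described above can be sketched in plain Python. This is a stand-in for the Sage session, not the actual Sage API: `density_sample` is a hypothetical helper mimicking `random_matrix(GF(2), n, n).density()`.

```python
import random

def density_sample(n=100):
    # Stand-in for Sage's random_matrix(GF(2), n, n).density():
    # the fraction of ones in a uniformly random n x n 0/1 matrix.
    bits = n * n
    return bin(random.getrandbits(bits)).count("1") / bits

# Run the statistic many times and inspect the extremes; a doctest
# tolerance chosen comfortably outside [lo, hi] will almost never fail.
samples = [density_sample() for _ in range(200)]
lo, hi = min(samples), max(samples)
print(f"observed range: [{lo:.4f}, {hi:.4f}]")  # tightly around 0.5
```

The extremes shrink like 1/sqrt(n^2), so the margin between the observed range and the chosen tolerance controls how rarely the doctest fails.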

@kliem
Contributor Author

kliem commented May 1, 2021

comment:14

Here is an example of something I feel comfortable with:

sage: def foo():
....:     A = random_matrix(GF(2), 1000, 1000)
....:     return float(A.density())
sage: max(foo() for _ in range(100))
0.50165
sage: max(foo() for _ in range(1000))
0.501586

Then I decided on a tolerance of 0.01 to make it almost never fail.
Of course there should also be a replacement for tests like the following:

             sage: A = matrix(GF(2), 5, 5, 0)
-            sage: A.randomize(0.5); A
-            [0 0 0 1 1]
-            [0 1 0 0 1]
-            [1 0 0 0 0]
-            [0 1 0 0 0]
-            [0 0 0 1 0]
-            sage: A.randomize(); A
-            [0 0 1 1 0]
-            [1 1 0 0 1]
-            [1 1 1 1 0]
-            [1 1 1 1 1]
-            [0 0 1 1 0]
+            sage: len(A.nonzero_positions())
+            0
+            sage: A.randomize(0.5)
+            sage: 0 <= len(A.nonzero_positions()) < 12
+            True
+            sage: A.randomize()
+            sage: 1 <= len(A.nonzero_positions()) < 24
+            True

Here the last one apparently fails something like 1 in a million. Let's say there are a thousand tests like this in Sage and it takes 2 hours for a patchbot to run all tests. Then apparently every 83 days your patchbot will report such an error. Is this OK?
Of course it will only die if this happens after a beta, which will happen something like every twenty years.

Maybe tests failing 1 in 100,000 are still acceptable, but anything failing significantly more often would be annoying. You don't want to restart your patchbot every 50 days because of spurious failures.
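The arithmetic behind the "every 83 days" estimate can be checked directly, using the numbers from the comment above:

```python
# 1000 doctests, each failing with probability 1e-6 per run,
# and a full test run taking (and thus starting every) 2 hours.
n_tests = 1000
p_fail_each = 1e-6
runs_per_day = 24 / 2

# Probability that at least one doctest fails during one full run.
p_fail_run = 1 - (1 - p_fail_each) ** n_tests   # about 1e-3

# Expected number of days between spurious patchbot failures.
days = 1 / (p_fail_run * runs_per_day)
print(f"about {days:.0f} days between spurious failures")  # about 83
```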

And of course, if the test becomes meaningless by that kind of tolerance, we might want to go with set_random_seed(0).

@kliem
Contributor Author

kliem commented May 1, 2021

comment:15

Maybe the far easier and more reasonable solution is to have the patchbots run with random seed 0 when they verify that they are not broken. I mean, this is our reference seed. We can live with bots reporting incorrect failures once in a while (it happens anyway).

So we could add this to the patchbot scripts?

@kliem
Contributor Author

kliem commented May 1, 2021

comment:16

Sorry for all my comments.

Just forget what I said earlier. I still think that tests with set_random_seed(0) are pretty worthless, because they don't test anything.

However, doctests should be mathematically correct. So rethinking:

             sage: A = matrix(GF(2), 5, 5, 0)
             sage: A.randomize()
             sage: 1 <= len(A.nonzero_positions()) < 24
             True

is NOT a good test. Not because it is likely to fail, but because it boils down to something like:

sage: randint(0, 2^24) != 0
True

It probably won't fail, but it is terrible.
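To put a number on that: the reduced test fails only when a single uniform draw from {0, ..., 2^24} hits 0, so it is not flaky at all; it is just nearly vacuous:

```python
# Failure probability of the reduced test `randint(0, 2^24) != 0`:
# a single uniform draw from {0, ..., 2**24} must hit exactly 0.
p_fail = 1 / (2**24 + 1)          # about 6e-8

# Even at 12 full test runs per day, the expected wait for a single
# failure is on the order of millennia -- the test checks almost nothing.
runs_per_day = 12
years = 1 / (p_fail * runs_per_day * 365)
print(f"p_fail = {p_fail:.2e}, expected wait ~ {years:.0f} years")
```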

There are two reasons for doctests that I think should be kept in mind:

  • educational/documentation
  • actually testing that things work (e.g. that random really does something).

Here is my proposition:

I think the way to combine those things is using while statements in those tests for "randomness". Those tests succeed almost certainly, in a mathematically correct way:

  • Testing that not all random elements are zero
sage: while GF(3).random_element().is_zero(): pass
  • Testing that all (of finitely many) elements can be obtained eventually.
  • Testing that random matrices over GF(2) have different ranks:
sage: def random_rank():
....:     A = matrix(GF(2), 5, 5, 0)
....:     A.randomize()
....:     return A.rank()
sage: while random_rank() < 5: pass
sage: while random_rank() >= 4: pass
  • Testing that according to the law of large numbers an expectation is approximated at some point.
sage: from collections import defaultdict
sage: counter = defaultdict(Integer)
sage: def throw_dice(): counter[randint(1, 6)] += 1
sage: throw_dice()
sage: def test(): return all(abs(counter[i]*1.0/sum(counter.values()) - 1/6) < 0.01 for i in counter)
sage: while not test(): throw_dice()
  • Applying this to our situation this could look like:
sage: from sage.stats.distributions.discrete_gaussian_integer import DiscreteGaussianDistributionIntegerSampler
sage: sigma = 3.0
sage: D = DiscreteGaussianDistributionIntegerSampler(sigma=sigma)
sage: bound = (6*sigma).floor()
sage: norm_factor = sum([exp(-x^2/(2*sigma^2)) for x in range(-bound, bound+1)])

sage: from collections import defaultdict
sage: counter = defaultdict(Integer)
sage: n = 0
sage: def add_sample():
....:     global counter, n
....:     counter[D()] += 1
....:     n += 1

sage: for _ in range(1000): add_sample()
sage: expected = lambda x: ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
sage: observed = lambda x: counter[x]

sage: while not counter[0]: add_sample()
sage: while abs(expected(0)*1.0/observed(0) - 1.0) > 5e-2: add_sample()  # long time

sage: while not counter[4]: add_sample()
sage: while abs(expected(4)*1.0/observed(4) - 1.0) > 5e-2: add_sample()  # long time

sage: while not counter[-10]: add_sample()
sage: while abs(expected(-10)*1.0/observed(-10) - 1.0) > 5e-2: add_sample()  # long time

Am I making any sense?

IMO this would actually be a valuable doctest and mathematically correct, because the probability of it not terminating is indeed zero (at least if things are implemented correctly).

One can maybe speed this up by increasing the initial sample and by adding a batch of samples at a time instead of just one.
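A runnable plain-Python analogue of the dice proposal above, including the batching suggestion. The stdlib `random` module replaces Sage's `randint`, and `add_samples`/`close_to_uniform` are hypothetical names, not part of any Sage API:

```python
import random
from collections import defaultdict

counter = defaultdict(int)

def add_samples(k):
    # Batched version of throw_dice(): k throws at a time, as suggested above.
    for _ in range(k):
        counter[random.randint(1, 6)] += 1

def close_to_uniform(tol=0.01):
    # All six observed frequencies within tol of the expected 1/6.
    total = sum(counter.values())
    return all(abs(counter[i] / total - 1 / 6) < tol for i in range(1, 7))

add_samples(1000)                 # a larger initial sample
while not close_to_uniform():     # terminates with probability 1 (law of large numbers)
    add_samples(1000)

print(close_to_uniform())  # True
```

Batching matters because each check of the stopping condition costs a full pass over the counts; adding 1000 samples per iteration keeps the loop short while preserving almost-sure termination.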

@mwageringel
Contributor

comment:17

Yes, these are very good suggestions. Thanks for the detailed proposal.

@sagetrac-git
Mannequin

sagetrac-git mannequin commented May 4, 2021

Changed commit from f06cc7c to 90ed4c6

@sagetrac-git
Mannequin

sagetrac-git mannequin commented May 4, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

90ed4c6  replace non-safe tests by correct tests that terminate by law of large numbers

@sagetrac-git
Mannequin

sagetrac-git mannequin commented May 4, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

d17384b  fix fishy doctest in libs

@sagetrac-git
Mannequin

sagetrac-git mannequin commented May 4, 2021

Changed commit from 90ed4c6 to d17384b

@kliem
Contributor Author

kliem commented May 4, 2021

comment:21

I could only locate such a doctest in one other closed ticket and decided to fix it here as well.

@sagetrac-git
Mannequin

sagetrac-git mannequin commented May 4, 2021

Changed commit from d17384b to b0c02b3

@sagetrac-git
Mannequin

sagetrac-git mannequin commented May 4, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

b0c02b3  more meaningful replacement in libs

@mwageringel
Contributor

Reviewer: Samuel Lelièvre, Markus Wageringel

@mwageringel
Contributor

comment:23

Very nice. This works well.

I have tried the while loops manually to make sure they do not take too much time. Only this test in discrete_gaussian_lattice is sometimes a bit slow. This only happens rarely, but in one case it created more than 2 million samples.

sage: while abs(m*f(v)*1.0/c/counter[v] - 1.0) >= 0.2: add_samples(1000)  # long time

As this does not happen often, I think we can set this ticket to positive review, but you can change the bound of this test if you prefer.

@vbraun
Member

vbraun commented May 27, 2021

Changed branch from public/29972 to b0c02b3
