Make stats doctests ready for random seeds #29972
Comments
comment:1
At least the following need fixing:
comment:3
Setting new milestone based on a cursory review of ticket status, priority, and last modification date.
Branch: public/29972
Commit:
New commits:
Changed dependencies from #29962 to none
Author: Jonathan Kliem |
comment:5
I have tried 6 different seeds and got one failure:
Branch pushed to git repo; I updated commit sha1. New commits:
comment:7
OK, I made it a bit more flexible. These doctests are really difficult to fix. As you noticed, it is far from stable.
comment:8
You replaced:

```diff
- sage: x=0; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
- (13355, 13298)
- sage: x=4; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
- (5479, 5467)
- sage: x=-10; l.count(x), ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
- (53, 51)
+ sage: x=0; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ 13298
+ sage: l.count(x) # rel tol 5e-2
+ 13298
+ sage: x=4; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ 5467
+ sage: l.count(x) # rel tol 5e-2
+ 5467
+ sage: x=-10; ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ 51
+ sage: l.count(x) # rel tol 5e-1
+ 51
```

I would further rewrite that as:

```diff
+ sage: expected = lambda x: ZZ(round(n*exp(-x^2/(2*sigma^2))/norm_factor))
+ sage: observed = lambda x: l.count(x)
+ sage: expected(0)
+ 13298
+ sage: observed(0) # rel tol 5e-2
+ 13298
+ sage: expected(4)
+ 5467
+ sage: observed(4) # rel tol 5e-2
+ 5467
+ sage: expected(-10)
+ 51
+ sage: observed(-10) # rel tol 5e-1
+ 51
```

You replaced:

```diff
- sage: x=0; y=1; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time
- (1.0, 1.00...)
- sage: x=0; y=-100; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time
- (1.32..., 1.36...)
+ sage: x=0; y=1; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time # abs tol 2e-1
+ (1.0, 1.0)
+ sage: x=0; y=-100; float(l.count(x))/l.count(y), exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n() # long time # abs tol 2e-1
+ (1.36, 1.36)
```

I would further rewrite that as:

```diff
+ sage: expected = lambda x, y: (
+ ....:     exp(-x^2/(2*sigma^2))/exp(-y^2/(2*sigma^2)).n())
+ sage: observed = lambda x, y: float(l.count(x))/l.count(y)
+ sage: expected(0, 1), observed(0, 1) # long time # abs tol 2e-1
+ (1.0, 1.0)
+ sage: expected(0, -100), observed(0, -100) # long time # abs tol 2e-1
+ (1.36, 1.36)
```

Is there an explanation somewhere of when to do that?
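As background (an editorial note summarizing the Sage developer guide, not part of the original comment): the doctest framework accepts `# rel tol e` and `# abs tol e` annotations, under which each floating-point number in the actual output must match the corresponding expected number up to relative error `e`, respectively absolute error `e`. A small illustration with made-up values:

```
sage: 355/113.0  # rel tol 1e-5
3.14159265358979
sage: 22/7.0     # abs tol 2e-3
3.14159265358979
```

The first test passes even though `355/113.0` actually prints `3.14159292035398`: its relative error against the expected value is about `8.5e-8`, well within `1e-5`.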
comment:10
Replying to @slel:
Thanks for the above suggestions. Yes, we agreed to avoid
comment:11
This still fails too frequently (8 times out of 50). I understand the difficulty in making this work and I do not have a good solution either. Personally, I have the impression that we should not write tests that are guaranteed to fail sporadically, especially considering the vast number of doctests in Sage. Every false positive adds noise to the development process, which can make real issues go unnoticed. Besides that, including a test in the documentation that makes false claims about a probability distribution is a bit confusing. Setting the seed to 0 may be the only reliable solution, but I welcome other ideas and opinions. Or we just increase the tolerances enough for the tests to (almost) always pass. When I was running a patchbot client, I fixed several flaky doctests just to get the patchbot to report usable results, which is a time-consuming process. That is why I am wary about introducing new such issues.
comment:12
I agree. See a similar discussion at
comment:13
Yes, 8 out of 50 is definitely too much. My idea was to have meaningful tests that fail about 1 in 10,000 times or less. That is how I did the tests in #29976: run the corresponding function a couple thousand times and take the maximum and minimum. E.g. there is a doctest in matrices that tests that random
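As a hypothetical sketch of that calibration procedure (the sampled quantity and the sample size are invented here, not taken from #29976):

```
sage: def nonzero_count():                   # illustrative random quantity
....:     A = matrix(GF(2), 5, 5, 0)
....:     A.randomize()
....:     return len(A.nonzero_positions())
sage: observations = [nonzero_count() for _ in range(2000)]
sage: min(observations), max(observations)   # random; suggests doctest bounds
```

The observed extremes over a couple thousand runs give a first idea of safe bounds, which one would then widen further before hard-coding them into a doctest.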
comment:14
Here is an example of something I feel comfortable with:
Then I decided that a tolerance of

```diff
  sage: A = matrix(GF(2), 5, 5, 0)
- sage: A.randomize(0.5); A
- [0 0 0 1 1]
- [0 1 0 0 1]
- [1 0 0 0 0]
- [0 1 0 0 0]
- [0 0 0 1 0]
- sage: A.randomize(); A
- [0 0 1 1 0]
- [1 1 0 0 1]
- [1 1 1 1 0]
- [1 1 1 1 1]
- [0 0 1 1 0]
+ sage: len(A.nonzero_positions())
+ 0
+ sage: A.randomize(0.5)
+ sage: 0 <= len(A.nonzero_positions()) < 12
+ True
+ sage: A.randomize()
+ sage: 1 <= len(A.nonzero_positions()) < 24
+ True
```

Here the last one apparently fails something like 1 in a million times. Let's say there are a thousand tests like this in Sage and it takes 2 hours for a patchbot to run all tests. Then apparently every 83 days your patchbot will report such an error. Is this OK? Maybe tests failing 1 in 100,000 times are still acceptable, but anything failing significantly more often than that would be annoying. You don't want to restart your patchbot every 50 days because of stupid failures. And of course, if that kind of tolerance makes the test meaningless, we might want to go with
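As a back-of-the-envelope check of the "1 in a million" figure (my own computation, assuming `A.randomize()` fills all 25 entries with independent uniform elements of GF(2)): the number of nonzero entries is then binomial with n = 25 and p = 1/2, and the last test fails exactly when that number is 0 or at least 24.

```
sage: n = 25                                  # entries of a 5x5 matrix
sage: p_fail = (binomial(n, 0) + binomial(n, 24) + binomial(n, 25)) / 2^n
sage: p_fail.n()                              # probability the last test fails
8.04662704467773e-7
sage: ((1 / (1000 * p_fail)) * 2 / 24).n()    # days between spurious failures
103.563061728395
```

So the failure rate is roughly 1 in 1.2 million, and a patchbot running a thousand such tests every 2 hours would hit a spurious failure about every 100 days, in line with the estimate above.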
comment:15
Maybe the far easier and more reasonable solution is to have the patchbots run at random seed 0 when they verify that they are not broken. I mean, this is our reference seed. We can live with bots reporting incorrect failures once in a while (it happens anyway). So we could add this to the patchbot scripts?
comment:16
Sorry for all my comments. Just forget what I said earlier. I still think that tests with
However, doctests should be mathematically correct. So rethinking:
is NOT a good test. Not because it is likely to fail, but because it boils down to something like:
It probably won't fail, but it is terrible. There are two reasons for doctests that I think should be kept in mind:
Here is my proposition: I think the way to combine those things is to use while statements in those tests for "randomness". Those tests succeed almost certainly, in a mathematically correct way:
Am I making any sense? IMO this would actually be a valuable doctest and mathematically correct, because the probability of it not terminating is indeed zero (at least if things are implemented correctly). One can maybe speed this up by increasing the initial sample and by adding a number of samples at a time instead of just one.
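The concrete examples were lost in the migration, but here is a hypothetical sketch of the proposed while-loop pattern (the distribution, tolerance, and sample sizes are invented here). The loop keeps drawing samples until the empirical frequency is within tolerance of the true probability; by the law of large numbers it terminates with probability 1, so the doctest asserts something mathematically true while still exercising the sampler:

```
sage: D = GeneralDiscreteDistribution([0.3, 0.4, 0.3])
sage: counts = [0, 0, 0]
sage: for _ in range(1000):                   # initial sample
....:     counts[D.get_random_element()] += 1
sage: while abs(counts[1] / sum(counts) - 0.4) > 0.05:  # terminates almost surely
....:     counts[D.get_random_element()] += 1
sage: abs(counts[1] / sum(counts) - 0.4) <= 0.05
True
```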
comment:17
Yes, these are very good suggestions. Thanks for the detailed proposal. |
Branch pushed to git repo; I updated commit sha1. New commits:
Branch pushed to git repo; I updated commit sha1. New commits:
comment:21
I could only localize such a doctest in one other closed ticket and decided to fix it here as well. |
Branch pushed to git repo; I updated commit sha1. New commits:
Reviewer: Samuel Lelièvre, Markus Wageringel
comment:23
Very nice. This works well. I have tried the while loops manually to make sure they do not take too much time. Only this test in
As this does not happen often, I think we can set this ticket to positive review, but you can change the bound of this test if you prefer. |
Changed branch from public/29972 to b0c02b3
Part of #29935.

This ticket makes the stats doctests pass for random seed `n` more general than just `0`.

CC: @slel
Component: doctest framework
Author: Jonathan Kliem
Branch/Commit: b0c02b3
Reviewer: Samuel Lelièvre, Markus Wageringel
Issue created by migration from https://trac.sagemath.org/ticket/29972