Patch with Ziggurat method for Normal distribution (Trac #1450) #2047

thouis · 2012-10-19T21:02:53Z

Original ticket http://projects.scipy.org/numpy/ticket/1450 on 2010-04-10 by trac user ilan, assigned to @charris.

I've written a patch which replaces the Box-Muller transform
in numpy/random/mtrand/randomkit.c with the faster Ziggurat method.
The patch also includes updates to doc-strings which include the
relevant references:

Doornik, J.A. (2005), "An Improved Ziggurat Method to Generate Normal
Random Samples", mimeo, Nuffield College, University of Oxford, and
www.doornik.com/research.

Marsaglia, G. and Tsang, W. W. (2000), "The Ziggurat Method for
Generating Random Variables", Journal of Statistcal Software 5,
http://www.jstatsoft.org/v05/i08/

Marsaglia, G. (1964) "Generating a variable from the tail of the
normal distribution", Technometrics 6, 101-102.

My patch is based on the support code which J. Doornik has made available
on his web page (first reference). I've contacted him, and he has given
permission to include his code in numpy on the condition that his paper
is referenced (which the patch does).

The patch was written against http://svn.scipy.org/svn/numpy/tags/1.4.0,
and I've just checked to make sure it can still be applied to the
trunk (revision 8324). I've tested the patch on Windows, MacOSX, Linux
and Solaris.

The speedup is a bit over 2 times, when creating a large number of Standard
Normal distributed random numbers. To verify that the method works and
produces good random numbers, I have:

created a large array, using randn()
transformed (using the cumulative distribution function) it back to
uniform distributed random numbers
written the data to a large file
performed the diehard tests (I used the dieharder program) on the data

Feel free to hammer the patch. I hope this patch will make its way
into numpy soon.

The text was updated successfully, but these errors were encountered:

thouis · 2012-10-19T21:02:53Z

Attachment added by trac user ilan on 2010-04-10: ziggurat.patch

thouis · 2012-10-19T21:02:53Z

@charris wrote on 2010-04-10

We've had better ziggurat methods than those in Doornik for some years now, but there has been little push in getting them into numpy.

Hi there.

I've completed timing and robustness tests that should
allow apples to apples comparisons of the mtrand normal distribution
possibilities.  The short story goes like this:

 0) One of the 'Harris' ziggurats, zig7, passes all TestU01 crush tests
   when running atop each of five PRNGS.

 1) Depending on how fast the underlying PRNG is, zig7
   outperforms Box_Muller by ~1.3X to ~8.5X.

 2) using gcc, mwc8222+zig7 outperforms mt19937+Box_Muller by 7X.

See you both at SciPy.

Bruce


And here are the details:

In what follows the uniform variate generators are:
 lcg64
 mwc8222
 mt19937
 mt19937_64
 yarn5

And the normal distribution codes are:
 trng  - default normal distribution code in TRNG
 boxm  - Box-Muller, mtrand lookalike, remembers/uses 2nd value
 zig7  - a 'Harris' ziggurat indexed by 7 bits
 zig8  - a 'Harris' ziggurat indexed by 8 bits
 zig9  - a 'Harris' ziggurat indexed by 9 bits

Here are the numbers in more detail:

# Timings from icc -O2 running on 2.4GhZ Core-2

lcg64 trng: 6.52459e+06 ops per second
lcg64 boxm: 2.18453e+07 ops per second
lcg64 zig7: 1.80616e+08 ops per second
lcg64 zig8: 2.01865e+08 ops per second
lcg64 zig9: 2.05156e+08 ops per second

mwc8222 trng: 6.52459e+06 ops per second
mwc8222 boxm: 2.08787e+07 ops per second
mwc8222 zig7: 9.44663e+07 ops per second
mwc8222 zig8: 1.05326e+08 ops per second
mwc8222 zig9: 1.03478e+08 ops per second

mt19937 trng: 6.41112e+06 ops per second
mt19937 boxm: 1.64986e+07 ops per second
mt19937 zig7: 4.23762e+07 ops per second
mt19937 zig8: 4.52623e+07 ops per second
mt19937 zig9: 4.52623e+07 ops per second

mt19937_64 trng: 6.42509e+06 ops per second
mt19937_64 boxm: 1.93226e+07 ops per second
mt19937_64 zig7: 5.8762e+07 ops per second
mt19937_64 zig8: 6.17213e+07 ops per second
mt19937_64 zig9: 6.29146e+07 ops per second

yarn5 trng: 5.95781e+06 ops per second
yarn5 boxm: 1.19156e+07 ops per second
yarn5 zig7: 1.48945e+07 ops per second
yarn5 zig8: 1.54809e+07 ops per second
yarn5 zig9: 1.53201e+07 ops per second


# Timings from g++ -O2 running on a 2.4GhZ Core-2

lcg64 trng: 6.72163e+06 ops per second
lcg64 boxm: 1.50465e+07 ops per second
lcg64 zig7: 1.31072e+08 ops per second
lcg64 zig8: 1.48383e+08 ops per second
lcg64 zig9: 1.6036e+08 ops per second

mwc8222 trng: 6.64215e+06 ops per second
mwc8222 boxm: 1.44299e+07 ops per second
mwc8222 zig7: 8.903e+07 ops per second
mwc8222 zig8: 1.00825e+08 ops per second
mwc8222 zig9: 1.03478e+08 ops per second

mt19937 trng: 6.52459e+06 ops per second
mt19937 boxm: 1.28223e+07 ops per second
mt19937 zig7: 5.00116e+07 ops per second
mt19937 zig8: 5.41123e+07 ops per second
mt19937 zig9: 5.47083e+07 ops per second

mt19937_64 trng: 6.58285e+06 ops per second
mt19937_64 boxm: 1.42988e+07 ops per second
mt19937_64 zig7: 6.72164e+07 ops per second
mt19937_64 zig8: 7.39591e+07 ops per second
mt19937_64 zig9: 7.46022e+07 ops per second

yarn5 trng: 6.25144e+06 ops per second
yarn5 boxm: 8.93672e+06 ops per second
yarn5 zig7: 1.50465e+07 ops per second
yarn5 zig8: 1.57496e+07 ops per second
yarn5 zig9: 1.56038e+07 ops per second

You should contact Bruce Carneal [email protected] and see if you can get his code. It is in c++ but can probably be translated to c without too much trouble. If we are going the ziggurat direction, it would be nice to implement the exponential distribution also. As you can see, the overall speedup also depends on the underlying rng, so if you can figure out a way of making choices available there that will also help. Of course, for small sets of values call overhead will dominate everything.

Thanks for looking into this problem, all it needs is for someone to take it in hand and push it through.

thouis · 2012-10-19T21:02:53Z

trac user ilan wrote on 2010-04-10

Thanks for your quick response. Yes, the speedup depends on the underlying RNG, and the patch I propose is just using what is already there. The method is quite simple and already a more than 2 times speedup. My felling is that by using faster underlying RNGs, a lot more could be done, but this is really a separate issue.

Writing the Ziggurat method more general, such that exponential (and possibly other) distributions can be done with the same code, might be possible but I don't think its worth the effort, as the code is very small, and also the tail distribution is different in other cases.

thouis · 2012-10-19T21:02:54Z

@charris wrote on 2010-04-10

The 'Harris' ziggurat is mine and the code is actually simpler than the Marsaglia version (Doornik). I also did the exponential (as did Marsaglia), which is even easier. Notice that the Marsaglia version actually uses an exponential to bound the tail. So do I, but I make it part of the ziggurat, which simplifies matters. It takes maybe ten lines of python to compute the needed constants for the various ziggurats. Note that in general you can get about 4x speedup vs Box-Muller for the MT rng.

thouis · 2012-10-19T21:02:54Z

@charris wrote on 2010-04-10

A few additional comments.

The random variable

u = 2.0 * rk_double(state) - 1.0

isn't evenly distributed between positive and negative. I don't think that makes any difference in practice given the other roundoff errors, but is worth noting because the steps in the ziggurat are symmetric about zero. Accepting the roundoff errors, it can be fixed by shifting and scaling the ziggurat layer to [0, 1-eps], using the positive double, and shifting and scaling the return value. This also avoids the call to fabs and saves one floating multiplication.

       /* first try the rectangular boxes */
       if (fabs(u) < s_adZigR[i])
           return u * s_adZigX[i];

uses a double in the comparison. One of the virtues of the original ziggurat was using integers here, which is most likely faster even on modern hardware.

It would be nice to move the initialization to load time so that there is no need to check it for every call. That isn't likely to make much difference given the overhead of the call to the rng but it looks a bit out of place.
The function zig_NormalTail can be removed if an improved ziggurat is used.
You can avoid the occasional extra call to rk_random if you handle the double conversion yourself. There are enough bits left over from the double to deal with the ziggurat layers. Having the pre-conversion integer form will also allow you to use integer comparisons in 2).

Chuck

charris · 2014-02-20T01:28:14Z

IIRC, this was in the Enthought edition for a while, but messed up results since variates were not produced in the same order as the current algorithm. Adding something like this either needs a new name, or a method argument for the normal generators.

bashtage · 2014-07-19T11:15:57Z

@thouis @charris

Is this patch still active? NumPy's random number generation performance is somewhat behind other choices (Julia or MATLAB, around a factor of 4), and this seems like a simple method to improve the performance for Gaussians which are both directly interesting and important inputs for random numbers from other distributions.

charris · 2014-07-19T11:58:59Z

No, but we could take it up again. The current generator needs to be kept for backward compatibility, so a new one would need a new name.

argriffing · 2014-07-19T14:17:41Z

It seems overzealous to require backward compatibility for the specific numbers sampled by the random number generators. Everything else equal of course it's better, but if sampling can be sped up 2x or 4x then there is a tradeoff.

rgommers · 2014-07-19T14:29:44Z

@argriffing it's not unimportant for reproducibility of unit tests using random number generators (same reason that a seed needs to be used in such tests).

njsmith · 2014-07-19T14:34:47Z

If anyone really wants the speedup, it's pretty easy to satisfy both
impulses. Save the old generator as a subclass, and write a function that
takes a numpy version number and fetches a RandomState instance with the
corresponding implementations. Then people who need to reproduce old
results have an easy way to do it, and people who don't care can get the
fastest version.
On 19 Jul 2014 15:29, "Ralf Gommers" [email protected] wrote:

@argriffing https://github.com/argriffing it's not unimportant for
reproducibility of unit tests using random number generators (same reason
that a seed needs to be used in such tests).

—
Reply to this email directly or view it on GitHub
#2047 (comment).

charris · 2014-07-19T14:34:54Z

We do require that the same sequences be generated by the same functions, it is even tested. Reproducibility requires that. But there is no reason not to add new functions, say zignormal. Bruce disappeared after all that work he did to test and verify the algorithms. I still have my code sitting around somewhere, although it is pretty ancient now. IIRC, there was a small bug that Bruce found, overflow or some such, which I think I fixed in my code. Hmm, I've still got all the emails from the summer of 2008. Looks like Bruce also had an SIMD version which sped things up even more.

PS: As before, the scaling, relative to a single serial thread, is 16X
(4 wide SIMD X 4 cores).

So if we really wanted fast, that would be one way to go. Multiple cores may make preserving the sequence difficult. Super high performance can probably be pushed off to a specialized package.

seberg · 2014-07-19T14:48:23Z

I also don't like changing random state for some speed up (no matter how awesome). Now personally, I doubt that will happen, but I used to write scripts with a fixed seed for reproducibility. Some of that reproducibility is gone anyway because even tiny float rounding changes will break it, but still. I don't mind having a method keyword. I also don't mind adding logic such as "If the seed was fixed, warn the user of a future change and use the old version, otherwise just use the new version", then keep the old version around and after a while make the faster one default. And maybe right away or a little later remove the warning.

charris · 2014-07-19T14:49:35Z

Bruce is on Linkedin. I never got any of my code back, so it may be worth contacting him first if we decide to pursue this.

njsmith · 2014-07-19T15:11:45Z

I can see an argument either way for making improvements opt-in vs opt-out.
But I do feel strongly that

algorithm selection should be done on the object level, not the method
level. Adding methods like zignormal would expose implementation clutter to
users who have no reason to care about such details, create awkwardness for
code that wants to use the fastest method while supporting a range of numpy
versions, and wouldn't really work when one method calls another (IIRC
there are methods that require normal variates - do we need zig versions of
these methods too?). Setting it at the object level keeps all this stuff
hidden as implementation details.
we should support old versions indefinitely so long as its practical (and
I can't see why it wouldn't be). Think pickle versions, not deprecation
cycles.
On 19 Jul 2014 15:49, "Charles Harris" [email protected] wrote:

Bruce is on Linkedin. I never got any of my code back, so it may be worth
contacting him first if we decide to pursue this.

—
Reply to this email directly or view it on GitHub
#2047 (comment).

bashtage · 2014-07-19T17:22:06Z

Looks like Bruce also had an SIMD version which sped things up even more.

I woudl assume this is SFMT, http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html, which is a SIMD implementation of the standard MT (same state). This is orthogonal to the Ziggurat change which only affects normal generation. There is another generator by the same authors (dSFMT) that specializes in doubles that is potentially even faster for uniforms with double precision - it is used by Julia.

It would seem to me the first step would be to get the zignormal into Numpy as a standard alone normal generator (ditto zigexp) that is exposed. Then users who are performance sensitive could start making use of these. Once these have been used, it might make sense to make these the default RNGs for normals and exponentials.

I am sympathetic to the above points that for most users, they will want the fastest method to generate a random variable from any distribution F, and when this requires a normal (e.g. a Students's t), then it would be preferable to use zignormal

bashtage · 2014-07-19T17:25:20Z

I also don't like changing random state for some speed up (no matter how awesome). Now personally, I doubt that will happen, but I used to write scripts with a fixed seed for reproducibility. Some of that reproducibility is gone anyway because even tiny float rounding changes will break it, but still. I don't mind having a method keyword. I also don't mind adding logic such as "If the seed was fixed, warn the user of a future change and use the old version, otherwise just use the new version", then keep the old version around and after a while make the faster one default. And maybe right away or a little later remove the warning.

Moreover, if exact reproducibility is really crucial, someone can always setup a virtualenv with python=2.7 numpy=1.8 and then they should get the same thing. Requiring rng reproducibility across major versions is tantamount to agreeing never to change the default core RNG.

juliantaylor · 2014-07-19T17:31:37Z

many people do care about reproducibility, we even get bug reports that our MT implementation gives different results than matlab and R for 64 bit numbers from time to time.
how about a keyword argument to normal?

fwiw in numpy 1.9 the random module can now be used from multiple threads via the threading module, but you do need a state per thread as each state still has a lock.

bashtage · 2014-07-19T17:36:08Z

Just for fun it would be great to add this to the rng wishlist

http://www.deshawresearch.com/resources_random123.html

It can be used to generate many independent streams (2**64) in parallel.

bashtage · 2014-07-19T17:38:10Z

many people do care about reproducibility, we even get bug reports that our MT implementation gives different results than matlab and R for 64 bit numbers from time to time.

There are definitely some differences - MATLAB, for example, produces random integers with a range up to 2**53, which indicates they are using a double uniform generator and then converting to integer.

juliantaylor · 2014-07-19T17:55:03Z

the base algorithms are often the same, 32 bit mersenne twister, the differences come how those 32 bit numbers are converted to 64 bit numbers.

but actually dropping reproducibility for stuff like normal could be more acceptable as the current box muller method does imply using libc math functions were one has no control over how accurate they are. So we are already only providing approximate reproducibility for these.

rkern · 2014-07-19T21:45:13Z

Requiring rng reproducibility across major versions is tantamount to agreeing never to change the default core RNG.

And indeed, that is what we have agreed to. We can add new core PRNGs, but not change the core PRNG of RandomState. I'm not sure I can be made to care too much about the convenience aliases at the numpy.random level.

rkern · 2015-04-28T10:53:02Z

My new favorite candidate for a new core PRNG: http://www.pcg-random.org/

njsmith · 2015-04-30T05:48:10Z

PCG is definitely sweet. Random123 is probably also worth a look-in:
http://www.deshawresearch.com/resources_random123.html

(Cribbed from the thread at JuliaLang/julia#5105 that github helpfully auto-linked above. They also have some other links to recent papers on optimizing RNGs.)

bashtage · 2015-12-07T23:37:29Z

I have started some initial work trying to produce a usable implementation of pcg at https://github.com/bashtage/pcg-python.

Right not it has some basic RNGs implemented, as well as the ability to produce multiple streams and advance (skip to point in rng chain). It a little faster (25-50%) than the NumPy rngs, although I don't do as much (e.g., return only 1-d array). I'm no expert at C/Cython integration, so I would imagine there are some things I could do to eek out some more performance (e.g., doing more directly in C).

The numpy mtrand code seems complex to me, and I'd welcome any collaboration.

rkern · 2015-12-08T09:48:16Z

@bashtage Excellent! I have a C implementation of pcg64 with a software implementation of uint128 arithmetic for those platforms that need it.

I think it's worth playing around with the interface that you have now to work out the API for keyed streams and the like before investing the energy to make the rest of the RandomState infrastructure deal with it; that's gonna take a lot of work, unfortunately. One possible path forward is to copy over the randomkit and distributions C files over to your project and work on making that part of it switch between the Mersenne Twister and the PCG variants. We can probably iterate on that part faster in your project than trying to do the integration in numpy which has more moving parts. But once we get that solved, it will be easier to migrate it over into numpy where we can focus on the rest of the moving parts (i.e. mtrand.pyx itself).

bashtage · 2015-12-08T14:02:19Z

@rkern I've done some initial experiments with the pcg64 and the performance is a little disappointing. For example 1,000,000 uniforms takes about 5.7ms with pcg32 and 5.0 with pcg64. When using the NumPy Guassian generation method this difference is completely lost. So from a performance POV the pcg64 doesn't add much (all tests suing gcc 4.8.1, so YMMV). pcg64 does have a longer period and can produce more streams, so still might be worth it (also probably better to just go to the better generator).

Looking through mtrand.pxy, randomkit, distributions, etc, this seems like excessively complex code. I know it was written a long time ago (from what I can tell mostly < 2005) so this isn't a criticism. In an ideal work, the random generation code would be refactored so that it would accept random uint32/uint64s from an arbitrary generator. This would allow further generators to be added with much less pain. This is also unfortunately outside of my competency as there are things in mtrand.pyx that I don't really understand.

I also noticed there are a lot of functions in distributions.c that don't really need to be written in C (e.g. a Cauchy rng, which is just he ratio of two normals). From a maintenance POV it might make sense to have a layered approach, from lowest level to highest

vendor-supplied rng code (e.g. pcg-*.c)
shim to provide a consistent interface
core random number generators (i.e., rngs with loops)
cython file with class and non-core rngs

Is your modified pcg64 on github?

bashtage · 2015-12-08T14:03:54Z

FWIW the fundamental reason I am interested in this is not performance but the need (which I think is pretty essential) for a real multi-stream/advance-able RNG for parallel applications.

rkern · 2015-12-08T17:13:18Z

My main motivation for pcg64 is the period, not speed; I'm surprised that it's faster at all. The typical advice is that one should not draw more numbers than sqrt(period). So for pcg32, sqrt(2**64) ~ 5 billion, which is plausible for the more demanding use cases, which we have to design for when we choose a general purpose default. Also, it's nice to only do one draw to get a double.

The approach I recommended above is along the lines of what you are suggesting and should help avoid the complexity of mtrand.pyx per se until we have to. Modify rk_state to be that shim of a consistent C-level interface. Make it work both with a pcg* and with the Mersenne Twister as the underlying core PRNG. Make the C distribution functions work with that generic rk_state interface. Ignore mtrand.pyx for now.

I still need to add the copyright mumbo-jumbo to my implementation of pcg64, then I'll stick it up on Github.

bashtage · 2015-12-08T18:12:34Z

My main motivation for pcg64 is the period, not speed; I'm surprised that it's faster at all. The typical advice is that one should not draw more numbers than sqrt(period). So for pcg32, sqrt(2**64) ~ 5 billion, which is plausible for the more demanding use cases, which we have to design for when we choose a general purpose default.

This said, the effective period is more like 2**127 since you can have 2**63 (independent-ish) streams. Of course, this requires some understanding of the RNG to avoid making mistakes.

Also, it's nice to only do one draw to get a double.

I am guessing this is where the small speed increases come in.

bashtage · 2015-12-15T23:45:46Z

I now have a generic structure that allows alternative core RNGs to be used.
Basically uses the structure:

core rng files
shim that provides a small number of functions, but esp random uint32/64 and state structure, as well as a pxi file.
generic distributions file
generic pyx file + includes and conditional compilation

These are all combined to produce rng-specific externsion (right now it outputs 4).

Comments welcome at this point.

Some performance numbers:

Time to produce 1,000,000 normals
****************************************
dummy_inv_standard_normal           51.90 ms
dummy_zig_standard_normal           15.44 ms
numpy.random_inv_standard_normal    68.71 ms
pcg32_inv_standard_normal           61.30 ms
pcg32_zig_standard_normal           26.00 ms
pcg64_inv_standard_normal           59.90 ms
pcg64_zig_standard_normal           23.01 ms
randomkit_inv_standard_normal       66.53 ms
randomkit_zig_standard_normal       31.66 ms
dtype: object

Normals per second
****************************************
dummy_inv_standard_normal           19.27 million
dummy_zig_standard_normal           64.75 million
numpy.random_inv_standard_normal    14.55 million
pcg32_inv_standard_normal           16.31 million
pcg32_zig_standard_normal           38.46 million
pcg64_inv_standard_normal           16.69 million
pcg64_zig_standard_normal           43.47 million
randomkit_inv_standard_normal       15.03 million
randomkit_zig_standard_normal       31.58 million
dtype: object

Speed-up relative to NumPy
****************************************
dummy_inv_standard_normal         32.4%
dummy_zig_standard_normal        344.9%
pcg32_inv_standard_normal         12.1%
pcg32_zig_standard_normal        164.2%
pcg64_inv_standard_normal         14.7%
pcg64_zig_standard_normal        198.7%
randomkit_inv_standard_normal      3.3%
randomkit_zig_standard_normal    117.0%

Dummy is a trivial rng that repeats the same 20 numbers. Just used to test another hypothetical RNG.

njsmith · 2015-12-16T00:14:53Z

Sounds like you're making some great progress!
.
I think that eventually, the API we're going to want to expose for
alternative core RNGs will be something like:
For each core RNG, there is a single Python class -- e.g.
np.random.MersenneTwister, np.random.PCG64, etc. Instances of these classes
hold the RNG state, and expose some basic methods for initialization,
generation and getting/setting state (probably only at the C/Cython level
-- we'll want this to keep most of this API private for now). The
RandomState object then is changed so that it HAS-A core RNG object that it
stores as an instance variable, and we replace all the current calls to
global rng functions with calls to methods on this instance variable.
Common end-user usage would look like:

instantiates the default core RNG with random seed

r = RandomState()

instantiates the default core RNG with specified seed

r = RandomState(seed=123)

creates a RandomState using PCG64 with specified seed

r = RandomState(rng=np.random.PCG64(seed=123))

convenience short-hand for the above

r = RandomState(seed=123, rng="PCG64")

On Tue, Dec 15, 2015 at 4:04 PM, Kevin Sheppard [email protected]
wrote:

I now have a generic structure that allows alternative core RNGs to be
used.
Basically uses the structure:

core rng files

shim that provides a small number of functions, but esp random
uint32/64 and state structure, as well as a pxi file.

generic distributions file

generic pyx file + includes and conditional compilation

These are all combined to produce rng-specific externsion (right now it
outputs 4).

Comments welcome at this point.

Some performance numbers:

Time to produce 1,000,000 normals

dummy_inv_standard_normal 51.90 ms
dummy_zig_standard_normal 15.44 ms
numpy.random_inv_standard_normal 68.71 ms
pcg32_inv_standard_normal 61.30 ms
pcg32_zig_standard_normal 26.00 ms
pcg64_inv_standard_normal 59.90 ms
pcg64_zig_standard_normal 23.01 ms
randomkit_inv_standard_normal 66.53 ms
randomkit_zig_standard_normal 31.66 ms
dtype: object

Normals per second

dummy_inv_standard_normal 19.27 million
dummy_zig_standard_normal 64.75 million
numpy.random_inv_standard_normal 14.55 million
pcg32_inv_standard_normal 16.31 million
pcg32_zig_standard_normal 38.46 million
pcg64_inv_standard_normal 16.69 million
pcg64_zig_standard_normal 43.47 million
randomkit_inv_standard_normal 15.03 million
randomkit_zig_standard_normal 31.58 million
dtype: object

Speed-up relative to NumPy

dummy_inv_standard_normal 32.4%
dummy_zig_standard_normal 344.9%
pcg32_inv_standard_normal 12.1%
pcg32_zig_standard_normal 164.2%
pcg64_inv_standard_normal 14.7%
pcg64_zig_standard_normal 198.7%
randomkit_inv_standard_normal 3.3%
randomkit_zig_standard_normal 117.0%

Dummy is a trivial rng that repeats the same 20 numbers. Just used to test
another hypothetical RNG.

—
Reply to this email directly or view it on GitHub
#2047 (comment).

Nathaniel J. Smith -- http://vorpus.org

rkern · 2015-12-16T08:27:58Z

My objections to the HAS-A API still hold: #5299 (comment)

njsmith · 2015-12-16T08:40:28Z

Responding to your comment there:

The core generators that I am most interested in adding are those that can be parameterized, but each one in different ways. Making the argument signature dependent on the value of one of the arguments is an antipattern I would like to avoid.

Yes, this is one of the advantages of the HAS-A api -- each core generator class can get its own __init__ signature. In my proposal we still accept a seed argument in RandomState.__init__ for backcompat, but this would just be a shorthand for the real api where users can instantiate the core generator object however they want.

I am inclined against having the distribution implementations change by a configuration parameter, but I could be persuaded. Not least, having this mechanism be orthogonal to the selection of the core generator would be ideal.

I guess we've converged now on using an instance parameter to change distribution implementations, and exactly the idea here is that having the implementations and the core generator as separate variables makes them orthogonal -- plus it makes it possible to change the default generator using the same logic as we use for changing the default level. (If no generator or seed is specified, we want to default to the best/fastest generator -- but backcompat means that this is only possible to do if we're also able to switch back to MT if the user later calls seed.)

rkern · 2015-12-16T11:34:21Z

In my proposal we still accept a seed argument in RandomState.__init__ for backcompat, but this would just be a shorthand for the real api where users can instantiate the core generator object however they want.

Please don't include the shorthand, then.

(If no generator or seed is specified, we want to default to the best/fastest generator -- but backcompat means that this is only possible to do if we're also able to switch back to MT if the user later calls seed.)

seed() silently switching algorithms is a really confusing behavior! Please don't do it. I would suggest simply leaving RandomState alone at the Python API level (under the Cython hood, you can do what you like). I would object less to a HAS-A API under a new name that does not have backwards-compatibility requirements.

There is nothing sacred about the name RandomState. It's not even a very good name for what it does.

njsmith · 2015-12-16T23:32:13Z

Please don't include the shorthand, then.

We're not going to break backcompat, and this is hardly complicated...:

class RandomState:
    def __init__(self, seed=None, rng=None):
        if rng is None:
             rng = DefaultRNG(seed)
       elif seed is not None:
             raise ValueError("Can't specify seed if rng object is given")
       ...

The question of whether we should also a string-based shorthand for well-known RNG objects is a little more complicated and I'm not sure whether or not it's a good idea myself; I'm happy to defer that until we get more into the details.

seed() silently switching algorithms is a really confusing behavior! Please don't do it.

Before, seed set the state of the RNG used for future calls. Now, it would set the state+algorithm used for future calls. What's confusing about that? The version code already has seed potentially changing the algorithm used for complex distributions, which is substantially harder to explain (but it doesn't much matter, because users don't have to care so long as streams are reproducible).

The alternative is that we could just continue defaulting to the old-and-slow-and-lower-quality algorithms forever, but that seems like a worse outcome to me...

njsmith · 2015-12-16T23:32:43Z

Oh, hmm, also: @bashtage: could you open a new issue for the work on replacing the core generator? It's orthogonal to the ziggurat issue which this issue is nominally about...

rkern · 2015-12-17T09:52:36Z

Ah, the shorthand that you proposed r = RandomState(seed=123, rng="PCG64") does not follow the semantics that you then sketched out. It's this that I object to, because the initialization of the new core PRNGs of interest take various forms, not just a single seed number/array. For example, the most desired new feature of the PCG family is that it has selectable streams.

The interface that I would prefer is one in which I grab the class that I want and use it rather than grabbing two classes and combining them. It's simple and well proven by the standard library.

from numpy.random import PCG64

prng = PCG64(seed=1234, stream=5678)
x = prng.weibull(...)
prng.jumpahead(10000)
y = prng.normal(...)
# Restart.
prng.seed(1234)
assert_array_equal(x, prng.weibull(...))

Compared to:

from numpy.random import RandomState, PCG64

prng = RandomState(rng=PCG64(seed=1234, stream=5678))
x = prng.weibull(...)
prng.rng.jumpahead(10000)
y = prng.normal(...)
# Restart.
prng.seed(1234) # Wait, we're now back to using Mersenne Twister?!
assert_array_equal(x, prng.weibull(...))  # AssertionError

The standard approach has the benefit of being manifestly backwards compatible. RandomState will always be the Mersenne Twister. We would refactor its guts to use a HAS-A implementation at the C/Cython level, but that doesn't need to show up at the Python API level to make us go into contortions to preserve backwards compatibility.

bashtage · 2016-01-12T14:55:36Z

I have started an issue to look at multiple PRNGs: #6967

NeilGirdhar · 2018-03-06T22:47:55Z

@charris Why is backward compatibility an issue for variate generation? The documentation never promised that the same variates would be generated across versions.

charris · 2018-03-06T23:11:02Z

@NeilGirdhar It was to allow repeatablility. Changing that is under discussion.

rkern · 2018-03-07T00:46:02Z

@NeilGirdhar It's documented here ("Compatibility Guarantee"): https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.RandomState.html#numpy.random.RandomState

NeilGirdhar · 2018-03-07T16:34:36Z

@rkern Oh, I see, my mistake.

bashtage · 2019-04-11T22:47:46Z

xref #13163 . This is fixed in the randomgen branch.

mattip · 2019-05-29T10:17:24Z

The randomgen branch now supports external BitGenerators, and supports PCG64 natively although there is some discussion about removing some of the BitGenerators. Closing, thanks for providing some of the impetus to make this change.

Please reopen if you feel there are pieces of this missing in the new API.

charris added the Patch label Feb 20, 2014

njsmith mentioned this issue Nov 20, 2014

using random.choice to sample integers in a large range #5299

Closed

ViralBShah mentioned this issue Nov 23, 2014

attempt faster randn implementation JuliaLang/julia#5105

Closed

pv mentioned this issue Mar 24, 2015

Random normal generator compatible with MATLAB randn scipy/scipy#4666

Closed

rkern mentioned this issue Apr 28, 2015

Generator for normally distributed random numbers that reproduces MATLAB #5810

Closed

argriffing mentioned this issue May 8, 2015

Bug in np.random.dirichlet for small alpha parameters #5851

Closed

ev-br mentioned this issue Jun 23, 2015

Alternative RandomState-type objects scipy/scipy#4989

Closed

Eight1911 mentioned this issue Jan 19, 2018

ENH: optimizing numpy.random.choice() for user input probabilities. #10432

Open

mattip removed the priority: normal label Oct 21, 2018

bashtage mentioned this issue Apr 14, 2019

ENH: tracking issue for merging randomgen into numpy #13164

Closed

16 tasks

mattip closed this as completed May 29, 2019

David-Duffy mentioned this issue Dec 21, 2020

Proposal for statistical distributions fortran-lang/stdlib#234

Open

Patch with Ziggurat method for Normal distribution (Trac #1450) #2047

Patch with Ziggurat method for Normal distribution (Trac #1450) #2047

Comments

thouis commented Oct 19, 2012

thouis commented Oct 19, 2012

thouis commented Oct 19, 2012

thouis commented Oct 19, 2012

thouis commented Oct 19, 2012

thouis commented Oct 19, 2012

charris commented Feb 20, 2014

bashtage commented Jul 19, 2014

charris commented Jul 19, 2014

argriffing commented Jul 19, 2014

rgommers commented Jul 19, 2014

njsmith commented Jul 19, 2014

charris commented Jul 19, 2014

seberg commented Jul 19, 2014

charris commented Jul 19, 2014

njsmith commented Jul 19, 2014

bashtage commented Jul 19, 2014

bashtage commented Jul 19, 2014

juliantaylor commented Jul 19, 2014

bashtage commented Jul 19, 2014

bashtage commented Jul 19, 2014

juliantaylor commented Jul 19, 2014

rkern commented Jul 19, 2014

rkern commented Apr 28, 2015

njsmith commented Apr 30, 2015

bashtage commented Dec 7, 2015

rkern commented Dec 8, 2015

bashtage commented Dec 8, 2015

bashtage commented Dec 8, 2015

rkern commented Dec 8, 2015

bashtage commented Dec 8, 2015

bashtage commented Dec 15, 2015

njsmith commented Dec 16, 2015

instantiates the default core RNG with random seed

instantiates the default core RNG with specified seed

creates a RandomState using PCG64 with specified seed

convenience short-hand for the above

rkern commented Dec 16, 2015

njsmith commented Dec 16, 2015

rkern commented Dec 16, 2015

njsmith commented Dec 16, 2015

njsmith commented Dec 16, 2015

rkern commented Dec 17, 2015

bashtage commented Jan 12, 2016

NeilGirdhar commented Mar 6, 2018

charris commented Mar 6, 2018

rkern commented Mar 7, 2018

NeilGirdhar commented Mar 7, 2018

bashtage commented Apr 11, 2019

mattip commented May 29, 2019