Patch with Ziggurat method for Normal distribution (Trac #1450) #2047
Comments
Attachment added by trac user ilan on 2010-04-10: ziggurat.patch
@charris wrote on 2010-04-10 We've had better ziggurat methods than those in Doornik for some years now, but there has been little push in getting them into numpy.
You should contact Bruce Carneal [email protected] and see if you can get his code. It is in C++ but can probably be translated to C without too much trouble. If we are going the ziggurat direction, it would be nice to implement the exponential distribution also. As you can see, the overall speedup also depends on the underlying rng, so if you can figure out a way of making choices available there, that will also help. Of course, for small sets of values call overhead will dominate everything. Thanks for looking into this problem; all it needs is for someone to take it in hand and push it through.
trac user ilan wrote on 2010-04-10 Thanks for your quick response. Yes, the speedup depends on the underlying RNG, and the patch I propose just uses what is already there. The method is quite simple and already gives a more than 2x speedup. My feeling is that by using faster underlying RNGs, a lot more could be done, but this is really a separate issue. Writing the Ziggurat method more generally, such that the exponential (and possibly other) distributions can be done with the same code, might be possible, but I don't think it's worth the effort, as the code is very small, and the tail distribution is different in other cases.
@charris wrote on 2010-04-10 The 'Harris' ziggurat is mine and the code is actually simpler than the Marsaglia version (Doornik). I also did the exponential (as did Marsaglia), which is even easier. Notice that the Marsaglia version actually uses an exponential to bound the tail. So do I, but I make it part of the ziggurat, which simplifies matters. It takes maybe ten lines of python to compute the needed constants for the various ziggurats. Note that in general you can get about 4x speedup vs Box-Muller for the MT rng.
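(For context, here is a rough sketch of the constant computation Chuck describes, for the normal case. This is not his code; the layer count N = 128 and the recursion follow Marsaglia & Tsang's 2000 paper. The table of layer edges x[i] plus the common layer area v are what a ziggurat sampler needs.)

```python
import math

N = 128  # number of ziggurat layers, as in Marsaglia & Tsang (2000)

def f(x):
    return math.exp(-0.5 * x * x)          # unnormalized N(0, 1) density

def finv(y):
    return math.sqrt(-2.0 * math.log(y))   # inverse of f for x >= 0

def tail_area(r):
    # integral of f from r to infinity
    return math.sqrt(math.pi / 2.0) * math.erfc(r / math.sqrt(2.0))

def layer_gap(r):
    """Build the layer edges upward from x1 = r; return f(x_N) - 1.

    Every layer must have the same area v; the correct r is the one
    for which the topmost layer closes exactly at the mode (gap = 0).
    """
    v = r * f(r) + tail_area(r)            # area of the base strip + tail
    x = r
    for _ in range(N - 2):
        y = v / x + f(x)                   # density at the next edge up
        if y >= 1.0:                       # layers too big: r is too small
            return 1.0
        x = finv(y)
    return v / x + f(x) - 1.0

# Bisection on r -- roughly the "ten lines of python" in question.
lo, hi = 2.0, 4.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if layer_gap(mid) > 0.0:
        lo = mid                           # r too small
    else:
        hi = mid                           # r too large
print("r =", lo)  # about 3.442619856 for N = 128
```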
@charris wrote on 2010-04-10 A few additional comments.

- […] isn't evenly distributed between positive and negative. I don't think that makes any difference in practice given the other roundoff errors, but it is worth noting because the steps in the ziggurat are symmetric about zero. Accepting the roundoff errors, it can be fixed by shifting and scaling the ziggurat layer to [0, 1-eps], using the positive double, and shifting and scaling the return value. This also avoids the call to fabs and saves one floating multiplication.
- […] uses a double in the comparison. One of the virtues of the original ziggurat was using integers here, which is most likely faster even on modern hardware.

Chuck
IIRC, this was in the Enthought edition for a while, but it messed up results since variates were not produced in the same order as the current algorithm. Adding something like this needs either a new name or a method argument for the normal generators.
Is this patch still active? NumPy's random number generation performance is somewhat behind other choices (Julia or MATLAB, by around a factor of 4), and this seems like a simple method to improve the performance for Gaussians, which are both directly interesting and important inputs for random numbers from other distributions.
No, but we could take it up again. The current generator needs to be kept for backward compatibility, so a new one would need a new name.
It seems overzealous to require backward compatibility for the specific numbers sampled by the random number generators. All else equal it's of course better, but if sampling can be sped up 2x or 4x then there is a tradeoff.
@argriffing It's not unimportant for reproducibility of unit tests that use random number generators (the same reason that a seed needs to be used in such tests).
If anyone really wants the speedup, it's pretty easy to satisfy both.
We do require that the same sequences be generated by the same functions; it is even tested. Reproducibility requires that. But there is no reason not to add new functions, say […]
So if we really wanted fast, that would be one way to go. Multiple cores may make preserving the sequence difficult. Super high performance can probably be pushed off to a specialized package.
I also don't like changing the random state for some speedup (no matter how awesome). Personally, I doubt it will matter often, but I used to write scripts with a fixed seed for reproducibility. Some of that reproducibility is gone anyway, because even tiny float rounding changes will break it, but still. I don't mind having a method keyword. I also don't mind adding logic such as: if the seed was fixed, warn the user of a future change and use the old version; otherwise just use the new version. Then keep the old version around, after a while make the faster one the default, and either right away or a little later remove the warning.
Bruce is on LinkedIn. I never got any of my code back, so it may be worth contacting him first if we decide to pursue this.
I can see an argument either way for making improvements opt-in vs. opt-out.
I would assume this is SFMT, http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html, which is a SIMD implementation of the standard MT (same state). This is orthogonal to the Ziggurat change, which only affects normal generation. There is another generator by the same authors (dSFMT) that specializes in doubles and is potentially even faster for uniforms with double precision; it is used by Julia. It would seem to me the first step would be to get the […]

I am sympathetic to the above points: most users will want the fastest method to generate a random variable from any distribution F, and when this requires a normal (e.g., a Student's t), then it would be preferable to use […]

Moreover, if exact reproducibility is really crucial, someone can always set up a virtualenv with […]
Many people do care about reproducibility; we even get bug reports from time to time that our MT implementation gives different results than MATLAB and R for 64-bit numbers. FWIW, in numpy 1.9 the random module can now be used from multiple threads via the threading module, but you do need a state per thread, as each state still has a lock.
Just for fun, it would be great to add this to the rng wishlist: http://www.deshawresearch.com/resources_random123.html. It can be used to generate many independent streams (…).
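(Random123's counter-based generators use much stronger keyed bijections, but the idea — each draw is a pure function of (stream, counter), so independent streams come for free — can be illustrated with the well-known splitmix64 mixer. A toy sketch, not Random123 itself; the stream/counter packing is an illustrative choice:)

```python
MASK64 = (1 << 64) - 1

def splitmix64(x):
    """A well-known 64-bit mixing function (used by splitmix64)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return (x ^ (x >> 31)) & MASK64

def counter_rng(stream, counter):
    # Counter-based draw: no sequential state to carry around, so
    # parallel streams need no coordination -- the Random123 idea.
    # (Toy packing: assumes counter fits in 32 bits.)
    return splitmix64((stream << 32) ^ counter)
```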
There are definitely some differences - MATLAB, for example, produces random integers with a range up to […]
The base algorithms are often the same (the 32-bit Mersenne Twister); the differences come from how those 32-bit numbers are converted to 64-bit numbers. But actually, dropping reproducibility for stuff like normals could be more acceptable, as the current Box-Muller method does imply using libc math functions, where one has no control over how accurate they are. So we are already only providing approximate reproducibility for these.
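(Concretely, this is the recipe randomkit and the MT19937 reference use to build a 53-bit double from two 32-bit draws — the step where implementations commonly diverge:)

```python
def randomkit_double(a, b):
    # a, b: two 32-bit outputs from the core generator
    a >>= 5                      # keep the top 27 bits
    b >>= 6                      # keep the top 26 bits
    return (a * 67108864.0 + b) / 9007199254740992.0   # 2**26 and 2**53

# Another library might instead spend a single draw at 32-bit
# precision, producing a different (and coarser) stream of uniforms:
def single_word_double(a):
    return a / 4294967296.0      # 2**32
```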
And indeed, that is what we have agreed to. We can add new core PRNGs, but not change the core PRNG of […]
My new favorite candidate for a new core PRNG: http://www.pcg-random.org/ |
PCG is definitely sweet. Random123 is probably also worth a look-in: […] (Cribbed from the thread at JuliaLang/julia#5105 that GitHub helpfully auto-linked above. They also have some other links to recent papers on optimizing RNGs.)
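(For reference, PCG32 is small enough to sketch in a few lines of Python. This follows O'Neill's minimal C reference (pcg_basic.c) — XSH-RR output on a 64-bit LCG; the class wrapper and names are illustrative:)

```python
MASK64 = (1 << 64) - 1

class PCG32:
    """PCG32 (XSH-RR variant), after the minimal C reference code."""
    MULT = 6364136223846793005

    def __init__(self, seed, seq=0):
        self.inc = ((seq << 1) | 1) & MASK64   # stream selector; must be odd
        self.state = 0
        self.next()                            # reference seeding sequence
        self.state = (self.state + seed) & MASK64
        self.next()

    def next(self):
        old = self.state
        self.state = (old * self.MULT + self.inc) & MASK64
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF  # XSH
        rot = old >> 59                                        # RR
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & 0xFFFFFFFF

rng = PCG32(seed=42, seq=54)
print(hex(rng.next()))  # should be 0xa15c02b7 if this matches the C demo
```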
I have started some initial work trying to produce a usable implementation of pcg at https://github.com/bashtage/pcg-python. Right now it has some basic RNGs implemented, as well as the ability to produce multiple streams and advance (skip to a point in the rng chain). It's a little faster (25-50%) than the NumPy rngs, although I don't do as much (e.g., I return only a 1-d array). I'm no expert at C/Cython integration, so I would imagine there are some things I could do to eke out some more performance (e.g., doing more directly in C). The numpy mtrand code seems complex to me, and I'd welcome any collaboration.
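(The advance/skip-ahead that makes PCG attractive here is also tiny: exponentiation-by-squaring on the underlying affine map, per Brown's "Random Number Generation with Arbitrary Strides". A standalone sketch; the default increment is the pcg32 oneseq value, used purely as an illustration:)

```python
MASK64 = (1 << 64) - 1
PCG_MULT = 6364136223846793005
PCG_INC = 1442695040888963407   # default oneseq increment in the C reference

def lcg_advance(state, delta, mult=PCG_MULT, inc=PCG_INC):
    """Jump the LCG state ahead by delta steps in O(log delta) time."""
    acc_mult, acc_plus = 1, 0
    cur_mult, cur_plus = mult, inc
    delta &= MASK64
    while delta:
        if delta & 1:
            acc_mult = (acc_mult * cur_mult) & MASK64
            acc_plus = (acc_plus * cur_mult + cur_plus) & MASK64
        cur_plus = ((cur_mult + 1) * cur_plus) & MASK64
        cur_mult = (cur_mult * cur_mult) & MASK64
        delta >>= 1
    return (acc_mult * state + acc_plus) & MASK64

# Advancing by 1 is exactly one LCG step:
s = 12345
assert lcg_advance(s, 1) == (s * PCG_MULT + PCG_INC) & MASK64
```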
@bashtage Excellent! I have a C implementation of pcg64 with a software implementation of uint128 arithmetic for those platforms that need it. I think it's worth playing around with the interface that you have now to work out the API for keyed streams and the like before investing the energy to make the rest of the […]
@rkern I've done some initial experiments with the pcg64, and the performance is a little disappointing. For example, 1,000,000 uniforms takes about 5.7ms with pcg32 and 5.0ms with pcg64. When using the NumPy Gaussian generation method this difference is completely lost. So from a performance POV the pcg64 doesn't add much (all tests using gcc 4.8.1, so YMMV). pcg64 does have a longer period and can produce more streams, so it still might be worth it (it is also probably better to just go to the better generator).

Looking through mtrand.pyx, randomkit, distributions, etc., this seems like excessively complex code. I know it was written a long time ago (from what I can tell, mostly < 2005), so this isn't a criticism. In an ideal world, the random generation code would be refactored so that it would accept random uint32/uint64s from an arbitrary generator. This would allow further generators to be added with much less pain. This is unfortunately also outside of my competency, as there are things in mtrand.pyx that I don't really understand.

I also noticed there are a lot of functions in distributions.c that don't really need to be written in C (e.g. a Cauchy rng, which is just the ratio of two normals). From a maintenance POV it might make sense to have a layered approach, from lowest level to highest: vendor-supplied rng code (e.g. pcg-*.c), […]

Is your modified pcg64 on GitHub?
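(A sketch of the layering being proposed, with illustrative names and a stand-in bit source. The point is just that distribution code like a Cauchy sampler only needs uniforms from whatever core generator sits underneath, and can live in Python or Cython rather than C:)

```python
import math
import random

class CoreRNG:
    """Hypothetical minimal interface for a pluggable core generator."""

    def __init__(self, seed=None):
        self._r = random.Random(seed)   # stand-in for MT/PCG/dSFMT bits

    def next_uint32(self):
        return self._r.getrandbits(32)

    def next_double(self):
        # 53 random bits -> uniform double on [0, 1)
        return self._r.getrandbits(53) / 9007199254740992.0

def standard_cauchy(rng):
    # Inverse-CDF form; pure Python layered on the core interface.
    return math.tan(math.pi * (rng.next_double() - 0.5))

rng = CoreRNG(seed=1)
draws = [standard_cauchy(rng) for _ in range(3)]
```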
FWIW, the fundamental reason I am interested in this is not performance but the need (which I think is pretty essential) for a real multi-stream/advance-able RNG for parallel applications.
My main motivation for pcg64 is the period, not speed; I'm surprised that it's faster at all. The typical advice is that one should not draw more numbers than […]. The approach I recommended above is along the lines of what you are suggesting and should help avoid the complexity of […]. I still need to add the copyright mumbo-jumbo to my implementation of pcg64, then I'll stick it up on GitHub.

This said, the effective period is more like […]

I am guessing this is where the small speed increases come in.
I now have a generic structure that allows alternative core RNGs to be used.
These are all combined to produce rng-specific extensions (right now it outputs 4). Comments welcome at this point. Some performance numbers: […]
Dummy is a trivial rng that repeats the same 20 numbers, just used to test another hypothetical RNG.
Sounds like you're making some great progress!

```python
# instantiates the default core RNG with random seed
r = RandomState()
# instantiates the default core RNG with specified seed
r = RandomState(seed=123)
# creates a RandomState using PCG64 with specified seed
r = RandomState(rng=np.random.PCG64(seed=123))
# convenience short-hand for the above
r = RandomState(seed=123, rng="PCG64")
```
My objections to the HAS-A API still hold: #5299 (comment)
Responding to your comment there:

Yes, this is one of the advantages of the HAS-A API -- each core generator class can get its own […]

I guess we've converged now on using an instance parameter to change distribution implementations, and exactly the idea here is that having the implementations and the core generator as separate variables makes them orthogonal -- plus it makes it possible to change the default generator using the same logic as we use for changing the default level. (If no generator or seed is specified, we want to default to the best/fastest generator -- but backcompat means that this is only possible to do if we're also able to switch back to MT if the user later calls […].)
Please don't include the shorthand, then.

There is nothing sacred about the name […]
We're not going to break backcompat, and this is hardly complicated...:
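(The code block that followed appears to have been lost in extraction. What follows is a minimal sketch of the kind of dispatch being described, with stub classes and illustrative names — an assumption, not the original snippet:)

```python
class MT19937:
    """Stub for the historical core generator (illustrative only)."""
    def __init__(self, seed=None):
        self.seed_value = seed

class PCG64:
    """Stub for a hypothetical new default core generator."""
    def __init__(self, seed=None):
        self.seed_value = seed

class RandomState:
    def __init__(self, seed=None, rng=None):
        if rng is None:
            # An explicit seed with no explicit generator must keep
            # reproducing the historical MT19937 stream; otherwise we
            # are free to default to the best/fastest generator.
            rng = MT19937(seed) if seed is not None else PCG64()
        self._rng = rng

    def seed(self, seed=None):
        # Re-seeding switches back to MT19937 so old scripts reproduce.
        self._rng = MT19937(seed)
```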
The question of whether we should also add a string-based shorthand for well-known RNG objects is a little more complicated, and I'm not sure whether or not it's a good idea myself; I'm happy to defer that until we get more into the details.

Before, seed set the state of the RNG used for future calls. Now, it would set the state+algorithm used for future calls. What's confusing about that? The […]

The alternative is that we could just continue defaulting to the old-and-slow-and-lower-quality algorithms forever, but that seems like a worse outcome to me...
Oh, hmm, also: @bashtage: could you open a new issue for the work on replacing the core generator? It's orthogonal to the ziggurat issue, which this issue is nominally about...
Ah, the shorthand that you proposed […]. The interface that I would prefer is one in which I grab the class that I want and use it, rather than grabbing two classes and combining them. It's simple and well proven by the standard library.

Compared to: […]

The standard approach has the benefit of being manifestly backwards compatible.
I have started an issue to look at multiple PRNGs: #6967
@charris Why is backward compatibility an issue for variate generation? The documentation never promised that the same variates would be generated across versions.
@NeilGirdhar It was to allow repeatability. Changing that is under discussion.
@NeilGirdhar It's documented here ("Compatibility Guarantee"): https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.RandomState.html#numpy.random.RandomState
@rkern Oh, I see, my mistake.
xref #13163. This is fixed in the randomgen branch.
The randomgen branch now supports external BitGenerators, and supports PCG64 natively, although there is some discussion about removing some of the BitGenerators. Closing; thanks for providing some of the impetus to make this change. Please reopen if you feel there are pieces of this missing in the new API.
Original ticket http://projects.scipy.org/numpy/ticket/1450 on 2010-04-10 by trac user ilan, assigned to @charris.
I've written a patch which replaces the Box-Muller transform
in numpy/random/mtrand/randomkit.c with the faster Ziggurat method.
The patch also includes updates to doc-strings which include the
relevant references: […]
My patch is based on the support code which J. Doornik has made available
on his web page (first reference). I've contacted him, and he has given
permission to include his code in numpy on the condition that his paper
is referenced (which the patch does).
The patch was written against http://svn.scipy.org/svn/numpy/tags/1.4.0,
and I've just checked to make sure it can still be applied to the
trunk (revision 8324). I've tested the patch on Windows, MacOSX, Linux
and Solaris.
The speedup is a bit over 2 times when creating a large number of Standard
Normal distributed random numbers. To verify that the method works and
produces good random numbers, I have:

- […] uniformly distributed random numbers
Feel free to hammer the patch. I hope this patch will make its way
into numpy soon.
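(One simple way to hammer such a patch, assuming SciPy is available: compare a large sample against the normal CDF with a Kolmogorov-Smirnov test and eyeball the moments:)

```python
import numpy as np
from scipy import stats

samples = np.random.standard_normal(1_000_000)
print("mean", samples.mean(), "std", samples.std())  # ~0 and ~1
print(stats.kstest(samples, "norm"))                 # large p-value expected
```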