
make rand unbiased for floats #33222

Closed
tpapp opened this issue Sep 11, 2019 · 56 comments
Labels
randomness Random number generation and the Random stdlib

Comments

@tpapp
Contributor

tpapp commented Sep 11, 2019

The current implementation of samplers for e.g. Float32 and Float64 effectively generates numbers as n*eps(T), where n is an integer between 0 and 1/eps(T)-1 (inclusive). E.g., for Float64 this range is 0:(2^52-1).

This makes the random draws (slightly) biased, by -eps(T)/2.

The proposal is to add this term back to the result in the implementation.

Cf this discussion, especially the proposal of @perrutquist
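A minimal sketch of that sampling model (an illustration based on the description above, not the actual Random internals), using Float16 so the whole range can be enumerated:

using Statistics

T = Float16                                              # small enough to enumerate every value
vals = [n * Float64(eps(T)) for n in 0:Int(1/eps(T))-1]

mean(vals)              # 0.49951171875 == 1/2 - eps(Float16)/2
mean(vals .+ eps(T)/2)  # 0.5 exactly, after the proposed correction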

@mschauer
Contributor

Just as a data point, to observe this bias, you would need ballpark

julia> using Distributions  # for Uniform(); std(Uniform()) ≈ 1/sqrt(12)

julia> Ts = [Float16, Float32, Float64]
julia> [Ts [(std(Uniform())/(eps(T)/2))^2 for T in Ts]]
3×2 Array{Any,2}:
 Float16  3.49525e5 
 Float32  2.34562e13
 Float64  6.7608e30 

number of trials.

@tpapp
Contributor Author

tpapp commented Sep 11, 2019

Possibly, but since we can do it right with a very simple correction, why shouldn't we?

@ViralBShah added the randomness (Random number generation and the Random stdlib) label Sep 11, 2019
@ViralBShah
Member

cc @rfourquet

@mschauer
Contributor

mschauer commented Sep 12, 2019

Some more checks (right column after adding eps(T)/2):

julia> T = Float16;[["P(rand() <= 2^(-$i)" for i in 0:16] [mean([(n)*eps(T) <= 2.0^(-i) for n in 0:(1/eps(T)-1)]) for i in 0:16] [mean([(n)*eps(T) + eps(T)/2  <= 2.0^(-i) for n in 0:(1/eps(T)-1)]) for i in 0:16] ]
17×3 Array{Any,2}:
 "P(rand() <= 2^(-0)"   1.0          1.0        
 "P(rand() <= 2^(-1)"   0.500977     0.5        
 "P(rand() <= 2^(-2)"   0.250977     0.25       
 "P(rand() <= 2^(-3)"   0.125977     0.125      
 "P(rand() <= 2^(-4)"   0.0634766    0.0625     
 "P(rand() <= 2^(-5)"   0.0322266    0.03125    
 "P(rand() <= 2^(-6)"   0.0166016    0.015625   
 "P(rand() <= 2^(-7)"   0.00878906   0.0078125  
 "P(rand() <= 2^(-8)"   0.00488281   0.00390625 
 "P(rand() <= 2^(-9)"   0.00292969   0.00195313 
 "P(rand() <= 2^(-10)"  0.00195313   0.000976563
 "P(rand() <= 2^(-11)"  0.000976563  0.000976563
 "P(rand() <= 2^(-12)"  0.000976563  0.0        
 "P(rand() <= 2^(-13)"  0.000976563  0.0  


julia> T = Float16;[["P(rand() < 2^(-$i)" for i in 0:16] [mean([(n)*eps(T) < 2.0^(-i) for n in 0:(1/eps(T)-1)]) for i in 0:16] [mean([(n)*eps(T) + eps(T)/2  < 2.0^(-i) for n in 0:(1/eps(T)-1)]) for i in 0:16] ]
17×3 Array{Any,2}:
 "P(rand() < 2^(-0)"   1.0          1.0        
 "P(rand() < 2^(-1)"   0.5          0.5        
 "P(rand() < 2^(-2)"   0.25         0.25       
 "P(rand() < 2^(-3)"   0.125        0.125      
 "P(rand() < 2^(-4)"   0.0625       0.0625     
 "P(rand() < 2^(-5)"   0.03125      0.03125    
 "P(rand() < 2^(-6)"   0.015625     0.015625   
 "P(rand() < 2^(-7)"   0.0078125    0.0078125  
 "P(rand() < 2^(-8)"   0.00390625   0.00390625 
 "P(rand() < 2^(-9)"   0.00195313   0.00195313 
 "P(rand() < 2^(-10)"  0.000976563  0.000976563
 "P(rand() < 2^(-11)"  0.000976563  0.0        
 "P(rand() < 2^(-12)"  0.000976563  0.0  

This doesn't look too bad at first sight.

@tpapp
Contributor Author

tpapp commented Sep 12, 2019

@mschauer: I am sorry, I don't understand the purpose of your exercise.

The bias exists and can be described exactly. There is no need to simulate anything.

The bias is correctable, rather cheaply. Whether it "looks bad" in some statistical tests that may or may not have power for detecting it (it is a rather small bias, after all) is not the issue here.

The key question is whether there are any objections to correcting this bias on the table. Otherwise, I would just make a PR. @rfourquet, I would appreciate your opinion on this.

@mschauer
Contributor

mschauer commented Sep 12, 2019

I am sorry, I don't understand the purpose of your exercise.

A good implementation of rand() would, for example, have the property that the probabilities of rand() <= p and rand() < p are both equal to p. I checked*) this for our Random.rand(T) and for your Random.rand(T) + eps(T)/2 and concluded "it doesn't look too bad" for your idea because e.g. P(rand() <= 0.5) = 0.5 and not 0.500977. I am not sure why you are a bit defensive here. Why not discuss the consequences of this change?

*) Using the information "rand() effectively generates numbers as n*eps(T), where n is an integer between 0 and 1/eps(T)-1 (inclusive)" you gave above.

@tpapp
Contributor Author

tpapp commented Sep 12, 2019

Sorry if I came across as defensive, that was not the intention.

The consequences of this change are that rand will be unbiased for floats, including the bias you demonstrated, at no measurable performance loss (seems to be negligible compared to noise):

julia> using Random, BenchmarkTools

julia> rand_corrected(T) = rand(T) + eps(T)/2
rand_corrected (generic function with 1 method)

julia> Random.seed!(1);

julia> @btime foreach(_ -> rand(Float64), 1:1024)
  4.382 μs (0 allocations: 0 bytes)

julia> Random.seed!(1);

julia> @btime foreach(_ -> rand_corrected(Float64), 1:1024)
  4.368 μs (0 allocations: 0 bytes)

@perrutquist
Contributor

The performance loss wouldn't just be negligible, it would be exactly zero. The current implementation is something like rand(r, CloseOpen12()) - 1.0 and this could be replaced by rand(r, CloseOpen12()) - prevfloat(1.0)

I.e. the only change is in a compile-time constant.
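For concreteness, a sketch of that idea in terms of the 52-bit model used in this thread (illustrative only; the actual implementation goes through Random's CloseOpen12 sampler and may use different bits):

r(u::UInt64) = reinterpret(Float64, (u >>> 12) | 0x3ff0000000000000)  # uniform on [1, 2)

old(u) = r(u) - 1.0             # multiples of eps() in [0, 1 - eps()], mean 1/2 - eps()/2
new(u) = r(u) - prevfloat(1.0)  # the same values shifted up by eps()/2, mean exactly 1/2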

@tpapp
Contributor Author

tpapp commented Sep 12, 2019

My understanding is that this rearrangement is valid because eps is 2^-52 in the [1, 2) range and 1.0-prevfloat(1.0) == 2^-53, but I don't know enough about floating point error to prove it formally. But I can of course make the PR do this.

@mschauer
Contributor

I can verify the bias.

@mschauer
Contributor

mschauer commented Sep 12, 2019

First the Float16 case: With

rnd(::Type{Float16}, i) = Float16(reinterpret(Float32,
                              (UInt32(i) << 13)  | 0x3f800000) - 1)
rndnew(::Type{Float16}, i) = Float16(reinterpret(Float32,
                              (UInt32(i) << 13)  | 0x3f800000) - 0.9995117f0)

being a deterministic model for rand(Float16) generation where i ranges in 0:0x000003ff, I get

julia> mean(rnd(Float16, i) for i in 0:0x000003ff)
Float16(0.4995)

julia> extrema(rnd.(Float16,  0:0x000003ff))
(Float16(0.0), Float16(0.999))

julia> mean(rndnew(Float16, i) for i in 0:0x000003ff)
Float16(0.5)

julia> extrema(rndnew.(Float16,  0:0x000003ff))
(Float16(0.0004883), Float16(0.9995))

@perrutquist
Contributor

Although floating-point addition/subtraction is not associative in general, it's quite easy to show that (r - 1.0) + eps()/2 == r - (1.0 - eps()/2) in this case (on systems that adhere to the IEEE standard). It follows directly from these two facts:

  1. Addition and subtraction produce correctly rounded results.
  2. The intermediate terms (r - 1.0), eps()/2 and (1.0 - eps()/2) can all be expressed exactly, i.e. are not affected by rounding.
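A brute-force check of the identity for Float32, where all 2^23 mantissa patterns can be enumerated in well under a second (the Float64 case follows from the same two facts):

all(0:2^23-1) do i
    r = reinterpret(Float32, UInt32(i) | 0x3f800000)   # r in [1, 2)
    (r - 1f0) + eps(1f0)/2 == r - (1f0 - eps(1f0)/2)
end  # true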

@mschauer
Contributor

We do not need to add eps(Float16)/2; we need to subtract the right number from r, where r is in [1, 2), such that the result is unbiased in [0, 1]. That number is 0.9995117f0 in my code above. It coincides with Float32(1.0 - eps(Float16)/2) == 0.9995117f0, but we do not really need to worry about that.

@andreasnoack
Member

The problem discussed here appears to be of very small significance in practice. See #8958 (comment) and #8958 (comment)

@tpapp
Contributor Author

tpapp commented Sep 13, 2019

Can you please elaborate on this? Which of the TestU01 tests do you think would pick this up?

My understanding is that to detect an eps()/2 difference empirically would take a lot of samples, too many for practical purposes, and very precise mean calculations at the same time. I don't think that any "random" test would pick this up practically, but that does not mean it is not worth fixing.

@mschauer
Contributor

Not for Float64, but for Float16 this can be observed in a fraction of a second


julia> using Statistics

julia> mean((Float64(rand(Float16)) for i in 1:10^7))
0.4994191243164062

julia> mean((Float64(rand(Float16) + eps(Float16)/2) for i in 1:10^7))
0.5000916360351563

@andreasnoack
Member

Which of the TestU01 tests do you think would pick this up?

If an issue with a pseudo RNG isn't picked up by the best random number test suite around then I don't think there is an issue. To my understanding, it's the only meaningful way of evaluating pseudo RNGs for statistical applications.

@tpapp
Contributor Author

tpapp commented Sep 13, 2019

I think that this is a pretty narrow view: in this case the bias is too small to be picked up using a black-box test, but can be demonstrated analytically, and fixed very easily.

TestU01 is a very nice suite, but I am not sure that even its authors would claim that it was designed to pick up everything, or, conversely, that whatever it does not detect is by construction unimportant.

@mschauer
Contributor

Thinking about it, having P(rand() == 0) too large (see my table) can break Metropolis-Hastings:

using Distributions
using Makie
using Colors
using LinearAlgebra
using Random       # for randperm

logπ(x) = logpdf(Normal(), x)

function mh!(corr, xs, N = 10_000_000, samples = 1:100:N, σ = 2.)
    x = xs[end]
    xmax = maximum(x)
    xmin = minimum(x)
    acc = 0
    for i in 1:N
        xᵒ = x
        while true
            xᵒ = x + σ*randn(Point2)
            norm(xᵒ) > 3 && break
        end
        if rand(Float16) + corr <= exp(logπ(xᵒ[1]) - logπ(x[1]) + logπ(xᵒ[2]) - logπ(x[2]))
            x = xᵒ
            acc += 1
            xmax = max(x[1], x[2], xmax)
            xmin = min(x[1], x[2], xmin)
        end
        if i in samples
            push!(xs, x)
        end
    end
    @show (xmin, xmax)
    @show acc/N
end
xs = [Point2(3.1, 3.1)]
ys = [Point2(3.1, 3.1)]
mh!(0.0, xs)
mh!(eps(Float16)/2, ys) # with correction

N = length(ys)
ii = randperm(2N)
scatter([xs;ys][ii], markersize=0.2, color=[fill(:blue, N);fill(:red, N)][ii])

Output

(xmin, xmax) = (-11.804179996165205, 10.500512563920687)
acc / N = 0.1778087
(xmin, xmax) = (-5.7318743113302615, 5.766862517806166)
acc / N = 0.176106

[Screenshot 2019-09-13: scatter plot of the sampled points (blue: without correction, red: with correction)]

@mschauer
Contributor

Short explanation: A bivariate Normal(0,1) conditioned on having distance larger than 3 to the origin, sampled with symmetric random-walk Metropolis-Hastings. It should "never" produce values as large as 11, because

julia> cdf(Normal(), -11)
1.9106595744986566e-28

@tpapp
Contributor Author

tpapp commented Sep 13, 2019

That's a nice demo for Float16. In practice I think one would use Float64 though, where the issue may not be this prominent. But it should be fixed; I will make a PR.

@mschauer
Contributor

Or use -randexp() < ll instead of x < exp(ll). Anyway: yes, this is just meant as an illustration.

tpapp added a commit to tpapp/julia that referenced this issue Sep 13, 2019
Fixes JuliaLang#33222.

Also changed some hardcoded test values accordingly.
@StefanKarpinski
Member

StefanKarpinski commented Sep 13, 2019

If an issue with a pseudo RNG isn't picked up by the best random number test suite around then I don't think there is an issue.

There's nothing special about the tests in TestU01. Testing that the mean of values sampled from [0,1] is 1/2 is perhaps such a simple test that they didn't think to include it, but clearly it is a valid statistical test and one that we would currently fail given enough samples. So why wouldn't we want to fix it? Or put another way, if the current "best random number test suite" in existence doesn't include a bias test, then we could add a bias test (which we'd currently be failing) and then we would fail the best random number test suite (i.e. the new one that we just added the bias test to).

@chriselrod
Contributor

Would we fail that test suite because of the bias test? It would take a huge number of draws for Float64. It may be that another test already on it would fail us with fewer draws.

However, I am a fan of this issue and associated PR.

@tpapp
Contributor Author

tpapp commented Sep 14, 2019

No, detecting such a small bias using black-box statistical techniques is prohibitively expensive. From the CLT,

$$\sqrt{n}\,(S_n - 1/2) \sim N(0, 1/12)$$

where S_n is the sample mean of n draws and 1/12 is the variance of the uniform distribution on [0,1]. The 99.5% quantile of N(0, 1/12) (assuming we are aiming for a two-tailed test at a 1% significance level) is around 0.75. So you would need around

julia> abs2(2*0.75/eps())
4.563542160821626e31

draws to detect this reliably.

Even detecting something much more crude is expensive. E.g., a bias of 1000*eps() would require around 4e25 draws. At 1e9 Float64s/second (*/ factors of 10, depending on the hardware), which is a reasonable ballpark, this would still take 5e9 years. (Hope I calculated these things right, corrections welcome.)
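The same back-of-the-envelope calculation as a small helper (needed_draws is a hypothetical convenience function, using the same assumptions as above: a two-sided test at the 1% level, i.e. a critical value of about 2.576 standard errors):

# draws needed before a mean bias `b` exceeds the two-sided 1% critical value
# for the sample mean of Uniform(0,1), whose standard deviation is 1/sqrt(12)
needed_draws(b; z = 2.576) = (z / sqrt(12) / b)^2

needed_draws(eps()/2)  # ≈ 4.5e31, matching the estimate above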

@mschauer
Contributor

That's correct; I got numbers of the same order (see my first post, which uses the slightly simpler approach of checking after how many samples the bias exceeds the standard error of the mean).

@JeffBezanson
Member

Check again after #40546?

@JeffBezanson
Member

Ah, I guess the correction is eps/4 now instead of eps/2 due to the extra bit?

@tpapp
Contributor Author

tpapp commented Jun 11, 2021

Yes, exactly.

And it would make it (0, 1). What I am unsure about is whether we should have traits for RNGs that describe whether they include endpoints of the interval — in practice it can be useful, but perhaps it is overcomplicating this. @rfourquet?

@perrutquist
Contributor

Adding eps/4 to a number in the range [0.5, 1) is problematic. It will either increment by 0 or eps/2, since the exact result cannot be represented. (A "round to nearest or even" policy will effectively remove one bit of entropy and make the output range include 1.)
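A small illustration of that rounding behaviour in the [0.5, 1) range, where the spacing is eps()/2 and adding eps()/4 always lands exactly halfway between neighbours:

x = 0.75                       # even last significand bit: the tie rounds back down
x + eps()/4 == x               # true
y = nextfloat(0.75)            # odd last significand bit: the tie rounds up
y + eps()/4 == nextfloat(y)    # true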

@perrutquist
Contributor

The formula

Float64(rand(r, UInt64)) * 0x1.0p-64

would have a bias of eps/8192, and also yield more entropy in the small random numbers. The only downside that I can think of is that it would return 1 with a probability of eps/4, which is mathematically correct, but may break some people's assumptions.

@JeffBezanson
Member

I get

julia> Float64(~UInt(0) >>> 11) * 0x1.0p-53 + eps()/4
1.0

@perrutquist
Contributor

Yes. (That's what I meant when I said that adding eps/4 would make the output range include 1.)

If we let x = Float64(~UInt(0) >>> 11) * 0x1.0p-53, then all of these evaluate as true:

x == prevfloat(1.0)
x == 1 - eps()/2 # exactly
x + eps()/4 == 1 # rounding up
prevfloat(x) + eps()/4 == prevfloat(x) # rounding down

If rand() should never return 1.0, then adding a correction term of prevfloat(eps()/4) is probably the best one can do. This leaves a bias of eps/8. If, on the other hand, returning 1.0 is acceptable, then the bias will be a thousand times smaller if the random number is computed as Float64(rand(r, UInt64)) * 0x1.0p-64 (which might also save a cpu cycle).

@JeffBezanson
Member

Personally it seems cool to me if 0 and 1 are possible values, but that's really just subjective and in practice I don't think it matters. So I like the idea of using Float64(rand(r, UInt64)) * 0x1.0p-64, but I wonder if @vigna can comment on that formula (it's not discussed on https://prng.di.unimi.it/).

@vigna

vigna commented Jun 11, 2021

The current implementation, from what I see, provides only 52 significant bits. In my experience, the shift-by-11-and-multiply method is practically equivalent in speed on modern CPUs and gives 53 significant bits, so I prefer it. But you can try to benchmark it in Julia and see what happens.

The "real" method to generate floats is linked at https://prng.di.unimi.it/random_real.c (I didn't invent it); it consists of finding a float with all available precision (something like 1000 bits) and then approximating it with the closest representable float. In practice, you use one or two 64-bit values.

Note that you do want the [0..1) interval, or inversion won't work for discrete distributions. E.g., if you're generating the uniform distribution on the first n integers by multiplying a uniform value in [0..1) by n and taking the floor (inversion), this will work provided that 1 is never generated. This is just one example, but there are a number of reasons why [0..1) is the natural interval for a uniform float.
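A tiny illustration of the inversion point, using the standard floor trick for sampling an integer in 0:(n-1) from a uniform value u:

n = 10
Int(floor(n * prevfloat(1.0)))  # 9  -- the largest possible result while u < 1
Int(floor(n * 1.0))             # 10 -- out of range, if rand() could ever return 1.0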

The formula that uses all 64 bits introduces some bias in the lower bits due to rounding (I read this in the Java documentation a loooong time ago). This can be bad or not, depending on what you mean by "uniform in [0..1)", given that the actual distribution is continuous and whatever you do on a computer is, in the end, discrete.

@JeffBezanson
Member

Interesting, thanks! We still use the 52-bit version for our older RNGs so that they give the same values as before, but for the new xoshiro generator we're using shift-by-11-and-multiply currently.

So it sounds like your recommendation is just to keep things as they are, IIUC.

@JeffBezanson
Member

Ah, I just realized that rand is documented as giving a result in [0, 1), and it wouldn't be great to make that different for different RNGs, so we might not really have the opportunity to change this now after all.

@vigna

vigna commented Jun 12, 2021

OK, I just read the comment from @perrutquist about returning 1 when using all 64 bits. I never realized that.

Yes, I think that's one of the reasons everybody uses 53 bits of precision. You really do not want to return 1. The classical C idiom for uniform generation in the unit interval is indeed rand() / (RAND_MAX + 1.0).

And yes, I would keep things as they are. If you want more precision for smaller numbers, the multiple-value generation used by the code I pointed to in my previous message is the way to go, but it's a slower execution path that contains a test.

@tpapp
Contributor Author

tpapp commented Jun 12, 2021

rand is documented as giving a result in [0, 1)

Technically, (0, 1) is a subset of [0, 1), so I would not consider changing this breaking.

That said, if it's too much of a hassle to change this, then we should just document the bias. For Float64 it is of course not practically significant, but for Float16 it is.

@vigna

vigna commented Jun 12, 2021

One thing that comes to mind: do you know of other implementations trying to fix this bias? I ask because all recent code I've ever seen uses the standard 53-bit technique. It is possible that there is some unintended consequence in the change you're proposing.

On the other hand, there are generators, such as MRG32k3a, which by design never return zero. This is why MRG32k3a fails PractRand bit-pattern tests.

@perrutquist
Contributor

It is worth noting that in the 52-bit case (where this issue started) the negative bias in the expected value was 1 ULP, but in the 53-bit case it is only 0.5 ULP, which is a commonly used threshold for when errors in floating-point functions are acceptable.

@chriselrod
Contributor

One thing that comes to mind: do you know of other implementations trying to fix this bias?

In VectorizedRNG I mask the UInt into [1,2), and then subtract prevfloat(one(T)).
With Float64, this puts the result in the interval [1.1102230246251565e-16, 0.9999999999999999].

I haven't seriously considered the consequences, I just didn't want the RNG to be able to return 0.0.
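An illustrative sketch of that masking approach (not the VectorizedRNG source):

# force the exponent bits for [1, 2), keep 52 random mantissa bits, then shift down
mask_rand(u::UInt64) =
    reinterpret(Float64, (u & 0x000fffffffffffff) | 0x3ff0000000000000) - prevfloat(1.0)

mask_rand(typemin(UInt64))  # 1.1102230246251565e-16 == eps()/2         (the minimum)
mask_rand(typemax(UInt64))  # 0.9999999999999999     == prevfloat(1.0)  (the maximum)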

@vigna

vigna commented Jun 12, 2021

That's my question, but by "other" I mean "really other". Like, what do FORTRAN implementations do? Also, does that shift leave the uniform generation in ranges [0..2^k) perfectly uniform (I guess so)?

@mschauer
Contributor

Just to make sure: there are numbers between eps()/4 and eps()/2; there is no need to relate the bias correction to eps(). As I said above:

We do not need to add eps()/2, we need to add the right number such that the result is unbiased in [0,1].

@JeffBezanson
Member

So what is the right number then? IIUC, eps()/4 is it.

@perrutquist
Contributor

If rand() should never return 1.0, then eps()/4 cannot be used. With 53 bits of randomness, the best course is probably to add nothing, since the rounding following the addition would introduce unevenness in the distribution.

On the other hand, I think eps()/2 is the best number to add with 52 bits of randomness, since no rounding would take place.

@StefanKarpinski
Member

StefanKarpinski commented Jun 17, 2021

It seems to me that these three properties cannot all hold at the same time for a floating-point RNG:

  1. The sample interval is [0,1)
  2. The RNG is unbiased
  3. The RNG is uniform

For real numbers, yes, one can do unbiased, uniform sampling from [0,1) because {1} has measure zero, so not sampling it has no effect on the bias. But for floating point, it seems impossible. The natural interpretation of a uniform RNG on [0,1) seems to be that one, in principle, samples uniformly from the mathematical interval [0,1) and then takes the closest candidate float value not larger than the sampled real value. But that will clearly be biased, just as our RNG is. In order to unbias the RNG, you'd have to round the real value instead, which would either sometimes give you 1.0, or you would have to avoid that by flooring real values in the interval [prevfloat(1.0), 1). In that case your RNG would still be uniform in the sense that it chooses representatives of the underlying reals uniformly, but it would favor the value prevfloat(1.0) 50% more than prevfloat(1.0, 2), which seems bad and like it should count as a violation of 3.

@StefanKarpinski
Member

StefanKarpinski commented Jun 17, 2021

Triage feels that we should do what everyone else is doing, which seems to be generating a 53-bit sample from [0,1), i.e. the status quo. It would be good to add a note, though, that the RNG is slightly biased, and it might make sense for a package to provide RNGs that are unbiased. It was noted that the reason adding eps()/2 previously was ok was that we were only generating values from [0.0, prevfloat(1.0, 2)], since the last bit of every generated float was zero. Adding eps()/2 made the last bits one instead of zero and excluded 0.0 from the output values.

@JeffBezanson removed the triage (This should be discussed on a triage call) label Jun 17, 2021
@StefanKarpinski
Member

@oscardssmith noted that we could allow passing a rounding mode to rand on the premise that we're sampling a uniform real from [0, 1) and then rounding it down. You could ask for a different rounding mode to round up giving a value from (0, 1] or ask for round to nearest to get a value from [0, 1] with 0 and 1 having half the probability of the more central values. While cool, this seems like it would be code that no one ever uses and could go in a package.
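A sketch of that idea as user-level code (rand_rounded is a hypothetical helper, not an existing Random API): treat 64 random bits as the real number u/2^64 in [0, 1) and round it to Float64 once, with the requested rounding mode.

using Random

function rand_rounded(rng::AbstractRNG, ::Type{Float64}, r::RoundingMode)
    u = rand(rng, UInt64)
    # u/2^64 is exact in BigFloat (default 256-bit precision), so the only
    # rounding happens in the final conversion to Float64
    Float64(BigFloat(u) / BigFloat(2)^64, r)
end

rng = Random.default_rng()
rand_rounded(rng, Float64, RoundDown)     # in [0, 1); the usual convention
rand_rounded(rng, Float64, RoundUp)       # in (0, 1], except that u == 0 still gives 0.0
rand_rounded(rng, Float64, RoundNearest)  # in [0, 1]; both endpoints are possible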

@JeffBezanson removed this from the 1.7 milestone Jun 30, 2021
@JeffBezanson
Member

I'll close this since the issue has been mitigated at least somewhat, and we've more or less decided that the current algorithm is ok.

@danielwe
Contributor

danielwe commented Apr 7, 2022

Apologies for adding to this old, long, and closed thread, but I encountered an example that I think is relevant context for the discussion: cuRAND, the RNG library bundled with CUDA, samples uniform floats with the following properties:

  • The interval is (0, 1], i.e., zero is excluded and 1 is included. See https://docs.nvidia.com/cuda/curand/host-api-overview.html#generation-functions
• The transformation from bits to float uses all 32/64 bits of the underlying UInt32/64, such that the smallest possible value is 2f0^-33/2.0^-65. It is hard to know exactly how careful they are about rounding, as it's not open source. Hopefully they're doing it right.

More elaboration here: JuliaGPU/CUDA.jl#1464 (comment) (note that I hadn't read this thread yet so I was a bit ignorant about the subtleties of rounding).
