Perfect Random Floating-Point Numbers

110 points by pclmulqdq 4 days ago

13 comments

I've written an algorithm that generates a uniform random float in the range [a,b] and can produce every representable floating-point value with a probability proportional to the real-number range it covers*: https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h#L1464

* Well, almost. I messed up the probability around subnormals slightly and haven't gotten around to fixing it yet.

Still, the algorithm itself should work for all other values, and it was measurably more accurate than the regular algorithm.

In terms of performance, it is 10x slower than the regular implementation of `(randu32(x)>>8)*(1.0f/(1<<24))` on my Zen1 desktop.
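
For comparison, here is a minimal sketch of the "regular" implementation quoted above. It is written in Python rather than C, `rng.getrandbits(32)` is only a stand-in for `randu32(x)`, and the function name is made up for illustration. Python floats are doubles, but every value produced here is also exactly representable as a 32-bit float.

```python
import random


def regular_unit_float(rng: random.Random) -> float:
    """Mimic (randu32(x) >> 8) * (1.0f / (1 << 24)) with Python integers."""
    bits32 = rng.getrandbits(32)      # stand-in for randu32(x) (assumption)
    top24 = bits32 >> 8               # keep only the 24 high bits
    return top24 * (1.0 / (1 << 24))  # exact: a multiple of 2**-24 in [0, 1)


rng = random.Random(12345)
samples = [regular_unit_float(rng) for _ in range(1000)]
```

Every output is a multiple of 2^-24, so this method can return at most 2^24 distinct values no matter how many floats actually exist in [0, 1).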

FabHK 1 day ago

There's another approach for doing this: generate a random number between 1 and 2 (they all have the same exponent) and then subtract 1 (that's the default implementation in Julia [0]). But I think it just gives you 2^52 different numbers.

So, between .5 and 1 you miss out on every second representable number, between 0.25 and .5 you miss out on 3/4 of them, and so on.

I guess for many cases that's good enough, but the article seems like a nice improvement.

ETA: Lemire has some thoughts on this [1] and links to what might be a prior solution [2]. Vigna (of xoroshiro fame) writes about it at the bottom of [3] and also links to [2]. So, presumably the implementation described in the article is faster? ("There have been some past attempts to fix these flaws, but none that avoid a huge performance penalty while doing so.")

EDIT2: BTW, one of the things I love about HN (well, the world, really) is that there are people who care deeply that we can uniformly sample floats between 0 and 1 correctly, and all of them, and do it faster.

[0] see https://github.com/JuliaLang/julia/blob/master/stdlib/Random/src/generation.jl

    rand(r::AbstractRNG, ::SamplerTrivial{CloseOpen01_64}) = rand(r, CloseOpen12()) - 1.0

[1] https://lemire.me/blog/2017/02/28/how-many-floating-point-numbers-are-in-the-interval-01/

[2] https://mumble.net/~campbell/2014/04/28/uniform-random-float and https://mumble.net/~campbell/2014/04/28/random_real.c

[3] https://prng.di.unimi.it
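
The subtract-one trick can be sketched in Python roughly as follows. This is an illustration, not Julia's actual code: the helper names are invented, and `getrandbits` stands in for whatever generator supplies the mantissa bits.

```python
import random
import struct


def rand_close_open_12(rng: random.Random) -> float:
    """Build a double in [1, 2): fixed exponent, 52 random mantissa bits."""
    mantissa = rng.getrandbits(52)
    bits = (0x3FF << 52) | mantissa  # sign 0, biased exponent 1023
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def rand_by_subtraction(rng: random.Random) -> float:
    """Shift [1, 2) down to [0, 1); the subtraction is exact."""
    return rand_close_open_12(rng) - 1.0


rng = random.Random(2024)
values = [rand_by_subtraction(rng) for _ in range(1000)]
```

All results land on the grid of multiples of 2^-52, which is why the finer-spaced floats below 0.5 are mostly unreachable with this method.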

pixelpoet 1 day ago

Some good references on this which IMO should have been mentioned in the article:

https://pharr.org/matt/blog/2022/03/05/sampling-fp-unit-interval

https://marc-b-reynolds.github.io/distribution/2017/01/17/DenseFloat.html

possiblywrong 1 day ago

> Second, the least significant bits that come from this generator are biased.

I don't think this is true; I'm guessing that the author borrowed this observation from one of the various other articles linked in this thread, which address the situation where we draw a 64-bit random unsigned integer and divide by `1<<64`, instead of drawing 53 bits and dividing by `1<<53`. The former does introduce a bias, but the latter doesn't (though it does still limit the fraction of representable values).
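
A small sketch of the difference between the two conversions, assuming the 64-bit variant converts the integer to a double before scaling (as a C cast would). The function names are made up for illustration.

```python
def from_53_bits(x: int) -> float:
    """Map a 53-bit integer to [0, 1); both steps are exact."""
    return x * 2.0**-53


def from_64_bits(x: int) -> float:
    """Map a 64-bit integer; the conversion to double has to round."""
    return float(x) * 2.0**-64


largest_53 = from_53_bits((1 << 53) - 1)  # stays strictly below 1.0
largest_64 = from_64_bits((1 << 64) - 1)  # rounds up to exactly 1.0
```

With 53 bits every input maps to its own double, while with 64 bits many inputs near the top collapse onto the same double, and the largest ones round up to 1.0.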

tialaramex 1 day ago

> For unbiased random floating-point numbers, generate floating-point numbers with probabilities given by drawing a real number and then rounding to floating point.

Immediately there are alarms going off in my mind. Your machine definitely can't pick real numbers; Almost All of them are non-computable. So you definitely can't be doing that, which means you should write down what you've decided to actually do, because it's not that.

What you actually want isn't the reals at all. You want a binary fraction (since all your f64s are in fact binary fractions) between 0 and 1, so you just need to take random bits until you find a one bit (the top bit of your fraction), counting all the zeroes to decide the exponent, and then you need 52 more bits for your mantissa.

I'm sure there's a faster way to get the same results, but unlike the article's imagined "drawing a real number", this is actually something we can realize, since it doesn't involve non-computable numbers.

Edited: You need 52 more bits. Earlier this comment said 51, but the one bit you've already seen is implied in the floating point type, so we need 52 more, any or all of which might be zero.
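
The bit-by-bit procedure described above might look something like this in Python. It is a sketch under stated assumptions, not the article's algorithm: the function name is invented, `getrandbits` stands in for the bit source, and anything below the normal range is simply returned as 0.0 instead of handling subnormals.

```python
import math
import random


def random_binary_fraction(rng: random.Random) -> float:
    """Draw bits until the first one bit, then 52 more for the mantissa."""
    exponent = -1  # a leading one bit right away means a value in [0.5, 1)
    while rng.getrandbits(1) == 0:
        exponent -= 1  # each zero bit halves the magnitude
        if exponent < -1022:
            return 0.0  # simplification: give up below the normal range
    mantissa = rng.getrandbits(52)
    significand = (1 << 52) | mantissa  # implied one bit plus 52 drawn bits
    return math.ldexp(significand, exponent - 52)


rng = random.Random(7)
draws = [random_binary_fraction(rng) for _ in range(1000)]
```

This version truncates rather than rounds, so it never returns 1.0, and a value with exponent e is chosen with probability 2^e, which matches the width of the interval those floats cover.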

badmintonbaseba 1 day ago

Generalizing this to arbitrary intervals, not just [0, 1), looks tricky. Just scaling and shifting a perfect uniform result from [0, 1) doesn't result in a perfect random floating-point number.
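
One concrete way scaling and shifting goes wrong, as a small Python sketch (the function name is made up for illustration): the final addition has to round, so an input strictly below 1 can land exactly on the upper bound.

```python
def scale_and_shift(u: float, a: float, b: float) -> float:
    """Naive mapping of u in [0, 1) onto [a, b)."""
    return a + (b - a) * u


u_max = 1.0 - 2.0**-53  # the largest double strictly below 1.0
x = scale_and_shift(u_max, 1.0, 2.0)  # the exact sum 2 - 2**-53 is not a double
```

Here the exact sum lies halfway between two doubles and rounds to 2.0, so `x` falls outside the half-open interval [1, 2) even though `u_max` is inside [0, 1).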

kevmo314 1 day ago

Cool observation! Despite knowing both how floating-point numbers work and how random number generation works, this never occurred to me.

I do wish the solution were a bit simpler though. Could this be upstreamed into the language's API? https://cs.opensource.google/go/go/+/refs/tags/go1.24.3:src/math/rand/rand.go;l=189

hedora 1 day ago

I thought this was going to go further down the path that the Diehard battery of random number tests started:

https://www.jstatsoft.org/index.php/jss/article/view/v007i03/892

(Dieharder is probably a better suite to use at this point, but that paper is approachable.)

FabHK about 22 hours ago

So, fun fact:

Between 0 and 1, you have about a quarter of all floating point numbers (and then a quarter > 1, a quarter < -1, and a quarter between -1 and 0).

Between 1 and 2, you have about 0.024% of all floating point numbers (for double precision, fewer by a factor of around 1023).
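
These counts can be checked from the bit patterns, since non-negative doubles sort the same way as their 64-bit integer encodings. A small Python sketch, where the total of 2^64 counts every bit pattern, including NaNs and infinities, and the helper name is made up:

```python
import struct


def bits_of(x: float) -> int:
    """Return the IEEE 754 binary64 encoding of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


TOTAL = 1 << 64                        # all 64-bit patterns (assumption above)
in_0_1 = bits_of(1.0) - bits_of(0.0)   # number of doubles in [0, 1)
in_1_2 = bits_of(2.0) - bits_of(1.0)   # number of doubles in [1, 2)
frac_0_1 = in_0_1 / TOTAL              # 1023/4096, just under a quarter
frac_1_2 = in_1_2 / TOTAL              # 1/4096, about 0.024%
```

The ratio `in_0_1 // in_1_2` comes out to exactly 1023, matching the factor mentioned above.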

gitroom 1 day ago

Yeah I've messed with this and always end up wishing the simple ways were actually good enough. The bit where floats aren't really evenly spaced just gets annoying. Still, I kinda get the itch to cover all cases. Respect.

sfpotter 1 day ago

Probably relevant: https://prng.di.unimi.it/

I use the PRNGs here for my own needs and they work great... at least as far as I can tell. :-)

lutusp 1 day ago

"Perfect Random [sic] Floating-Point Numbers" ???

I had hoped that, somewhere in the article, the author would say, "In this article, the term 'random' is shorthand for 'pseudorandom'." But no.

Programming students might read the article and come away with the idea that a deterministic algorithm can generate random numbers.

This is like the sometimes-heard claim that "Our new error-detecting algorithm discovers whether a submitted program contains errors that might make it stop." Same problem -- wrong as written, but no qualifiers.

mahemm 1 day ago

Why not just read 64 bits off /dev/urandom and be done with it? All this additional complexity doesn't actually buy any "extra" randomness over this approach, and I'm skeptical that it improves speed either.
