Note that this is equivalent to computing `upperbound * randbits(nbits + 64) >> (nbits + 64)` combined with an early-exit optimization, albeit one that becomes less effective as `upperbound` increases: if `upperbound = 2^nbits - 1`, the probability of taking the early exit is essentially zero. (As the author points out, Lemire's optimization may be used to improve this exit test.)

The early-exit optimization, however, may leak information about the random output, since it tells someone with access to timing information whether one or two words of generator output were consumed. With standard rejection sampling, the entire sample is discarded whenever it would introduce bias, so the output is independent of the number of samples drawn. Here, by contrast, certain outputs cannot be produced by the early exit (for sufficiently large upper bounds), meaning that an attacker who observes an early exit could narrow down the random number.

I think that, to avoid this timing attack vector, the early exit would need to be eliminated - in which case the algorithm would perform no better (and usually worse) than Lemire's approach in terms of expected calls to the random number generator.
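For concreteness, here is a minimal sketch of that shape for a 64-bit word size (nbits = 64), assuming a placeholder `random64()` word source and the common `__uint128_t` compiler extension for the full-width multiply; it is only the multiply-high-plus-carry structure described above, not the PR's actual code:

    #include <stdint.h>

    uint64_t random64(void); /* placeholder full-width word source (assumption) */

    /* floor(bound * U), where U is built from 128 random fractional bits,
       with the early exit discussed above; bound must be > 0. */
    uint64_t bounded(uint64_t bound) {
        __uint128_t prod = (__uint128_t)bound * random64();
        uint64_t hi = (uint64_t)(prod >> 64);   /* candidate result */
        uint64_t lo = (uint64_t)prod;           /* fraction so far  */
        /* Early exit: the second word can add at most bound - 1 to lo,
           so if that cannot carry, the candidate is already final. */
        if (lo <= UINT64_MAX - (bound - 1)) return hi;
        /* Otherwise fold in the high half of a second product; a carry
           out of the addition bumps the result by one. */
        uint64_t next = (uint64_t)(((__uint128_t)bound * random64()) >> 64);
        uint64_t sum  = lo + next;
        return hi + (sum < lo);                 /* sum < lo detects the carry */
    }

The timing concern above is visible here: the `return hi` path consumes one word, the carry path consumes two, and whether the second path was taken is observable to anyone who can time the call.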
Hi, author here. Note this is not really a "new idea"; in a very reasonable sense it's simply the most obvious thing possible. I do think that it's likely the first mainstream "productized" implementation, however (though I would be interested to hear of other places that it's been tried).
I'm a bit confused about this method.

So the standard argument against such a procedure is that if you generate N random bits, each of the {0,1}^N bitstrings is equiprobable, and therefore no mapping of {0,1}^N to {0..U-1} can map an equal number of bitstrings to each output.

A priori the method seems to work around this by conditionally sampling more bits, so that your domain is not one of fixed-length bitstrings. But then there is this paragraph:

> More intriguing still, this algorithm can be made unconditional by removing the early out, so that every value computed requires word size + 64 bits from the stream, which breaks the loop-carried dependency for fast generators, unlocking vectorization and parallelization where it was previously impossible. This is an especially powerful advantage when paired with bitstream generators that allow skip-ahead such as counter-based generators or PCG.

But now we are mapping {0,1}^N into {0..U-1}, right? So this mapping ought not be uniform? In fact I'm not sure why the early-termination method even works, because for a fixed U we know the maximum depth of the generated-bitstring tree, so we can just pad it with imaginary random bits to arrive at the contradiction that U does not divide 2^N.

I imagine I missed something; what is it?

EDIT: thinking about it more, if you sample finitely many bits then each possible output has a probability that is a dyadic rational (a fraction of the form a/2^k), which does not include all integer reciprocals.
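To spell that last point out with a tiny worked case (taking U = 3; the map f and the counts k_v are notation introduced here, not anything from the PR):

    % N fixed bits give 2^N equiprobable strings, so any map
    % f : {0,1}^N -> {0, 1, 2} assigns each output v a probability
    \Pr[f(b_1,\dots,b_N) = v] = \frac{k_v}{2^N} \quad\text{for some integer } k_v,
    % and exact uniformity would force
    \frac{k_v}{2^N} = \frac{1}{3} \;\Longrightarrow\; 3\,k_v = 2^N,
    % which has no solution, since 2^N has no factor of 3.

So with any fixed number of bits the best you can get is a dyadic approximation of 1/3, never 1/3 exactly.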
I did the work that Lemire links there.

It started with realizing that Lemire's rejection method becomes more likely to need a division/modulo as the range maxes out (the gate on the division is fraction < range, so as range -> maxint the division case approaches 100%, if the fraction and the range have the same precision). If the range is 32-bit but the fraction is 64-bit, the division case is 2^32 times less likely, so I worked out Lemire's method for 32.64 fixed point.

That turned out to be similar to the infinite-precision case described here, as we only need 32 bits of fraction if it's far enough from the edges, i.e. nothing can carry into the integer part if we extend to 64.

Once I worked out when to extend to 64 bits in the rejection method, I realized that the infinite-precision case is similar, using a loop instead of a conditional. It's simpler since there's no rejection, just a decision to continue as long as it's possible to carry.
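For readers who want to see where that gate sits, here is a minimal sketch of Lemire's nearly-divisionless method for a 32-bit range, assuming a placeholder `random32()` word source (my paraphrase of the published algorithm, not a drop-in from any particular library):

    #include <stdint.h>

    uint32_t random32(void); /* placeholder word source (assumption) */

    /* Uniform value in [0, s) for s > 0. The expensive (-s) % s, i.e. 2^32 mod s,
       is only reached when the low half of the product -- the "fraction" --
       falls below s, which happens with probability about s / 2^32 and so
       approaches certainty as s nears the top of the 32-bit range. Widening
       the fraction to 64 bits (the 32.64 fixed-point variant described above)
       shrinks that probability to at most s / 2^64. */
    uint32_t lemire_bounded(uint32_t s) {
        uint64_t m = (uint64_t)random32() * s;
        uint32_t l = (uint32_t)m;                /* the fraction */
        if (l < s) {                             /* the gate on the slow path */
            uint32_t t = (-s) % s;               /* threshold: 2^32 mod s */
            while (l < t) {                      /* reject biased low fractions */
                m = (uint64_t)random32() * s;
                l = (uint32_t)m;
            }
        }
        return (uint32_t)(m >> 32);
    }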
This is essentially arithmetic encoding to do base conversion from binary to n-ary. Once you have enough bits, you're almost certainly not on the boundary of a bucket. Funny how the same thing seems to have been reinvented...
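If it helps to see that framing spelled out: the bit stream is read as the binary expansion of a real U in [0,1), the interval [0,1) is cut into `bound` equal buckets, and words are consumed only until the still-unread tail can no longer push U across a bucket boundary. Below is a rough sketch of that loop for a 64-bit stream, again assuming a placeholder `random64()` source and the `__uint128_t` extension; it reflects my reading of the thread, not the PR's literal code:

    #include <stdint.h>

    uint64_t random64(void); /* placeholder word source (assumption) */

    /* floor(bound * U) for bound > 0, refining U one 64-bit "digit" at a time. */
    uint64_t bucketed(uint64_t bound) {
        __uint128_t prod = (__uint128_t)bound * random64();
        uint64_t bucket = (uint64_t)(prod >> 64); /* bucket the prefix lands in  */
        uint64_t frac   = (uint64_t)prod;         /* position inside that bucket */
        /* The unread tail contributes strictly less than bound to frac, so if
           even that cannot carry past 2^64, the bucket is already final. */
        if (frac <= UINT64_MAX - (bound - 1)) return bucket;
        for (;;) {
            __uint128_t next = (__uint128_t)bound * random64();
            uint64_t sum = frac + (uint64_t)(next >> 64);
            if (sum < frac)        return bucket + 1; /* carried across the boundary */
            if (sum != UINT64_MAX) return bucket;     /* boundary now out of reach   */
            frac = (uint64_t)next;                    /* exactly on the edge: refine */
        }
    }

The loop body continues only when the partial sum sits exactly on a bucket boundary, which is why, as noted above, a handful of extra bits almost always settles the answer.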
This PR should be submitted, as is, to a peer-reviewed journal and published again as is. It would be great if the DOI resolved to the PR itself!

I don't mean this as a joke; I mean it as a way to contribute (slightly humorously, or perhaps even not) to upending an outdated, yet valuable, way of disseminating scientific research.
Because of the branch name they used for their PR, I thought it was going to be a joke PR, but the mathematics of it is outside what I understand well enough to judge whether it's legit. From the way it reads, though, it sounds honest.

So assuming that it is legit, it sounds really intriguing, especially because of the claim they make that it “breaks the loop-carried dependency for fast generators, unlocking vectorization and parallelization where it was previously impossible”.