Zero Tolerance for Bias

190 points by Harmohit 12 months ago

8 comments

Interesting, but there's one part I'm not sure I agree with.

> Pseudo-random number generators are useful for many purposes, but unbiased shuffling isn't one of them.

A properly seeded CSPRNG is perfectly fine for this. And if it's not, then pretty much all of our cryptography is screwed. This is why in modern Linux kernels /dev/random and /dev/urandom are the same (apart from differences in behavior before initialization is complete). As D. J. Bernstein put it, it's superstition not to trust CSPRNGs: https://www.mail-archive.com/cryptography@randombit.net/msg04763.html

And if it's good enough for crypto, it's good enough for card shuffling.

FYI, I am not a cryptographer.
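To make the point concrete, here is a minimal sketch, assuming Node.js, of a Fisher-Yates shuffle whose only source of randomness is the CSPRNG exposed as crypto.randomInt (the name cryptoShuffle is hypothetical). The Node.js documentation states that randomInt avoids modulo bias, so every swap index is uniform.

```javascript
const { randomInt } = require("node:crypto");

// Fisher-Yates (Durstenfeld variant) shuffle driven by Node's CSPRNG.
// crypto.randomInt(max) returns a uniformly distributed integer in [0, max).
function cryptoShuffle(items) {
  const a = [...items]; // copy so the caller's array is left untouched
  for (let i = a.length - 1; i > 0; i--) {
    const j = randomInt(i + 1); // uniform over 0..i, including i itself
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Usage: shuffle a 52-card deck represented as the numbers 0..51.
const deck = Array.from({ length: 52 }, (_, k) => k);
console.log(cryptoShuffle(deck).slice(0, 5));
```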

skybrian 11 months ago

I recently stumbled across some other ways that random number generation can go wrong.

Suppose you want reproducible results from a seed, along with parallelism. Algorithmic random number generators are usually mutable and generate a sequence of results, which limits parallelism. Rather than a sequence, you want something tree-shaped, where you can create an independent random stream for each child task. In higher-level APIs, a jump or split operator can be useful.

Counter-based random number generators [1] seem pretty useful in that context. An immutable random number generator works like a hash function that maps each input to an output that's difficult to predict. The catch is that you must be careful never to use the same input twice. You can think of it as allocating random numbers from a very large address space in a reproducible way. How do you partition the address space, predictably, so that every address is used at most once and nobody runs out?

Giving each child a unique ID and generating a stream from that is one way. If the tree is deeper, you'll want a unique seed for each path.

When a mutable random number generator is copied to a child task (or maybe just to an iterator), the same random numbers might be generated in two places. This is the sort of thing Rust's borrow checker can prevent: borrowing is okay, but you want to prevent multiple concurrent ownership.

[1] https://en.wikipedia.org/wiki/Counter-based_random_number_generator
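A minimal sketch of the tree-shaped, counter-based idea, assuming Node.js. Each output is a hash of (seed, path, counter), where path is the list of child IDs from the root to the task. Note the swap: real counter-based generators such as Philox or Threefry use fast, purpose-built mixing functions; SHA-256 from node:crypto stands in here only because it is built in. The names cbrng and Stream are hypothetical.

```javascript
const { createHash } = require("node:crypto");

// Counter-based generator: output = H(seed, path, counter). There is no
// mutable state, so any element of any stream can be computed independently.
// SHA-256 is a stand-in for a purpose-built CBRNG such as Philox or Threefry.
function cbrng(seed, path, counter) {
  const h = createHash("sha256");
  // JSON encoding keeps inputs such as ["b","c"] and ["b,c"] distinct.
  h.update(JSON.stringify([seed, path, counter]));
  // Interpret the first 8 bytes of the digest as an unsigned 64-bit BigInt.
  return h.digest().readBigUInt64BE(0);
}

// A stream is just (seed, path). Children extend the path with an ID that
// must be unique among siblings, so each (path, counter) address has one owner.
class Stream {
  constructor(seed, path = []) {
    this.seed = seed;
    this.path = path;
  }
  child(id) {
    return new Stream(this.seed, [...this.path, id]);
  }
  at(counter) {
    return cbrng(this.seed, this.path, counter);
  }
}

// Usage: two sibling tasks and a grandchild, all derived from seed 42.
const root = new Stream(42);
const left = root.child(0);
const right = root.child(1);
console.log(left.at(0), right.at(0), left.child(7).at(3));
```

Nothing here is mutable, so the responsibility shifts entirely to addressing: two tasks that share a path and a counter will see identical numbers, which is exactly the partitioning problem described above.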

orlp 11 months ago

Here is a trivial shuffle algorithm that is completely unbiased and only requires an unbiased coin (or a random number generator giving bits):

1. Randomly assign each element to list A or list B.
2. Recursively shuffle lists A and B.
3. Concatenate lists A and B.

To prove it's correct, note that assigning a random real number to each element and sorting by that number is an unbiased shuffle. The above does exactly that: consider the fractional base-2 expansion of the random numbers, and the algorithm is a most-significant-digit-first base-2 radix sort of them. We can sort these random real numbers even though they have infinitely many random bits, because we can stop expanding digits as soon as a prefix is unique (which corresponds to a list being down to a single element).

I call the above algorithm RadixShuffle. You can do it in base 2, but also in other bases. For base 2 you can make it in-place, similar to how the partition step of Quicksort is implemented in-place; for other bases you either have to do it out-of-place or in two passes (the first pass only counts how many elements go in each bucket to compute offsets).

The above can be combined with a fallback algorithm such as Fisher-Yates for small N. Even though the above is O(N log N), I believe it can be faster than Fisher-Yates for large N, because it is exceptionally cache-efficient as well as RNG-efficient, whereas Fisher-Yates requires a call to the RNG and incurs an expected cache miss for each element.

---

Another fun fact: you can turn any biased memoryless coin into an unbiased one with a simple trick. Throw the coin twice; if it gives HH or TT, discard the pair and throw again; if it gives HT or TH, use the first toss as your unbiased coin.

This works because if p is the probability that heads comes up, we have:

  HH: p^2
  HT: p(1-p)
  TH: (1-p)p
  TT: (1-p)^2

Naturally, p(1-p) and (1-p)p are equiprobable, so if we reject the other outcomes, we have distilled an unbiased coin out of our biased coin.
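Both ideas in this comment fit together: RadixShuffle needs only fair bits, and the coin trick (von Neumann's extractor) distills fair bits from a biased coin. Below is a minimal sketch, assuming Node.js. It is the simple out-of-place base-2 variant, not the in-place partitioning version, and it omits the Fisher-Yates fallback for small N. The names fairBit, vonNeumann, radixShuffle and the 80/20 biasedCoin are hypothetical.

```javascript
const { randomBytes } = require("node:crypto");

// Fair bit from the CSPRNG: one byte per bit, wasteful but simple.
function fairBit() {
  return randomBytes(1)[0] & 1;
}

// Von Neumann extractor: wrap a biased but memoryless coin (returns 1 for
// heads, 0 for tails) so it yields fair bits. HH/TT pairs are discarded;
// for HT or TH the first toss of the pair is returned.
function vonNeumann(biasedCoin) {
  return () => {
    for (;;) {
      const first = biasedCoin();
      const second = biasedCoin();
      if (first !== second) return first;
    }
  };
}

// RadixShuffle, base 2, out of place: split by fair bits, recurse, concatenate.
// Each recursion level consumes one more binary digit of every element's
// implicit random key; recursion stops once a list has at most one element.
function radixShuffle(items, bit = fairBit) {
  if (items.length <= 1) return [...items];
  const a = [];
  const b = [];
  for (const x of items) (bit() ? a : b).push(x);
  return [...radixShuffle(a, bit), ...radixShuffle(b, bit)];
}

// Usage: shuffle with CSPRNG bits, then with bits distilled from an 80/20 coin.
const biasedCoin = () => (Math.random() < 0.8 ? 1 : 0);
console.log(radixShuffle([1, 2, 3, 4, 5, 6, 7, 8]));
console.log(radixShuffle([1, 2, 3, 4, 5, 6, 7, 8], vonNeumann(biasedCoin)));
```

The extractor relies on the tosses being independent with a fixed p. A pair succeeds with probability 2p(1-p), so it costs on average 1/(p(1-p)) tosses per fair bit, which is 6.25 tosses for p = 0.8.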

tomcam 11 months ago

I'm sure I'll get roasted for this, but getting the right answer will scratch a years-old itch. Why aren't most random number generators seeded using the system clock in addition to the existing algos?

im3w1l 11 months ago

Fisher-Yates is simple, fast and correct. The only area of improvement I can think of is reducing cache misses.
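"Correct" is the part that is easy to lose in a near-miss implementation. A minimal sketch, assuming Node.js, that enumerates every equally likely sequence of random choices for n = 3 and counts the resulting permutations. The Fisher-Yates schedule has 3 × 2 = 6 paths, one per permutation; the "swap with any index" variant, a common mistake added here only for contrast, has 3^3 = 27 paths, and 27 cannot be split evenly over 6 permutations. The name permutationCounts is hypothetical.

```javascript
// Count how often each final permutation occurs when every random choice is
// enumerated. steps is a list of [i, range] pairs: at that step j takes each
// value in 0..range-1 (all equally likely) and a[i] is swapped with a[j].
function permutationCounts(n, steps) {
  const counts = new Map();
  const walk = (a, s) => {
    if (s === steps.length) {
      const key = a.join("");
      counts.set(key, (counts.get(key) ?? 0) + 1);
      return;
    }
    const [i, range] = steps[s];
    for (let j = 0; j < range; j++) {
      const b = [...a]; // branch on a copy so sibling paths are independent
      [b[i], b[j]] = [b[j], b[i]];
      walk(b, s + 1);
    }
  };
  walk(Array.from({ length: n }, (_, k) => k), 0);
  return counts;
}

const n = 3;
// Fisher-Yates: for i = n-1 down to 1, j is uniform over 0..i.
const fisherYates = [];
for (let i = n - 1; i > 0; i--) fisherYates.push([i, i + 1]);
// Common mistake, shown for contrast: j is uniform over the whole array.
const naive = [];
for (let i = 0; i < n; i++) naive.push([i, n]);

console.log(permutationCounts(n, fisherYates)); // 6 permutations, 1 path each
console.log(permutationCounts(n, naive)); // 27 paths spread unevenly over 6
```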

zug_zug 11 months ago

Kinda interesting.

There are N! permutations in a shuffle (assuming no duplicate elements), and there's an algorithm where you pick a single random number between 1 and N! and then construct that permutation (though I don't know how to do it better than N log N with an order-statistic tree). I like this because it requires exactly one random number.

A trivial solution in functional programming (for those of us who find this swap stuff really unreadable and error-prone) would be something like:

  [1,2,3,4,5,6].map(x => ({ value: x, order: Math.random() })).sort((a, b) => a.order - b.order).map(x => x.value)

Of course this is N log N, but personally I think it's easy to forget how slowly log N grows. For example, log10(number of atoms in the universe) is about 82, so if your dataset is smaller than the number of atoms in the universe, you can think of log N as a constant less than 82.
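The single-random-number approach can be sketched with the factorial number system (a Lehmer code), assuming Node.js and a rank in [0, N!). BigInt is used because N! exceeds Number.MAX_SAFE_INTEGER once N > 18. This version uses Array.prototype.splice, so it is O(N^2) rather than the O(N log N) order-statistic-tree version mentioned above; unrankPermutation, randomBigIntBelow and shuffleByRank are hypothetical names.

```javascript
const { randomBytes } = require("node:crypto");

// Uniform BigInt in [0, bound) by rejection sampling on random bits.
function randomBigIntBelow(bound) {
  const bits = bound.toString(2).length; // bound must be >= 1n
  const bytes = Math.ceil(bits / 8);
  const excess = BigInt(bytes * 8 - bits);
  for (;;) {
    // Keep exactly `bits` random bits, then accept if the value is in range
    // (rejection happens with probability at most 1/2 per attempt).
    const x = BigInt("0x" + randomBytes(bytes).toString("hex")) >> excess;
    if (x < bound) return x;
  }
}

// Map a rank in [0, n!) to a permutation using the factorial number system.
function unrankPermutation(items, rank) {
  const pool = [...items];
  const n = pool.length;
  const fact = [1n];
  for (let i = 1; i <= n; i++) fact.push(fact[i - 1] * BigInt(i));
  if (rank < 0n || rank >= fact[n]) throw new RangeError("rank out of range");
  const out = [];
  for (let i = n - 1; i >= 0; i--) {
    const digit = rank / fact[i]; // BigInt division truncates
    rank %= fact[i];
    out.push(pool.splice(Number(digit), 1)[0]); // O(n) removal from the pool
  }
  return out;
}

// One uniform draw in [0, n!) selects the whole permutation.
function shuffleByRank(items) {
  let total = 1n;
  for (let i = 2; i <= items.length; i++) total *= BigInt(i);
  return unrankPermutation(items, randomBigIntBelow(total));
}

console.log(unrankPermutation([1, 2, 3], 0n)); // [ 1, 2, 3 ]
console.log(unrankPermutation([1, 2, 3], 5n)); // [ 3, 2, 1 ]
console.log(shuffleByRank([1, 2, 3, 4, 5, 6]));
```

The single draw still has to carry about log2(N!) bits of randomness (roughly 226 bits for a 52-card deck), so this saves RNG calls rather than entropy.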

tbrownaw 11 months ago

Trusting a broken library seems like a different kind of error than implementing an algorithm that isn't quite the algorithm you meant to implement.

westurner 11 months ago

Python has random.shuffle() and random.sample(), which use the Mersenne Twister (MT) PRNG in the random module: https://docs.python.org/3/library/random.html#random.shuffle

Modules/_randommodule.c: https://github.com/python/cpython/blob/main/Modules/_randommodule.c , Lib/random.py: https://github.com/python/cpython/blob/main/Lib/random.py#L354

From "Uniting the Linux random-number devices" (2022) https://news.ycombinator.com/item?id=30377944 :

> > In 2020, the Linux kernel version 5.6 /dev/random only blocks when the CPRNG hasn't initialized. Once initialized, /dev/random and /dev/urandom behave the same. [17]

From https://news.ycombinator.com/item?id=37712506 :

> "lock-free concurrency" [...] "Ask HN: Why don't PCs have better entropy sources?" [for generating txids/uuids] https://news.ycombinator.com/item?id=30877296

> "100-Gbit/s Integrated Quantum Random Number Generator Based on Vacuum Fluctuations" https://link.aps.org/doi/10.1103/PRXQuantum.4.010330

google/paranoid_crypto.lib.randomness_tests: https://github.com/google/paranoid_crypto/tree/main/paranoid_crypto/lib/randomness_tests