On Melissa O’Neill’s PCG random number generator

104 点作者 ozanonay将近 8 年前

13 条评论

FabHK将近 8 年前

A few notes:The author writes "Meanwhile, at least one influential researcher (whose work I respect) had harsh words publicly for her result", and then quotes some of these words:<pre><code> Note that (smartly enough) the PCG author avoids carefully to compare with xorshift128+ or xorshift1024*. </code></pre> However, the author fails to note that said "influential researcher", Sebastiano Vigna, is the author of xorshift128+ and related PRNG.In the linked test [2] by John D. Cook (who uses PactRand, a test similar to the (obsolete) DIEHARD), xorshift128+ and xoroshir0128+ fail within 3 seconds, while PCG ran 16 hours producing 2 TB of pseudo-random numbers without any suspicious p-value detected.On the other hand, Vigna claims that the xoroshiro family does "pass" PactRand.I've submitted an answer to StackOverflow a while ago [1], recommending xoroshiro and PCG, thus I'd be concerned if PCG turns out to be flawed. It's actually quite hard to get academics in the field to give an authoritative recommendation (I've tried) - their response is typically along the line "It's complicated"...[1] <a href="https://stackoverflow.com/questions/4720822/best-pseudo-random-number-generator/38202922#38202922" rel="nofollow">https://stackoverflow.com/questions/4720822/best-pseudo-rand...</a>[2] <a href="https://www.johndcook.com/blog/2017/08/14/testing-rngs-with-practrand/" rel="nofollow">https://www.johndcook.com/blog/2017/08/14/testing-rngs-with-...</a>Edit: remove italics due to asterisk in PRNG name, & add link to John. D Cook's test.

评论 #15025054 未加载

Animats将近 8 年前

Here's the site for the random number generator.[1] It's basically a simple linear congruential random number generator (well known, but not very good) fed into a mixer. The mixer is new.Most of the analysis is about the LCG or the final output. The suggested mixer is just<pre><code> output = rotate64(uint64_t(state ^ (state >> 64)), state >> 122); </code></pre> That's simple, and the insight in this paper is that something that simple helps a lot. I would have thought that you'd want a mixer where changing one bit of the input changes, on average, half the bits of the output. The mixer above won't do that. DES as a mixer would probably be better, but it's slower. The new result here is that something this simple passes many statistical tests.This isn't crypto-grade; both that mixer and a LCG generator are reversible with enough work.[1] <a href="http://www.pcg-random.org/" rel="nofollow">http://www.pcg-random.org/</a>

评论 #15021181 未加载

评论 #15020981 未加载

sulizilxia将近 8 年前

I love O'Neill's work on PCG, and loved the talks by her I watched online.As a tenured professor I want to say two things about this piece:1. I think academic publishing will be forced to change. I'm not sure what it's going to look like in the end, but traditional journals are starting to seem really quaint and outdated now.2. As far as I can tell from what she's written on the PCG page, the submission to TOMS is a poor example, because no one I know expects to be done with one submission. That is, no one I know submits a paper to one journal, even one reputable journal, and is done. They submit and it gets rejected and revise it and resubmit it, maybe three or even four times. After the fourth or fifth time, you might give up, but not necessarily even then.I have mixed feelings about the PCG paper as an example, because in some ways it's great: an example of how something very influential has superceded traditional academic publishing. In other ways, though, it's horrible, because it's misleading about the typical academic publishing experience. Yes, academic publishing is full of random nonsense, and corruption, but yes, you can also get past it (usually) with just a little persistence. In still other ways, it's a good example of what we might see increasingly, which is a researcher having a lower threshold for the typical bullshit out there.

mjb将近 8 年前

> I wonder whether the academic publications are growing ever less relevant to practice.I think there are two topics here. One is whether academic research and work is becoming less relevant to practice. The other is whether the formalism of academic-style publishing are becoming less relevant to the modern world where more and more venues for publishing, rating, and discovering work.On the former, I believe that academic work is as relevant as ever. There are some areas (like systems) where I'm doubtful about relevance from the point of view of a practitioner, but other areas (like hardware and ML where work remains extremely relevant). I haven't noticed a trend there over the last decade, except in some areas of systems where the industrial practice tends to happen on cluster sizes that are often not approachable for academia.On the latter, academic publication does indeed seem to be getting less relevant. There are other (often better) ways to discover work. There are other ways to tell whether a piece of work is relevant, or credible. There are other, definitely better, ways to publish and distribute work. In some sense I think this is a pity: as an academic-turned-practitioner I like academic-style publications. Still, I think they are going to either change substantially or die.This article raises another very good point: sometimes the formalism of academic publication makes the work harder to understand, less approachable, or less valuable. That's clear harm, and it seems like this professor was right to avoid that.

starmole将近 8 年前

As an engineer who switched to PCG:- PCG is not crypto, everybody should understand that. It's for simulation and rendering.- PCG mainly replaces Mersenne Twister which is in c++11. The Twister has a LOT more state and is a LOT slower for less randomness.- In rendering and simulation speed really matters, and PCG excels there.- Xorshift is another algorithm in the same class. I would really like to see an objective comparison. In my cursory engineering look PCG seemed better.- Fast PRNG is almost a new field again: It's not crypto, but immensely useful. How did the Twister get into C++11 while it is so much worse than PCG or Xorshift? Nobody cared!- Maybe PCG should have been a paper at SigGraph.- For the style of the paper, I think one contribution is rethinking PRNG outside crypto. That deserves and requires a lot of exposition.

评论 #15025195 未加载

lowmagnet将近 8 年前

I like this because she is a professor at Harvey Mudd. They took steps to make CS more inclusive, with great results. I appreciate her attitude on accessibility, which is in keeping with that institution's philosophy.That she ran into a paper wall doesn't bother her because she's openly publishing is even better.

评论 #15021460 未加载

评论 #15021982 未加载

评论 #15020740 未加载

bmm6o将近 8 年前

It sounds like an interesting result, I look forward to reading the paper more carefully. That said, it's clearly not written for an academic journal. Section 2.4.3 is entitled "The Importance of Code Size", and explains why shorter code is better. I think you can argue that some academic papers are excessively concise, but this is a 58-page paper about an RNG. It is clearly not a journal paper and has a ton of extraneous content. I have to sympathize with the commenter that the author has made a trade-off and written a paper that's less rigorous than it should be (for peer review). I wonder why she didn't write 2 versions.

评论 #15021400 未加载

lisper将近 8 年前

This is an interesting article, but it's more about the changing landscape of academic publishing than it is about random number generators.[EDIT] The actual paper is here: <a href="http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf" rel="nofollow">http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf</a>

评论 #15020176 未加载

the_stc将近 8 年前

As a general comment, I dislike deliberately obtuse writing in papers. In my current work, I came across a very in-depth survey of our industry (sex work). Excellent study, very helpful. But some of the sentences seemed to over-complicate the math. Example: "Consider the set P {p1, p2, ... pN} representing providers and the set C {c1, c2, ... cN} representing customers". I am pretty sure this kind of stuff is filler or pretends to make things look more rigorous than they are.On the other hand, maybe spending more than a line explaining what the birthday paradox is should be cut out and put in a backgrounder paper or appendix so that the paper can focus on the actual novel ideas.

评论 #15021437 未加载

评论 #15021512 未加载

drallison将近 8 年前

Melissa O'Neill gave a talk describing the PCG random number generator at Stanford in EE380. <a href="https://www.youtube.com/watch?v=45Oet5qjlms" rel="nofollow">https://www.youtube.com/watch?v=45Oet5qjlms</a>

Houshalter将近 8 年前

I once tried to develop my own fast random number generate using nothing but bitwise operations. On the theory they were the fastest/simplest. I had a program generate thousands of random combinations of bitwise functions. And then used statistical tests to see which ones produced the most "random" seeming behavior.It worked as far as I can tell. But I don't trust the statistical tests. Who is to say there isn't a very obvious pattern in the numbers that I didn't test for or notice? How do you prove a random number generator is good?

评论 #15024032 未加载

评论 #15024645 未加载

评论 #15024249 未加载

dsacco将近 8 年前

I have a few comments:1. The paper itself[1] is extremely readable by the standards of most cryptography research. On one hand, this is great because I was able to follow the whole thing in essentially one pass. On the other hand, the paper is very long for its result (58 pages!), and it could easily do without passages like this one:Yet because the algorithms that we are concerned with are deterministic, their behavior is governed by their inputs, thus they will produce the same stream of “random” numbers from the same initial conditions—we might therefore say that they are only random to an observer unaware of those initial conditions or unaware of how the algorithm has iterated its state since that point. This deterministic behavior is valuable in a number of fields, as it makes experiments reproducible. As a result, the parameters that set the initial state of the generator are usually known as the seed. If we want reproducible results we should pick an arbitrary seed and remember it to reproduce the same random sequence later, whereas if we want results that cannot be easily reproduced, we should select the seed in some inscrutable (and, ideally, nondeterministic) way, and keep it secret. Knowing the seed, we can predict the output, but for many generators even without the seed it is possible to infer the current state of the generator from its output. This property is trivially true for any generator where its output is its entire internal state—a strategy used by a number of simple random number generators. For some other generators, such as the Mersenne Twister [35], we have to go to a little more trouble and invert its tempering function (which is a bijection; see Section 5), but nevertheless after only 624 outputs, we will have captured its entire internal state.That's a lot of setup for what is frankly a very basic idea. A cryptographer being verbose in their writing might briefly remind the reader of these properties with the first sentence, but they'd still likely do that with much more brevity than this. I understand wanting to make your research accessible, but for people who understand the field this detracts from getting to the "meat." It might make it harder to get through, but a 10-30 page result is preferable to a nearly 60-page one that assumes I know nearly nothing about the field. If I don't know these details very well, how can I properly assess the author's results?2. The author's tone in her writing is something I take issue with. For example, passages like this one...Suppose that, excited by the idea of permutation functions, you decide to always improve the random number generators you use with a multiplicative step. You turn to L’Ecuyer’s excellent paper [25], and without reading it closely (who has time to read papers these days!), you grab the last 32-bit constant he lists, 204209821. You are then surprised to discover that your “improvement” makes things worse! The problem is that you were using XorShift 32/32, a generator that already includes multiplication by 747796405 as an improving step. Unfortunately, 204209821 is the multiplicative inverse of 747796405 (mod 2 32), so you have just turned it back into the far-worse–performing XorShift generator! Oops.*...go a bit beyond levity. If you're trying to establish rigorous definitions and use cases to distinguish between generators, functions and permutations, this isn't the way to do it. This isn't appropriate because it doesn't go far enough to formalize the point. It makes it intuitive, sure, and that's a great educational tool! But it's a poor scenario to use as the basis for a problem statement - research is not motivated by the failure of an engineer to properly read and understand existing primitives, it's motivated by novel results that exhibit superior qualities over existing primitives.3. The biggest grievance I have with this paper is the way in which it analyzes its primitives for cryptographic security. For example, this passage under 6.2.2 Security Considerations:In addition, most of the PCG variations presented in the next section have an output function that returns only half as many bits as there are in the generator state. But the mere use of a 2 b/2-to -1 function does not guarantee that an adversary cannot reconstruct generator state from the output. For example, Frieze et al. [12] showed that if we simply drop the low-order bits, it is possible for an adversary to discover what they are. Our output functions are much more complex than mere bit dropping, however, with each adding at least some element of additional challenge. In addition, one of the generators, PCG-XSL-RR (described in Section 6.3.3), is explicitly designed to make any attempt at state reconstruction especially difficult, using xor folding to minimize the amount of information about internal state that leaks out.17 It should be used when a fast general-purpose generator is needed but enhanced security would also be desirable. It is also the default generator for 64-bit output.That's not a rigorous analysis of a primitive's security. It is an informal explanation of why the primitive may be secure, but it so high level that there is no proof based on a significant hardness assumption. Compare this with Dan Boneh's recent paper, "Constrained Keys for Invertible Pseudorandom Functions"[2]. Appendices A and B after the list of references occupy nearly 20 pages of theorems used to analyze and prove the security of primitives explored in the paper under various assumptions.Novel research exploring functions with (pseudo)random properties is inherently mathematical; it's absolutely insufficient to use a bunch of statistical tests, then informally assess the security of a primitive based on the abbreviated references to one or two papers._________1. <a href="http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf" rel="nofollow">http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf</a>2. <a href="https://eprint.iacr.org/2017/477.pdf" rel="nofollow">https://eprint.iacr.org/2017/477.pdf</a>

评论 #15021948 未加载

fwdpropaganda将近 8 年前

Physicist here.Off-topic> And it is not even entirely clear what “really random” would mean. It is not clear that we live in a randomized universe…At the quantum level it really is clear that we live in a really random universe. What's the meaning of really random? The outcome of a quantum process.On-topic. Yeah, you have to know your audience. As OP mentions, just because the paper wasn't published doesn't prevent anyone from thinking about it and even building on it. On the other hand these scientific publications have styles and target audiences, and maybe she got rejected not due to lack of relevance or rigor, but because the paper didn't match the publication's non-scientific criteria for publication.