The article mentions the challenges of working with a deck of 2^64 cards, but I don't see how the algorithm helps - seems like it's going to encounter the limits of double-precision values well before that. For instance, the initial setup of "qu1real":<p><pre><code> qu1real = -nreal + 1.0 + Nreal
</code></pre>
will behave poorly if N > 2^52 or so. The cited paper refers to this in the Appendix, mentioning that "Roughly log10(N) + 1 digits of precision will suffice", so 20 decimal digits worth versus the 15 available in doubles.