> The computation [...] is multiplying two 64-bit unsigned integers, giving a 128-bit product. Some machines have an instruction for that.<p>And compilers often let you call these instructions fairly directly, with compiler intrinsics.<p>With Visual Studio on Windows x64, for example, you can implement the mulul64() function with _umul128:
<a href="https://docs.microsoft.com/en-us/cpp/intrinsics/umul128" rel="nofollow">https://docs.microsoft.com/en-us/cpp/intrinsics/umul128</a>
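(and expect quite a good speedup, since the intrinsic typically compiles down to a single x64 MUL instruction rather than a sequence of partial products and carries)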
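<p>To make that concrete, here is a rough sketch of such a wrapper. The name mul64x64_128 and its signature (low half returned, high half written through a pointer) are my own assumptions, not the mulul64() from the article. It uses _umul128 on MSVC x64, falls back to the unsigned __int128 extension where GCC/Clang provide it, and otherwise does the multiply by hand on 32-bit halves:

```c
#include <stdint.h>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>   /* declares _umul128 */
#endif

/* Hypothetical helper (name and signature are illustrative only):
 * returns the low 64 bits of a * b and stores the high 64 bits in *hi. */
static uint64_t mul64x64_128(uint64_t a, uint64_t b, uint64_t *hi)
{
#if defined(_MSC_VER) && defined(_M_X64)
    /* MSVC on x64: the intrinsic maps onto the hardware multiply. */
    return _umul128(a, b, hi);
#elif defined(__SIZEOF_INT128__)
    /* GCC/Clang on 64-bit targets: use the 128-bit integer extension. */
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    return (uint64_t)p;
#else
    /* Portable fallback: schoolbook multiplication on 32-bit halves. */
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Sum the pieces that land in bits 32..63; this cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return (mid << 32) | (p0 & 0xFFFFFFFFu);
#endif
}
```

Called as `lo = mul64x64_128(a, b, &hi);`, all three branches should give the same result; only the first two are likely to end up as a single multiply instruction, so the fallback is there for portability rather than speed.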