It has about 15 KB of data: 10 KB in power_of_ten_components and 5 KB in mantissa_128. [1] That's ~23% of the per-core L1 data cache on, e.g., the recently announced Ryzen Embedded R1606G APU. [2, 3] So before getting too excited about the microbenchmark numbers, I'd be careful to ensure the additional cache pressure doesn't slow down my program as a whole.

edit: also, ~2.5 KB of that is due to padding in power_of_ten_components. I wonder if it'd be better to split the uint64_t field into two uint32_ts to avoid this (a rough sketch of what I mean is below the links).

[1] https://github.com/lemire/fast_double_parser/blob/master/include/fast_double_parser.h
[2] https://news.ycombinator.com/item?id=22440894
[3] https://en.wikichip.org/wiki/amd/ryzen_embedded/r1606g
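Roughly what I mean, with hypothetical field names since I'm going from memory of the header (I'm assuming the table entry pairs a 64-bit mantissa with a 32-bit exponent):

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical layout resembling what I'd expect the table entry to be:
// the uint64_t forces 8-byte alignment, so the struct rounds up to 16 bytes
// and 4 bytes per entry are pure padding.
struct padded_entry {
  uint64_t mantissa;  // 8 bytes
  int32_t exponent;   // 4 bytes (+ 4 bytes of trailing padding)
};

// Splitting the 64-bit field into two 32-bit halves drops the alignment
// requirement to 4 bytes, so the padding disappears and each entry is 12 bytes.
struct packed_entry {
  uint32_t mantissa_lo;
  uint32_t mantissa_hi;
  int32_t exponent;
};

int main() {
  std::printf("padded: %zu bytes, packed: %zu bytes\n",
              sizeof(padded_entry), sizeof(packed_entry));  // typically 16 vs 12
}
```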
> That is, given the string "1.0e10", it should return a 64-bit floating-point value equal to 10.

Err, surely it should be equal to 10,000,000,000. Or more probably, they meant to write "1.0e1".
More context at Daniel Lemire's blog post: https://lemire.me/blog/2020/03/10/fast-float-parsing-in-practice/ . It's about twice as fast as abseil or from_chars and nearly 10x faster than strtod.
I was expecting this to include fancy bit twiddling and SIMD assembly for parsing 8 decimal digits at once...

But in reality, the core of the algorithm is still a while loop that looks at each digit one at a time and multiplies by 10 each time...
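To be clear about the shape I mean, it's essentially this (my own paraphrase, not the library's exact code):

```cpp
#include <cctype>
#include <cstdint>

// Accumulate a decimal mantissa one character at a time: the classic
// multiply-by-ten-and-add loop, with no SIMD or multi-digit tricks.
uint64_t parse_digits(const char *p, const char **end) {
  uint64_t value = 0;
  while (std::isdigit(static_cast<unsigned char>(*p))) {
    value = value * 10 + static_cast<uint64_t>(*p - '0');
    ++p;
  }
  *end = p;  // report where the digits stopped
  return value;
}
```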
Is there a site out there that collects the absolute fastest ways of doing common low-level operations like this? For something as common as converting strings of digits to IEEE 754 floats/doubles, you would think we would already have the absolute fastest sequence of assembly instructions to do this. It's disconcerting to think that the functions in the standard C/C++ library may not even be close to optimal.

Very cool btw
Great hack, but in a broader sense this is silly. If parsing/pretty-printing floats from/to ASCII strings is a bottleneck, you should be using hexfloats, as supported in the latest C/C++ standards and elsewhere. As a bonus, they will reliably evaluate to the same floating-point number, eliminating an extra source of ambiguity and possible overhead.
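For anyone who hasn't used them, the round trip looks something like this (standard C99/C++ library behavior, nothing specific to this project):

```cpp
#include <cstdio>
#include <cstdlib>

int main() {
  double x = 0.1;
  char buf[64];
  // "%a" prints the exact bits as a hexfloat, e.g. 0x1.999999999999ap-4,
  // so no rounding decisions are involved when formatting.
  std::snprintf(buf, sizeof buf, "%a", x);
  // strtod parses hexfloats directly and recovers the identical double.
  double y = std::strtod(buf, nullptr);
  std::printf("%s -> %s\n", buf, (x == y) ? "exact round trip" : "mismatch");
}
```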
What's a scenario where you have a huge flow of floating-point data as text? I assume it's usually JSON, but in which case does that happen, and why would you need to work around it (parse faster) rather than fix the underlying problem of having a huge stream of numbers as *text*?