Seems to be one of the best ways to go about it.<p>From the comment in protobuf source (which does the same thing as Python), mentioned in the Twitter thread:<p>(...) An arguably better strategy would be to use the algorithm described in "How to Print Floating-Point Numbers Accurately" by Steele & White, e.g. as implemented by David M. Gay's dtoa(). It turns out, however, that the following implementation is about as fast as DMG's code. Furthermore, DMG's code locks mutexes, which means it will not scale well on multi-core machines. DMG's code is slightly more accurate (in that it will never use more digits than necessary), but this is probably irrelevant for most users.<p>Rob Pike and Ken Thompson also have an implementation of dtoa() in third_party/fmt/fltfmt.cc. Their implementation is similar to this one in that it makes guesses and then uses strtod() to check them. (...)<p><a href="https://github.com/protocolbuffers/protobuf/blob/ed4321d1cb33199984118d801956822842771e7e/src/google/protobuf/stubs/strutil.cc#L1174-L1213" rel="nofollow">https://github.com/protocolbuffers/protobuf/blob/ed4321d1cb3...</a>
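The guess-and-check strategy the quoted comment describes (format with increasing precision, verify by parsing back) can be sketched in a few lines of Python, where `float()` plays the role of `strtod()`; `shortest_repr` is a made-up name, not anything from the protobuf source:

```python
def shortest_repr(x: float) -> str:
    # Guess: format with increasing significant digits...
    for prec in range(1, 18):        # 17 significant digits always suffice for a double
        s = "%.*g" % (prec, x)
        # ...and check: does the guess parse back to exactly x?
        if float(s) == x:
            return s
    return "%.17g" % x
```

This is the "slightly less accurate" variant the protobuf comment concedes to: it stops at the first precision that round-trips, rather than proving minimality the way Steele & White / dtoa() do.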
Apple's libc used to shell out to Perl in a function: <a href="https://github.com/Apple-FOSS-Mirror/Libc/blob/2ca2ae74647714acfc18674c3114b1a5d3325d7d/gen/wordexp.c#L192" rel="nofollow">https://github.com/Apple-FOSS-Mirror/Libc/blob/2ca2ae7464771...</a>
Well, the problem is precisely that rounding, as it is generally conceived, is expressed in base 10 - we generally conceive of numbers, including floating-point ones, in base 10. Yet at the lowest level numbers are represented in base 2, floating-point ones included. It is imaginable, and would be more correct and efficient, to perform rounding (or flooring or ceiling, for that matter) in base 2, but it would be that much more difficult to comprehend when dealing with non-integers in code.
Rounding in base 10 needs some form of conversion anyway; going through a string is one way that is, at least, readable (pun intended).
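For illustration (Python), the base mismatch is visible directly: the literal 2.675 has no exact base-2 representation, so any base-10 rounding has to consult the stored binary value:

```python
x = 2.675
print(f"{x:.20f}")   # 2.67499999999999982236 -- the stored binary value
print(f"{x:.2f}")    # 2.67 -- decimal rounding of that stored value
```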
In my experience there are few things slower than float-to-string and string-to-float conversion. And it seems so unnecessary.<p>I always implemented round-to-a-specific-digit based on the built-in roundss/roundsd operations, which are native x86-64 assembly instructions (i.e. <a href="https://www.felixcloutier.com/x86/roundsd" rel="nofollow">https://www.felixcloutier.com/x86/roundsd</a>).<p>I do not understand why this would not be preferable to the string method.<p>float round( float x, int digits, int base ) {
float factor = powf( base, digits );
return roundss( x * factor ) / factor;
}<p>I guess this has the effect of not working for numbers near the edge of its range.<p>One could check for this and fall back to the string method. Or alternatively use higher-precision doubles internally:<p>float round( float x, int digits, int base ) {
double factor = pow( base, digits );
return (float)( roundsd( x * factor ) / factor );
}<p>But then what do you do if you want to round a double and maintain all precision? I think there is likely some way to do that by unpacking the double into a manual mantissa and exponent, each of which are doubles, and doing this manually - or maybe by using some type of float128 library (<a href="https://www.boost.org/doc/libs/1_63_0/libs/multiprecision/doc/html/boost_multiprecision/tut/floats/float128.html" rel="nofollow">https://www.boost.org/doc/libs/1_63_0/libs/multiprecision/do...</a>)...<p>But changing this implementation now could cause slight differences, and if someone was rounding and then hashing, this type of change could be horrible if not behind some type of opt-in.
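One string-free way to round a double at full precision (a Python sketch, not the C above): a double converts exactly to a rational, so the base-10 rounding can be done on exact integers and only the final result is rounded back to binary. `round_exact` is a made-up name:

```python
from fractions import Fraction

def round_exact(x: float, digits: int) -> float:
    # Fraction(x) is the exact value of the double -- nothing is lost
    scaled = Fraction(x) * 10**digits
    # round() on a Fraction is exact, with ties-to-even like the IEEE default
    return float(round(scaled) / Fraction(10**digits))
```

On CPython this agrees with the built-in correctly-rounded `round(x, digits)`, without any float/string traffic (though it is unlikely to beat a hand-tuned dtoa on speed).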
Related thread on Twitter <a href="https://twitter.com/whitequark/status/1164395585056604160" rel="nofollow">https://twitter.com/whitequark/status/1164395585056604160</a>
OpenJDK BigDecimal::doubleValue() goes via a string in certain situations <a href="https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/math/BigDecimal.java#L3667" rel="nofollow">https://github.com/openjdk/jdk/blob/master/src/java.base/sha...</a>
My quick impression is that the choice of a rounding algorithm is relative to the purpose that it serves. For instance, floor(x + 0.5) is good enough in many applications.<p>In some cases, rounding is performed for the primary purpose of displaying a number as a string, in which case it can't be any less complicated than the string conversion function itself.
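For what it's worth, floor(x + 0.5) has a known corner case (shown here in Python; this is the same effect as the old Java Math.round bug): the addition itself can round up across the tie before floor ever runs:

```python
import math

x = 0.49999999999999994      # the largest double strictly below 0.5
assert x + 0.5 == 1.0        # the sum is not representable and rounds up to exactly 1.0
print(math.floor(x + 0.5))   # 1 -- naive rounding
print(round(x))              # 0 -- correctly rounded
```

Which, as the parent says, may still be "good enough" - it affects exactly this one value for round-to-integer.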
A bit on topic...<p>Is there a phrase for the ratio between the frequency of an apparent archetype of a bug/feature and the real-world occurrences of said bug/feature? If not then perhaps the "Fudderson-Hypeman ratio" in honor of its namesakes.<p>For example, I'm sure every C programmer on here has their favored way to quickly demo what bugs may come from C's null-delimited strings. But even though C programmers are quick to cite that deficiency, I'd bet there's a greater occurrence of C string bugs in the wild. Thus we get a relatively low Fudderson-Hypeman ratio.<p>On the other hand: "0.1 + 0.2 != 0.3"? I'm just thinking back through the mailing list and issue tracker for a realtime DSP environment that uses single-precision floats exclusively as the numeric data type. My first approximation is that there are significantly more didactic quotes of that example than reports of problems due to the class of bugs that archetype represents.<p>Does anyone have some real-world data to trump my rank speculation? (Keep in mind that simply replying with more didactic examples will raise the Fudderson-Hypeman ratio.)
I was looking once at Python and Redis and how numbers get stored. I remember Python would in the end send Redis some strings. I dove pretty deep and found that Python floats when turned into a string and then back are exactly the same float.<p>I remember even writing a program that tested every possible floating point number (must have only been 32 bit). I think I used ctypes and interpreted every binary combination of 32 bits as a float, turned it into a string, then back and checked equality. A lot of them were NaN.
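That exhaustive test is easy to reconstruct with struct instead of ctypes (a sketch; iterating all 2**32 patterns works the same way, it just takes a while):

```python
import random
import struct

def survives_roundtrip(bits: int) -> bool:
    # Reinterpret the 32-bit pattern as a float
    (x,) = struct.unpack("<f", struct.pack("<I", bits))
    if x != x:                     # NaN never compares equal to itself
        return True                # treat NaN patterns as passing (or skip them)
    y = float(repr(x))             # float -> string -> float (a Python double)
    (z,) = struct.unpack("<f", struct.pack("<f", y))  # narrow back to 32 bits
    return z == x

# spot-check a random sample rather than the full 2**32 space
random.seed(0)
assert all(survives_roundtrip(random.getrandbits(32)) for _ in range(10_000))
```

The NaN special-casing matches the parent's observation: a large slice of the bit patterns (exponent all ones, nonzero mantissa) are NaNs and can't be checked with `==`.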
`blob/master` isn't a suitable permalink. Use the first few letters of the commit hash so the line numbers and code are still relevant when this file inevitably gets modified.
This is where decimal floating point really shines. Since the exponent is base 10, it's trivially easy to round the mantissa.<p>The only silly part of IEEE 754-2008 is the fact that it specified two representations (DPD, championed by IBM, and BID, championed by Intel) with no way to tell them apart.
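Python's decimal module (a software implementation of the same general decimal arithmetic that IEEE 754-2008 standardized) shows how trivial the rounding becomes once the exponent is base 10:

```python
from decimal import Decimal

# quantize just drops mantissa digits -- no base conversion involved
print(Decimal("2.675").quantize(Decimal("0.01")))   # 2.68 -- an exact decimal tie, half-even
print(round(2.675, 2))                              # 2.67 -- the binary double sits below the tie
```

The contrast in the two outputs is exactly the base-2 vs base-10 issue discussed elsewhere in the thread: the decimal value really is a tie, the binary one isn't.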