TechEcho

4 comments

ken, almost 5 years ago

Pet peeve alert: not distinguishing between "IEEE-754 type" and "real number".

> System.out.println(5.1+9.2);

> We ask to add 5.1 to 9.2. The result should be 14.3, but we get the following instead: 14.299999999999999

That's misleading. You put the characters "5.1" in a Java source file, which in Java (like many languages) means "the IEEE-754 64-bit binary floating point value closest to the decimal number 5.1". It's not equal to the real number 5.1. 9.2 and 14.3 can't be represented exactly in binary floating point, either.

The number 5.1 + the number 9.2 should be the number 14.3.

The Java double 5.1 + the Java double 9.2 should not be the Java double represented by 14.3.

The addition isn't really what's hurting you. The representation is. You're confusing the matter by changing type systems in the middle of a sentence.

> It is a small difference (only 0.000000000000001), but it is still wrong.

The answer is "wrong" in large part because the question was wrong. The error in IEEE-754 "14.3" is only slightly smaller than the error in "5.1+9.2".

The sum looks more wrong than the literal "14.3" because Java's Double.toString() truncates its output according to some specific rules [1] designed to guess how those 64 bits got there.

> CPUs are poor at dealing with floating-point values. Arithmetics are almost always wrong

Wouldn't it be more accurate to say they're "wrong" at addition 50% of the time? You picked two numbers whose IEEE-754 representations are both just below their actual values, and whose sum has an IEEE-754 representation just above its actual value. Had you added "5.1+5.2" (whose representations happen to be just below and just above their actual values, respectively), the representation errors would have cancelled, and you'd have gotten "10.3" as you expect.

[1]: https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#toString(double)
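To make the representation point concrete, here is a minimal Java sketch that prints the exact binary64 value behind each literal. It relies on the `BigDecimal(double)` constructor, which converts a double to its exact decimal expansion. The class name `LiteralVersusReal` and the helpers `exact` and `error` are made up for this illustration.

```java
import java.math.BigDecimal;

class LiteralVersusReal {
    // Exact decimal expansion of the binary64 value that a double holds.
    static BigDecimal exact(double d) {
        return new BigDecimal(d);
    }

    // Signed representation error: stored value minus the decimal that was written.
    // Negative means the double sits just below the written number.
    static BigDecimal error(String written) {
        return exact(Double.parseDouble(written)).subtract(new BigDecimal(written));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"5.1", "9.2", "14.3", "5.2", "10.3"}) {
            System.out.println(s + " is stored as " + exact(Double.parseDouble(s)));
        }
        System.out.println("5.1 + 9.2 is stored as " + exact(5.1 + 9.2));
        System.out.println(5.1 + 9.2 == 14.3); // false: the sum lands one ulp below the double 14.3
        System.out.println(5.1 + 5.2 == 10.3); // true: the two representation errors cancel
    }
}
```

The doubles for 5.1 and 9.2 both sit below the written decimals, while the double for 14.3 sits above, so the sum of the first two cannot land on the third.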


bluestreak, almost 5 years ago

Author here.

About a month ago, I posted about using SIMD instructions to make aggregation calculations faster. I am very thankful for the feedback so far; this post is the result of the comments we received last time.

Many comments suggested that we implement compensated summation (aka Kahan), as the naive method could produce inaccurate and unreliable results. This is why we spent some time integrating the Kahan and Neumaier summation algorithms. This post summarises a few things we learned along this journey.

We thought Kahan would badly affect the performance since it uses 4x as many operations as the naive approach. However, some comments also suggested we could use prefetch and co-routines to pull the data from RAM to cache in parallel with other CPU instructions. We got phenomenal results thanks to these suggestions, with Kahan sums nearly as fast as the naive approach.

A lot of you also asked if we could compare this with Clickhouse. As they implement Kahan summation, we ran a quick comparison. Here's what we got for summing 1bn doubles with nulls using the Kahan algorithm. The details of how this was done are in the post.

QuestDB: 68ms
Clickhouse: 139ms

Thanks for all the feedback so far, and keep it coming so we can continue to improve.

Vlad
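For readers who have not seen the two algorithms, here is a plain scalar sketch in Java of naive, Kahan, and Neumaier summation. It is only an illustration of the textbook loops, not the SIMD implementation discussed in the post, and the class and method names are made up for this example.

```java
class CompensatedSums {
    // Naive summation: one floating-point addition per element.
    static double naiveSum(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum;
    }

    // Kahan summation: four floating-point operations per element.
    // c carries the low-order bits that were lost in the previous addition.
    static double kahanSum(double[] xs) {
        double sum = 0.0;
        double c = 0.0;
        for (double x : xs) {
            double y = x - c;   // apply the correction from the previous step
            double t = sum + y; // low-order bits of y may be lost here
            c = (t - sum) - y;  // recover what was lost (with opposite sign)
            sum = t;
        }
        return sum;
    }

    // Neumaier's variant: also handles the case where the next term
    // is larger in magnitude than the running sum.
    static double neumaierSum(double[] xs) {
        double sum = 0.0;
        double c = 0.0;
        for (double x : xs) {
            double t = sum + x;
            if (Math.abs(sum) >= Math.abs(x)) {
                c += (sum - t) + x; // low-order bits of x were lost
            } else {
                c += (x - t) + sum; // low-order bits of sum were lost
            }
            sum = t;
        }
        return sum + c; // apply the accumulated correction once, at the end
    }

    public static void main(String[] args) {
        double[] small = {1e16, 1.0, 1.0};
        System.out.println(naiveSum(small)); // 1.0E16
        System.out.println(kahanSum(small)); // 1.0000000000000002E16

        double[] hard = {1.0, 1e100, 1.0, -1e100};
        System.out.println(kahanSum(hard));    // 0.0
        System.out.println(neumaierSum(hard)); // 2.0
    }
}
```

The second input is the usual case where Kahan loses the small terms and Neumaier recovers them, which is one reason to offer both.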


radford-neal, almost 5 years ago

While Kahan summation is more accurate than naive summation, it still does not produce the precise result (the true sum rounded to the final precision).

It is not too costly to do the summation exactly. Several algorithms for this have been developed. You can read about mine at https://arxiv.org/abs/1505.05571 and get the code for them at https://gitlab.com/radfordneal/xsum
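As a reference point for what "the true sum rounded to the final precision" means, here is a deliberately slow Java sketch that accumulates in `BigDecimal` and rounds once at the end. This is not the xsum algorithm from the paper above; it is only a brute-force way to check other methods on small inputs, and it assumes `BigDecimal.doubleValue()` rounds to the nearest double. The class and method names are made up for this example.

```java
import java.math.BigDecimal;

class ExactSumSketch {
    // Adds the exact value of every double, then rounds a single time.
    // Assumption: inputs are finite; new BigDecimal(double) throws
    // NumberFormatException for NaN and infinities.
    static double exactSum(double[] xs) {
        BigDecimal total = BigDecimal.ZERO;
        for (double x : xs) {
            total = total.add(new BigDecimal(x)); // exact: no rounding happens here
        }
        return total.doubleValue(); // the only rounding step
    }

    public static void main(String[] args) {
        double[] hard = {1.0, 1e100, 1.0, -1e100};
        System.out.println(exactSum(hard)); // 2.0
    }
}
```

A dedicated exact-summation algorithm avoids the arbitrary-precision arithmetic used here, which is what makes it practical on large inputs.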

Things we learned about sums
