An interesting tool to help preserve floating point precision is Herbie (<a href="https://herbie.uwplse.org/" rel="nofollow">https://herbie.uwplse.org/</a>), which takes an expression as input and outputs an optimized version of it that will retain precision.<p>The example they give on their site is:<p><pre><code> sqrt(x+1) - sqrt(x) => 1/(sqrt(x+1) + sqrt(x))</code></pre>
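A quick way to see why the rewrite helps (a hypothetical demo, not taken from Herbie's site): for large x the two square roots agree in most of their digits, so subtracting them cancels nearly all the precision, while the rewritten form avoids the subtraction entirely:<p><pre><code>import math

# For large x, sqrt(x+1) and sqrt(x) are nearly equal, so the subtraction
# cancels most significant digits; Herbie's rewrite avoids the subtraction.
x = 1e12
naive = math.sqrt(x + 1) - math.sqrt(x)            # catastrophic cancellation
rewritten = 1 / (math.sqrt(x + 1) + math.sqrt(x))  # algebraically identical

print(naive)      # only the first few digits are trustworthy
print(rewritten)  # accurate to nearly full double precision
</code></pre>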
There are a few edge cases in geospatial computational geometry that require greater than quad precision to correctly compute a double precision result. This fact is sometimes used as a litmus test for implementation correctness: many implementations' double precision algorithms have no conditional escape for these edge cases that switches to arbitrary precision, which is easy to verify.
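For anyone curious what such a conditional escape looks like, here is a minimal sketch (my own illustration, not from any particular geospatial library) of the 2D orientation predicate with a fast double path and an exact-rational fallback; production code such as Shewchuk's robust predicates derives a rigorous error bound instead of the crude one used here:<p><pre><code>import sys
from fractions import Fraction

EPS = sys.float_info.epsilon

def orient2d(a, b, c):
    """Sign of twice the signed area of triangle abc: +1 CCW, -1 CW, 0 collinear."""
    detleft = (b[0] - a[0]) * (c[1] - a[1])
    detright = (b[1] - a[1]) * (c[0] - a[0])
    det = detleft - detright
    # Fast path: trust the double result only when it clearly exceeds a
    # (crude, purely illustrative) bound on the accumulated rounding error.
    if abs(det) &gt; 4 * EPS * (abs(detleft) + abs(detright)):
        return (det &gt; 0) - (det &lt; 0)
    # Conditional escape: redo the computation exactly.  Python floats
    # convert to Fraction without loss, so this sign is exact.
    (ax, ay), (bx, by), (cx, cy) = [(Fraction(x), Fraction(y)) for x, y in (a, b, c)]
    d = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return (d &gt; 0) - (d &lt; 0)
</code></pre>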
I remember a class I had where we were shown a method for exact PC arithmetic with integers, rationals (e.g., 1/2) and algebraic numbers (e.g., sqrt(2)). The only thing it couldn't deal with was transcendental numbers (e.g., pi).<p>I think it worked by representing the numbers by integer matrices; all operations were then matrix operations. Unfortunately, the matrices gained dimension whenever new algebraic numbers got involved, so it probably wasn't useful for anything :-).<p>Anyway, it blew me away at the time, one of the few things that stuck with me from school.
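I can't say this is exactly the method from that class, but one standard construction along these lines represents an algebraic number by an integer matrix that has it as an eigenvalue (the companion matrix of its minimal polynomial); sums and products then become Kronecker sums and products, which is exactly why the dimensions grow. A rough sketch:<p><pre><code>import numpy as np

S2 = np.array([[0, 2], [1, 0]])   # companion matrix of x^2 - 2; eigenvalues +-sqrt(2)
S3 = np.array([[0, 3], [1, 0]])   # companion matrix of x^2 - 3; eigenvalues +-sqrt(3)
I2 = np.eye(2, dtype=int)

# The integer matrices stay exact; adding/multiplying the numbers becomes
# Kronecker sums/products of the matrices, doubling the dimension each time.
SUM = np.kron(S2, I2) + np.kron(I2, S3)   # 4x4; eigenvalues include sqrt(2)+sqrt(3)
PROD = np.kron(S2, S3)                    # 4x4; eigenvalues are +-sqrt(6)

print(S2 @ S2)                            # "squaring sqrt(2)" gives 2*I exactly
print(sorted(np.linalg.eigvals(SUM).real.round(6)))  # floating point only to peek at values
</code></pre>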
If anyone ever runs out of precision using IIR filters: convert to second-order sections (SOS). I stumbled upon this trick a few years ago and it's a game changer. No need for FIR.
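For anyone who hasn't seen it, a small illustration using SciPy's standard API: design the same high-order IIR filter in transfer-function form and in SOS form. The (b, a) form can become inaccurate or even unstable for high orders with low cutoffs, while the cascaded biquads stay well behaved:<p><pre><code>import numpy as np
from scipy import signal

fs = 48_000.0
# 12th-order highpass with a very low cutoff relative to fs: a classic
# case where the single transfer-function form loses precision.
b, a = signal.butter(12, 100.0, btype="highpass", fs=fs)
sos = signal.butter(12, 100.0, btype="highpass", fs=fs, output="sos")

x = np.random.default_rng(0).standard_normal(4800)
y_ba = signal.lfilter(b, a, x)    # may be inaccurate or even unstable here
y_sos = signal.sosfilt(sos, x)    # second-order sections: numerically robust
</code></pre>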
This reminded me of Posits[1] and Unums[2]. I dabbled with my own implementation some years ago, but nothing serious.<p>Anyone tried them out? I see there's some FPGA implementations and a RISC-V extension idea[3], would be fun to try it on a softcore.<p>[1]: <a href="https://posithub.org/docs/posit_standard.pdf" rel="nofollow">https://posithub.org/docs/posit_standard.pdf</a><p>[2]: <a href="http://johngustafson.net/unums.html" rel="nofollow">http://johngustafson.net/unums.html</a><p>[3]: <a href="https://www.posithub.org/khub_doc" rel="nofollow">https://www.posithub.org/khub_doc</a>
>> Numerically stable algorithms exist for such problems, but these occasionally fail leaving us to brute force the calculation with higher precision to minimize floating point rounding errors.<p>Do algorithm authors really implement two code paths?<p>One that uses an iterative, cost-function-based algorithm to solve the matrix (which is what everybody does), and a second one using the brute-force approach the author showed (known not to work)?
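For what it's worth, the pattern does get implemented, though usually as a fallback triggered by a residual or conditioning check rather than two equally-used paths. A hypothetical sketch (names and tolerance are mine, not from the article), using a linear solve for simplicity and mpmath for the high-precision escape, as the blog post does:<p><pre><code>import numpy as np
from mpmath import mp, mpf, matrix, lu_solve

def solve(A, b, tol=1e-8, dps=50):
    """Solve Ax = b in doubles; redo at `dps` digits if the residual is poor."""
    x = np.linalg.solve(A, b)
    residual = np.linalg.norm(A @ x - b) / max(np.linalg.norm(b), 1e-300)
    if residual &lt; tol:
        return x                              # fast path: doubles sufficed
    mp.dps = dps                              # slow path: higher precision
    A_mp = matrix([[mpf(float(v)) for v in row] for row in A])
    b_mp = matrix([mpf(float(v)) for v in b])
    x_mp = lu_solve(A_mp, b_mp)
    return np.array([float(x_mp[i]) for i in range(len(b))])
</code></pre>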
First thought: the curve in that first plot reflects the pseudospectrum? (<a href="https://en.wikipedia.org/wiki/Pseudospectrum" rel="nofollow">https://en.wikipedia.org/wiki/Pseudospectrum</a>) Spectra of nonnormal matrices can be very hard to calculate accurately.
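For a rough picture without any special library: the epsilon-pseudospectrum of A is the region where the smallest singular value of zI - A is at most epsilon, so evaluating that on a grid (a sketch of the standard approach, not the article's code) shows how far a nonnormal matrix's eigenvalues can move under tiny perturbations:<p><pre><code>import numpy as np

def smin_grid(A, re, im):
    """Smallest singular value of zI - A on a grid of complex points z."""
    n = A.shape[0]
    out = np.empty((len(im), len(re)))
    for i, y in enumerate(im):
        for j, x in enumerate(re):
            s = np.linalg.svd((x + 1j * y) * np.eye(n) - A, compute_uv=False)
            out[i, j] = s[-1]
    return out   # contour lines of this grid are pseudospectrum boundaries

A = np.diag(np.ones(31), 1)   # 32x32 shift matrix: nonnormal, all eigenvalues 0
g = smin_grid(A, np.linspace(-1.5, 1.5, 60), np.linspace(-1.5, 1.5, 60))
</code></pre>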
There are algorithms taught in numerical analysis classes for re-ordering calculations to maximize precision. Computing things in the order you wrote them on paper, or the way you were initially taught, may not be the best.
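The textbook example is summation order: adding tiny terms to a large accumulator rounds them away, while adding the small terms first (or using compensated summation) keeps them.<p><pre><code>import math

terms = [1.0] + [1e-16] * 1_000_000

forward = 0.0
for t in terms:              # big value first: each 1e-16 is rounded away
    forward += t

backward = 0.0
for t in reversed(terms):    # small values first: they accumulate before the 1.0
    backward += t

print(forward)               # 1.0 exactly: all the small terms were lost
print(backward)              # ~1.0000000001
print(math.fsum(terms))      # correctly rounded sum, for reference
</code></pre>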
I'll admit to being lost in some of the math here, but I wonder, perhaps naively, if a language like Julia, which can represent fractions as exact fractions, would handle it better.
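Python can do the same thing with the fractions module, which is a fair stand-in for Julia's Rational type: the rational arithmetic itself is exact, but it only helps while everything stays rational. Eigenvalue problems like the one in the article involve roots of polynomials, which are generally irrational, so at some point you have to approximate anyway (and exact rationals also tend to blow up in size).<p><pre><code>from fractions import Fraction

print(0.1 + 0.2 == 0.3)                                      # False with binary floats
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True: exact rationals
</code></pre>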
Thankfully someone figured out a while back that quad precision should be in C++; unfortunately they didn't figure out that the next few dozen precisions should be too... The arbitrary precision stuff is an alternative in theory, and would be in practice if any of the hundreds of half-completed projects to convert the unintelligible, unmaintainable, unclear C to the (in this specific case) much better C++ interfaces were complete.
There is a whole field about that, called numerical analysis: <a href="https://en.m.wikipedia.org/wiki/Numerical_analysis" rel="nofollow">https://en.m.wikipedia.org/wiki/Numerical_analysis</a>
Obligatory reference to computable analysis: <a href="https://en.wikipedia.org/wiki/Computable_analysis" rel="nofollow">https://en.wikipedia.org/wiki/Computable_analysis</a><p>This shows that you <i>can</i> do exact real arithmetic without worrying about round-off errors. But it's expensive.<p>If you want something more practical (but dodgy), you can use arbitrary-precision arithmetic instead of exact real arithmetic. To that end, there's the Decimal module in the Python standard lib. There's also the BC programming language which is part of POSIX. If you want to do matrix arithmetic, you can use mpmath for Python like they did in this blog post, or Eigen3 for C++ with an arbitrary-precision type. This is dodgy of course because the arithmetic remains inexact, albeit with smaller rounding errors than double-precision floats.
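A tiny illustration of that last point: with Decimal or mpmath you choose how many digits to carry, but every operation still rounds to that precision.<p><pre><code>from decimal import Decimal, getcontext
from mpmath import mp, mpf

getcontext().prec = 50
print(Decimal(1) / Decimal(3) * 3)   # 0.999...9 (fifty nines), still not exactly 1

mp.dps = 50                          # mpmath working precision, as in the blog post
print(mpf(1) / 7)                    # 1/7 to about fifty significant digits
</code></pre>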