I have worked on a performance-portable math library. We implement BLAS, sparse matrix operations, a variety of solvers, some ODE stuff, and assorted utilities across serial and parallel execution modes on x86, ARM, and the three major GPU vendors.<p>The simplest and highest-impact tests are all the edge cases - if an input matrix/vector/scalar is 0/1/-1/NaN, that usually tells you a lot about what the outputs should be.<p>It can be difficult to determine sensible numerical limits for error in the algorithms. The simplest example is a dot product - summing floats is not associative, so doing it in parallel is not bitwise the same as serial. For dot in particular it's relatively easy to come up with an error bound, but for anything more complicated it takes a particular expertise that is not always available. This has been a work in progress, and sometimes (usually) we just pick a magic tolerance out of thin air that seems to work.<p>Solvers are tested using analytical solutions and by inverting them, e.g. if we're solving Ax = y for x, then Ax should come out "close" to the original y (see the error tolerance discussion above).<p>One of the most surprising things to me is that the suite has identified many bugs in vendor math libraries (OpenBLAS, MKL, cuSPARSE, rocSPARSE, etc.) - a major component of what we do is wrap these vendor libraries in a common interface so our users don't have to do any work when they switch supercomputers, so in practice we test them all pretty thoroughly as well. Maybe I can let OpenBLAS off the hook due to the wide variety of systems they support, but I expected the better-resourced vendors to do a better job.<p>For this reason we find regression tests to be useful as well.
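<p>A minimal numpy sketch of the Ax = y check (the relative-residual scaling and the 1e-10 are illustrative choices, not our production tolerances):

    import numpy as np

    def check_solve(A, y, x, rtol=1e-10):
        # Relative residual: scale ||Ax - y|| by the magnitudes of the
        # inputs so the test is meaningful whatever the size of y.
        r = np.linalg.norm(A @ x - y)
        scale = np.linalg.norm(A) * np.linalg.norm(x) + np.linalg.norm(y)
        assert r <= rtol * scale, f"residual {r:.3e} exceeds {rtol * scale:.3e}"

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 50))
    y = rng.standard_normal(50)
    check_solve(A, y, np.linalg.solve(A, y))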
Mentioning this because it was something that surprised me when I first heard it:<p>Many Python numerical libraries change how they internally represent certain data types and/or change internal algorithms from version to version. This means that running the exact same code on two different versions may give you slightly different results.<p>In your side project this may not be a big deal, but if you are working at, say, a hedge fund or a company that runs long and complicated data processing pipelines, then version upgrades with no unit testing can be "very bad".
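<p>The cheapest guard is a golden-value regression test pinned before the upgrade. A sketch (run_pipeline and golden.json are hypothetical stand-ins for your real pipeline and its recorded output):

    import json

    import numpy as np

    def run_pipeline(x):
        # Hypothetical stand-in for the real processing pipeline.
        return float(np.sort(x) @ np.log1p(np.abs(x)))

    def test_pipeline_regression():
        x = np.linspace(-3.0, 3.0, 1001)
        got = run_pipeline(x)
        # Golden value recorded on the known-good library versions.
        with open("golden.json") as f:
            expected = json.load(f)["pipeline_v1"]
        # Tolerance loose enough to ignore last-bit SIMD differences,
        # tight enough to catch a changed algorithm.
        assert abs(got - expected) <= 1e-9 * abs(expected)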
All good stuff. I’d add, though, that for many numerical routines you end up testing on simple, well-known systems that have well-defined analytic answers.<p>So e.g. if you’re writing solvers for ODE systems, you tend to test them on things like simple harmonic oscillators.<p>If you’re very lucky, some algorithms have bounded errors. For example, the fast multipole method for evaluating long-range forces does, so you can check this for very simple systems.
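<p>E.g. for x'' = -x with x(0) = 1, x'(0) = 0 the exact answer is x(t) = cos(t), so the test is a sketch like this (using scipy; the tolerances are illustrative):

    import numpy as np
    from scipy.integrate import solve_ivp

    def rhs(t, s):  # s = [x, v] for the simple harmonic oscillator x'' = -x
        return [s[1], -s[0]]

    t = np.linspace(0.0, 20.0, 201)
    sol = solve_ivp(rhs, (0.0, 20.0), [1.0, 0.0], t_eval=t, rtol=1e-10, atol=1e-12)
    assert np.allclose(sol.y[0], np.cos(t), atol=1e-7)  # matches the analytic solution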
If you have a numerical routine that converts from one representation to another, you can test it by composing it with the routine implementing the inverse conversion and checking that you get the original input back (up to rounding).
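<p>E.g. for a cartesian/polar pair (a minimal sketch; the two functions stand in for whatever conversion you actually implement):

    import math

    def to_polar(x, y):
        return math.hypot(x, y), math.atan2(y, x)

    def to_cartesian(r, theta):
        return r * math.cos(theta), r * math.sin(theta)

    for x, y in [(1.0, 0.0), (-2.5, 3.0), (0.001, -4.0)]:
        x2, y2 = to_cartesian(*to_polar(x, y))
        # Round trip should reproduce the input up to rounding error.
        assert math.isclose(x, x2, rel_tol=1e-12, abs_tol=1e-12)
        assert math.isclose(y, y2, rel_tol=1e-12, abs_tol=1e-12)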
For coordinate transforms specifically, I also like to run through the whole chain of implemented transforms and back again, asserting that I end up with the same coordinate (within tolerance). One method might be wrong, but they "can't" all be wrong.
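<p>As a sketch (the frames and transforms here are made-up stand-ins; the point is composing every forward transform, then every inverse, and landing where you started):

    import math

    def a_to_b(x, y):
        return x + 1.0, y * 2.0

    def b_to_a(x, y):
        return x - 1.0, y / 2.0

    def b_to_c(x, y):  # rotate by 0.3 rad
        c, s = math.cos(0.3), math.sin(0.3)
        return c * x - s * y, s * x + c * y

    def c_to_b(x, y):  # rotate back by -0.3 rad
        c, s = math.cos(-0.3), math.sin(-0.3)
        return c * x - s * y, s * x + c * y

    p = (1.25, -7.5)
    q = p
    for f in (a_to_b, b_to_c, c_to_b, b_to_a):  # out through the chain and back
        q = f(*q)
    assert all(math.isclose(u, v, rel_tol=1e-12) for u, v in zip(p, q))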
Start simpler.<p>Check the expected behavior of...<p>NaN, +0, -0, +1, -1, +Inf, -Inf, +Eps, -Eps, +1+Eps, +1-Eps, -1+Eps, -1-Eps<p>...on each coordinate while supplying a sensible constant for the other coordinates.<p>Then check the outer product of the above list taken across all coordinates.<p>Check that you get the same answer when you call the function twice in each of these outer-product cases.<p>This approach works for any floating-point function. You will be surprised how many mistakes these tests catch, including mistakes introduced during later maintenance of the code under test.
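<p>A sketch of this in Python for a two-argument function (math.copysign is just a stand-in for the function under test):

    import itertools
    import math

    eps = 2.0 ** -52
    specials = [float("nan"), 0.0, -0.0, 1.0, -1.0, math.inf, -math.inf,
                eps, -eps, 1.0 + eps, 1.0 - eps, -1.0 + eps, -1.0 - eps]

    def f(x, y):  # stand-in for the function under test
        return math.copysign(x, y)

    def same(a, b):  # NaN-aware equality
        return (math.isnan(a) and math.isnan(b)) or a == b

    # Outer product of the special values across both coordinates...
    for x, y in itertools.product(specials, repeat=2):
        r1, r2 = f(x, y), f(x, y)
        # ...calling twice to check the answer is stable.
        assert same(r1, r2), (x, y, r1, r2)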
Another trick is to use 32-bit floats and just test every single number.<p>Four billion evaluations is something computers can handle, and it will give you pretty much all the edge cases of floating-point representation.
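<p>In numpy the sweep is feasible in minutes if you go in vectorized chunks. A sketch, checking a float32 kernel against a float64 reference (sqrt is correctly rounded per IEEE 754, which is why exact agreement is a fair demand here; for other functions you'd compare against the reference with an ulp budget instead):

    import numpy as np

    def kernel(x):  # function under test, computed in float32
        return np.sqrt(np.abs(x))

    CHUNK = 1 << 24
    for start in range(0, 1 << 32, CHUNK):
        bits = np.arange(start, start + CHUNK, dtype=np.uint64).astype(np.uint32)
        x = bits.view(np.float32)  # every float32 bit pattern, eventually
        got = kernel(x)
        ref = np.sqrt(np.abs(x.astype(np.float64))).astype(np.float32)
        ok = np.isfinite(ref)
        assert np.array_equal(got[ok], ref[ok])
        # NaN/Inf inputs just have to produce NaN in the same places.
        assert np.array_equal(np.isnan(got[~ok]), np.isnan(ref[~ok]))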
Makes me wonder if there should be an open source test suite for numerical routines, i.e. just a big table (in JSON or something) of inputs, transformations, and expected outputs, so new projects could start with a fairly robust (if a bit generic) test suite that they could filter to suit their needs as they grow the feature set.
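<p>Something like this, say (schema made up on the spot, just to illustrate the shape):

    import json
    import math

    # Each case: an operation name, inputs, expected output, and a tolerance.
    TABLE = json.loads("""
    [
      {"op": "hypot", "in": [3.0, 4.0], "out": 5.0, "tol": 1e-15},
      {"op": "log",   "in": [1.0],      "out": 0.0, "tol": 0.0},
      {"op": "atan2", "in": [1.0, 1.0], "out": 0.7853981633974483, "tol": 1e-15}
    ]
    """)

    for case in TABLE:
        got = getattr(math, case["op"])(*case["in"])
        assert math.isclose(got, case["out"],
                            rel_tol=case["tol"], abs_tol=case["tol"])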
Honestly, I was expecting to see suggestions for testing numerical stability. It can be a super annoying issue when the data is just right to hit certain peculiarities of floating-point representation.
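<p>The classic demonstration is catastrophic cancellation, e.g. a one-pass variance on data with a large mean (a sketch; the two-pass version is the stable one):

    import numpy as np

    rng = np.random.default_rng(0)
    x = 1e8 + rng.standard_normal(100_000)  # variance ~1, mean 1e8

    # Textbook one-pass formula E[x^2] - E[x]^2: the two terms are ~1e16,
    # so their difference loses almost all significant digits.
    naive = np.mean(x * x) - np.mean(x) ** 2

    # Two-pass formula: subtract the mean first, then square.
    stable = np.mean((x - np.mean(x)) ** 2)

    print(naive, stable)  # naive can even come out negative; stable is ~1.0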