This is one of the most impressive papers I've seen in a while. The authors generalize conventional neural networks (which rely heavily on floating point operations for both training and inference) to also cover ternary and binary logic. Unlike existing BNNs (binary neural networks), this approach lets them get rid of floating point operations entirely, including during training. For example, in the non-floating-point case, instead of the conventional (infinitesimal) derivative they use discrete calculus, where the derivative is defined as the forward difference f'(x) = f(x + 1) − f(x) (see the small sketch at the end).

Their framework is general and compatible with binary, ternary, and floating point precision, so it allows training mixed-precision models. That is probably important for tasks that need high precision in some parts of the network to reach good downstream accuracy. (As examples they cite image segmentation and super resolution, for which they also include benchmarks demonstrating high accuracy.)

The whole thing could substantially reduce training time, in addition to inference time and memory requirements. (Most conventional BNNs only reduce the latter two: they either require training a full-precision model first or, when trained from scratch, still use floating point operations in various ways, which probably explains why conventional BNNs are currently not used to cut training cost.)

The main limitation of the work is that current GPUs aren't optimized for binary operations, only for floating point operations (mainly multiplication). So the authors calculate analytically how much their approach would reduce energy consumption during training, which should closely track actual "compute" requirements on suitably optimized hardware. For their benchmarks, the calculated energy requirement is only a fraction of the floating point baseline (and even of other conventional BNNs) while achieving similar model accuracy.

It's also interesting that this is an academic publication (accepted at NeurIPS 2024) but sponsored by Huawei. So the full code is not available, though they provide a lot of implementation details in the paper and appendix. I wonder whether Huawei will jump on the opportunity and develop a machine learning accelerator for binary operations, which would compete with GPUs that are mainly optimized for FLOPs.
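
For intuition, here is a minimal Python sketch (mine, not from the paper) of that forward-difference derivative applied to an integer-valued activation; ternary_sign is a made-up example activation, not the one the authors use:

    # Minimal sketch (not the paper's code) of the forward-difference
    # derivative f'(x) = f(x + 1) - f(x) on an integer-valued activation.
    # Everything stays in integer arithmetic, so no floating point is needed.

    def ternary_sign(x: int) -> int:
        # Hypothetical ternary activation mapping an integer to {-1, 0, 1}.
        return (x > 0) - (x < 0)

    def forward_difference(f, x: int) -> int:
        # Discrete "derivative": f(x + 1) - f(x) instead of the infinitesimal limit.
        return f(x + 1) - f(x)

    for x in range(-3, 3):
        print(x, ternary_sign(x), forward_difference(ternary_sign, x))

The point is just that both the activation and its "gradient" remain small integers, which is what lets the whole training loop avoid floating point arithmetic.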