Here's the bottom line for anyone who doesn't want to read the whole article.

> Using a commercially available 28-nanometer ASIC process technology, we have profiled (8, 1, 5, 5, 7) log ELMA as 0.96x the power of int8/32 multiply-add for a standalone processing element (PE).

> Extended to 16 bits, this method uses 0.59x the power and 0.68x the area of IEEE 754 half-precision FMA.

In other words, interesting but not earth-shattering. Great to see people working in this area, though!
Not sure why this isn't getting more votes, but it's a good avenue of research and the authors should be commended. That said, this approach to optimizing floating-point implementations has a long history at Imagination Technologies, ARM, and similar low-power inference chipset providers. I especially like the Synopsys ASIP Designer [0] tool, which leverages the open-source (although not yet IEEE-ratified) LISA 2.0 Architecture Design Language [1] to iterate on these design issues.

Interesting times...

[0] https://www.synopsys.com/dw/ipdir.php?ds=asip-designer
[1] https://en.wikipedia.org/wiki/LISA_(Language_for_Instruction_Set_Architecture)
A bit off-topic, but I remember some studies about 'under-powered' ASICs, i.e., running with 'lower-than-required' voltage and just letting the chip fail sometimes. I recall the outcome being that you can run with 0.1x the power and get 0.9x the correctness. Usually chips are designed so that they never fail, and that requires using substantially more energy than is needed in the average case. If the application is probabilistic or noisy in general, additional 'computation noise' could be allowed for better energy efficiency.
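As a rough software analogy (not from any of those studies; the names and error rate below are made up), you could model a voltage-overscaled multiply-accumulate as an exact computation plus occasional bit flips:

```python
import random

def noisy_mac(acc: int, a: int, b: int, flip_prob: float = 1e-3) -> int:
    """Exact integer multiply-accumulate, plus an occasional random bit flip
    in the result -- a crude stand-in for timing errors when the chip runs
    below its nominal supply voltage. flip_prob is an illustrative guess."""
    result = acc + a * b
    if random.random() < flip_prob:
        result ^= 1 << random.randrange(16)  # corrupt one low-order bit
    return result

# A noise-tolerant workload (e.g. inference) may barely notice rare flips,
# while the power saving comes from the lower supply voltage.
```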
Wow! It's kind of a weird feeling to see some research I worked on get some traction in the real world! The ELMA lookup problem for 32-bit could be fixed by using the posit standard, which just has "simple" adders for the section past the Golomb-encoded section, though you may have to worry about spending transistors on the barrel shifter. Rough sketch of the regime decode I mean below (Python, bit strings for clarity; obviously not the hardware form).
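```python
def decode_regime(bits: str) -> tuple[int, int]:
    """Decode a posit regime: a run of identical bits terminated by the
    opposite bit (or the end of the word). A run of m ones means k = m - 1,
    a run of m zeros means k = -m. Returns (k, bits consumed).
    `bits` is everything after the sign bit."""
    first = bits[0]
    run = 1
    while run < len(bits) and bits[run] == first:
        run += 1
    consumed = min(run + 1, len(bits))  # include the terminating bit, if present
    k = run - 1 if first == "1" else -run
    return k, consumed

print(decode_regime("1101001"))  # (1, 3): regime '110'
print(decode_regime("0001101"))  # (-3, 4): regime '0001'
```

The variable-length regime is exactly where the barrel shifter comes in: the exponent and fraction fields land at a data-dependent offset, so you have to shift them into place.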
For those interested in the general area: I saw a good talk at CSAIL last week by Jiahao Chen about representing and manipulating floating-point numbers in Julia. The code, with some good documentation, is on his GitHub.

https://github.com/jiahao/ArbRadixFloatingPoints.jl
Caveat: I haven't finished reading the entire FB announcement yet.

Google announced something along these lines at their AI conference last September and released the video today on YouTube. Here's the link to the segment where their approach is discussed:
https://www.youtube.com/watch?v=ot4RWfGTtOg&t=330s
> Significands are fixed point, and fixed point adders, multipliers, and dividers on these are needed for arithmetic operations... Hardware multipliers and dividers are usually much more resource-intensive

It's been a number of years since I've implemented low-level arithmetic, but when you use fixed point, don't you usually choose a power-of-2 scale? I don't see why you'd need multiplication/division instead of bit shifters.
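Sketch of the kind of fixed point I mean (Python, with Q8 just as an example): the power-of-two scale factor only ever costs shifts, though I suppose the raw integer product itself is where the multiplier comes in.

```python
FRAC_BITS = 8  # Q8 fixed point: real value = raw / 2**FRAC_BITS

def fixed_add(a: int, b: int) -> int:
    """Both operands share the same scale, so addition is a plain integer add."""
    return a + b

def fixed_mul(a: int, b: int) -> int:
    """The scales multiply too (2**-8 * 2**-8 = 2**-16), so a single right
    shift rescales the result -- but the integer product a * b itself still
    needs a full multiplier."""
    return (a * b) >> FRAC_BITS

# 1.5 * 2.25 = 3.375
print(fixed_mul(int(1.5 * 256), int(2.25 * 256)) / 256)  # 3.375
```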