
Making floating point math highly efficient for AI hardware

152 points by probdist over 6 years ago

8 comments

grandmczeb over 6 years ago
Here's the bottom line for anyone who doesn't want to read the whole article.

> Using a commercially available 28-nanometer ASIC process technology, we have profiled (8, 1, 5, 5, 7) log ELMA as 0.96x the power of int8/32 multiply-add for a standalone processing element (PE).

> Extended to 16 bits this method uses 0.59x the power and 0.68x the area of IEEE 754 half-precision FMA

In other words, interesting but not earth shattering. Great to see people working in this area though!
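A minimal Python sketch of the log-domain idea behind ELMA (illustrative only; the article's actual (8, 1, 5, 5, 7) encoding, table sizes, and Kulisch accumulation differ): multiplying two log-encoded values is just an integer add of their fixed-point logs, and only the conversion back to linear for accumulation needs a lookup.

```python
import math

FRAC_BITS = 4  # fractional bits of the fixed-point log; the article's format uses more

def log_encode(x):
    """Store a positive value as a fixed-point base-2 log (sign would be a separate bit)."""
    return round(math.log2(x) * (1 << FRAC_BITS))

def log_multiply(a_log, b_log):
    """In the log domain a multiply is just an integer addition of the two encodings."""
    return a_log + b_log

def log_to_linear(v_log):
    """Back to linear for accumulation; in hardware the fractional part would index a
    small 2**f lookup table and the integer part would become a shift."""
    return 2.0 ** (v_log / (1 << FRAC_BITS))

a, b = log_encode(3.0), log_encode(5.0)
print(log_to_linear(log_multiply(a, b)))  # ~15, up to the quantization of the log encoding
```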
moflome over 6 years ago
Not sure why this isn't getting more votes, but it's a good avenue of research and the authors should be commended. That said, this approach to optimizing floating point implementations has a lot of history at Imagination Technologies, ARM and similar low-power inferencing chipset providers. I especially like the Synopsys ASIP Designer [0] tool, which leverages the open-source (although not yet IEEE ratified) LISA 2.0 Architecture Design Language [1] to iterate on these design issues.

Interesting times...

[0] https://www.synopsys.com/dw/ipdir.php?ds=asip-designer
[1] https://en.wikipedia.org/wiki/LISA_(Language_for_Instruction_Set_Architecture)
Geee over 6 years ago
A bit off-topic, but I remember some studies about 'under-powered' ASICs, i.e. running with lower-than-required voltage and just letting the chip fail sometimes. I guess the outcome was that you can run with 0.1x power and get 0.9x correctness. Usually chips are designed so that they never fail, and that requires using substantially more energy than is needed in the average case. If the application is probabilistic or noisy in general, additional 'computation noise' could be allowed for better energy efficiency.
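A toy Monte Carlo sketch of the trade-off being described (the fault model and flip probability here are invented for illustration, not taken from any of those studies): inject occasional bit flips into the partial products of an int8 dot product and measure how often the result still lands close to the exact answer.

```python
import random

def noisy_dot(a, b, flip_prob=0.01):
    """int8 dot product where each partial product may get one flipped bit,
    loosely modelling timing faults in an under-volted multiplier."""
    acc = 0
    for x, y in zip(a, b):
        p = x * y
        if random.random() < flip_prob:
            p ^= 1 << random.randrange(16)  # flip one bit of the 16-bit partial product
        acc += p
    return acc

random.seed(0)
a = [random.randint(-128, 127) for _ in range(256)]
b = [random.randint(-128, 127) for _ in range(256)]
exact = sum(x * y for x, y in zip(a, b))
trials = [noisy_dot(a, b) for _ in range(1000)]
ok = sum(abs(t - exact) <= 0.05 * max(abs(exact), 1) for t in trials)
print(f"{ok / len(trials):.0%} of noisy results within 5% of the exact dot product")
```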
dnautics over 6 years ago
Wow! It's kind of a weird feeling to see some research I worked on get some traction in the real world!! The ELMA lookup problem for 32 bit could be fixed by using the posit standard, which just has "simple" adders for the section past the Golomb-encoded section, though you may have to worry about spending transistors on the barrel shifter.
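For anyone unfamiliar with the format the parent refers to, here is a simplified Python sketch of posit decoding (not a conformant implementation of the posit standard; nbits and es are just example parameters). The regime is the run-length ("Golomb-like") prefix, and everything after it is a plain fixed-point exponent and fraction.

```python
def decode_posit(word, nbits=8, es=1):
    """Decode an nbits-wide posit with es exponent bits into a float (simplified)."""
    if word == 0:
        return 0.0
    if word == 1 << (nbits - 1):
        return float("nan")                       # NaR (not a real)
    sign = 1.0
    if word >> (nbits - 1):                       # negative posits are stored in two's complement
        sign = -1.0
        word = (-word) & ((1 << nbits) - 1)
    bits = format(word, f"0{nbits}b")[1:]         # drop the sign bit
    run_char = bits[0]
    run = len(bits) - len(bits.lstrip(run_char))  # regime: run length of identical bits
    k = run - 1 if run_char == "1" else -run
    rest = bits[run + 1:]                         # skip the regime's terminating bit
    exp = int(rest[:es].ljust(es, "0"), 2) if es else 0
    frac = rest[es:]
    frac_val = int(frac, 2) / (1 << len(frac)) if frac else 0.0
    return sign * 2.0 ** ((1 << es) * k + exp) * (1 + frac_val)

print(decode_posit(0b01001000))  # 1.5: regime k=0, exponent 0, fraction 0.1000
print(decode_posit(0b01100000))  # 4.0: regime k=1, so one factor of useed = 2**(2**es)
```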
sgt101 over 6 years ago
For those interested in the general area: I saw a good talk about representing and manipulating floating point numbers in Julia at CSAIL last week by Jiahao Chen. The code, with some good documentation, is on his GitHub.

https://github.com/jiahao/ArbRadixFloatingPoints.jl
davmar over 6 years ago
Caveat: I haven't finished reading the entire FB announcement yet.

Google announced something along these lines at their AI conference last September and released the video today on YouTube. Here's the link to the segment where their approach is discussed: https://www.youtube.com/watch?v=ot4RWfGTtOg&t=330s
moltensyntax over 6 years ago
> Significands are fixed point, and fixed point adders, multipliers, and dividers on these are needed for arithmetic operations... Hardware multipliers and dividers are usually much more resource-intensive

It's been a number of years since I've implemented low-level arithmetic, but when you use fixed point, don't you usually choose a power of 2? I don't see why you'd need multiplication/division instead of bit shifters.
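A tiny Q-format sketch of where the costs land (FRAC_BITS is an arbitrary example value, not something from the article): with a power-of-two scale, rescaling and same-scale addition are indeed shifts and adds, but the product of two significands still requires a full integer multiply.

```python
FRAC_BITS = 8  # Q8 fixed point: real value = raw / 2**FRAC_BITS

def to_fixed(x):
    return round(x * (1 << FRAC_BITS))

def fixed_add(a, b):
    return a + b                 # same scale, so addition is a plain integer add

def fixed_mul(a, b):
    # The raw significands still need a full integer multiply; only the
    # rescaling back to Q8 is a cheap shift, thanks to the power-of-two scale.
    return (a * b) >> FRAC_BITS

a, b = to_fixed(1.5), to_fixed(2.25)
print(fixed_mul(a, b) / (1 << FRAC_BITS))  # 3.375
```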
saagarjha over 6 years ago
I find it interesting that they were able to find improvements even on hardware that is presumably optimized for IEEE-754 floating point numbers.