科技回声 (TechEcho) — a tech news platform built with Next.js, serving global tech news and discussion. © 2025 科技回声.
Why Does Integer Addition Approximate Float Multiplication?

182 points · by ibobev · 3 months ago

10 comments

HPsquared · 3 months ago
The most significant bits of a floating point representation are basically a logarithm of the number. Logarithms have this relation between multiplication and addition.
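This observation can be demonstrated in a few lines: reinterpret two positive float32 values as integers, add the bit patterns, and subtract the exponent bias once. A minimal Python sketch (helper names are mine, not from the paper):

```python
import struct

def float_to_bits(x):
    """Reinterpret a Python float as its IEEE-754 float32 bit pattern."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def bits_to_float(i):
    """Reinterpret a 32-bit pattern as a float32 value."""
    return struct.unpack('<f', struct.pack('<I', i & 0xFFFFFFFF))[0]

EXP_BIAS = 127 << 23  # exponent bias, shifted into its bit position

def approx_mul(a, b):
    """Approximate a*b for positive normal floats via one integer addition."""
    return bits_to_float(float_to_bits(a) + float_to_bits(b) - EXP_BIAS)

print(approx_mul(3.0, 5.0))  # close to 15, low by the dropped mantissa cross-term
```

When both mantissas are zero (powers of two) the result is exact; otherwise it underestimates slightly.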
sethaurus · 3 months ago
This is the core trick in the Fast Inverse Square Root [0], as made famous by the Quake III source code.

It uses a shift (equivalent to dividing) and a subtraction in integer-land to estimate x^(-0.5) in float-land.

[0]: https://en.m.wikipedia.org/wiki/Fast_inverse_square_root
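For reference, the shift-and-subtract sethaurus describes transcribes directly into Python; the magic constant and the single Newton step below are from the well-known Quake III routine:

```python
import struct

def fast_inv_sqrt(x):
    """Approximate x**-0.5 by integer shift-and-subtract on float32 bits."""
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    i = 0x5F3759DF - (i >> 1)           # shift halves the log; the constant fixes the bias
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    return y * (1.5 - 0.5 * x * y * y)  # one Newton-Raphson step refines the guess

print(fast_inv_sqrt(4.0))  # close to 0.5
```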
Animats · 3 months ago
Neat. Of course it works for the exponent, but that it's not that far off for the mantissa is unexpected. It helps that the mantissa is normalized.
SushiHippie · 3 months ago
Previous discussion of the paper:

Addition is all you need for energy-efficient language models - https://news.ycombinator.com/item?id=41784591 (Oct 2024, 126 comments)
rowanG077 · 3 months ago
This can be very worth it in circuit design for custom accelerators. Floating-point operations are often multi-cycle. If you can get close enough with addition, it will save a ton of resources and probably also simplify the design.
nabla9 · 3 months ago
It's math, and not just for integers. That's also how a slide rule works.

a×b = 10^(log(a×b))
log(a×b) = log(a) + log(b)
thus a×b = 10^(log(a) + log(b))
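The identity is easy to confirm numerically, e.g. with base-10 logs as on a slide rule:

```python
import math

a, b = 3.0, 5.0
# Multiply by adding logarithms, then exponentiating back.
product_via_logs = 10 ** (math.log10(a) + math.log10(b))
print(product_via_logs)  # ≈ 15.0
```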
zeckalpha · 3 months ago
> The paper talks about using this to save power, but that's probably not worth it for a few reasons

It's probably not worth it to do this in software. But in hardware, it might be!

A similar approach with a hardware prototype: https://research.nvidia.com/publication/2022-12_lns-madam-low-precision-training-logarithmic-number-system-using-multiplicative
WhitneyLand · 3 months ago
I wonder if there was any inspiration from Quake III.

Didn't the inverse square root trick rely on bit-level floating point and subtraction of a bias?
akomtu · 3 months ago
float32 (1+m)×2^e as int32 = e<<23 | m

(1+m1)×2^e1 × (1+m2)×2^e2 = (1+m1+m2+m1×m2)×2^(e1+e2)

If m1×m2 is small, that's approximately float32(m1+m2, e1+e2).
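The size of the dropped m1×m2 term can be checked empirically. A sampling sketch (helper names are mine); the worst case sits near m1 = m2 = 0.5, where the relative error is 1/9 ≈ 11.1%:

```python
import random
import struct

def float_to_bits(x):
    """Reinterpret a Python float as its IEEE-754 float32 bit pattern."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def bits_to_float(i):
    """Reinterpret a 32-bit pattern as a float32 value."""
    return struct.unpack('<f', struct.pack('<I', i & 0xFFFFFFFF))[0]

EXP_BIAS = 127 << 23

random.seed(0)
worst = 0.0
for _ in range(100_000):
    a = random.uniform(1.0, 1000.0)
    b = random.uniform(1.0, 1000.0)
    approx = bits_to_float(float_to_bits(a) + float_to_bits(b) - EXP_BIAS)
    worst = max(worst, abs(approx - a * b) / (a * b))

print(worst)  # worst observed relative error, near 1/9
```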
fatuna · 3 months ago
Would subtraction also approximate division?
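Yes: in the log domain division becomes subtraction, so subtracting the bit patterns and adding the bias back approximates a/b. A sketch with hypothetical helper names; unlike the multiplication case, this one errs high, by up to roughly 12%:

```python
import struct

def float_to_bits(x):
    """Reinterpret a Python float as its IEEE-754 float32 bit pattern."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def bits_to_float(i):
    """Reinterpret a 32-bit pattern as a float32 value."""
    return struct.unpack('<f', struct.pack('<I', i & 0xFFFFFFFF))[0]

EXP_BIAS = 127 << 23

def approx_div(a, b):
    """Approximate a/b for positive normal floats via one integer subtraction."""
    return bits_to_float(float_to_bits(a) - float_to_bits(b) + EXP_BIAS)

print(approx_div(16.0, 2.0))  # exact when both mantissas are zero: 8.0
print(approx_div(15.0, 3.0))  # close to 5, a few percent high
```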