Awesome writeup.

I've occasionally thought that there's a lot of time to be saved in machine learning systems by making the floating point operations a bit noisier. Use maybe eight-bit (or even four-bit) numbers for the matrix entries, with the philosophy that the overall degrees of freedom in the system are enough to make up for this local rigidity, while you gain a lot in terms of both storage space and evaluation time.

Along these lines, I saw a great demo of a robot a while ago which used 256 binary devices for movement. Each of these devices is the diagonal of a quadrilateral; when the device is in its 'on' state, the diagonal is long, and when it's in the 'off' state the diagonal is short. Chain a lot of these together, and you can get pretty good accuracy aiming for specific points in a two-dimensional space. The demo I saw was a three-dimensional version of the same concept, of course.

So likewise, we chain together lots of linear operations to build (say) a deep learning system. The weight space is very high-dimensional, and the decision space is the much lower dimensional classification/regression problem: one probably needs a lot less than 64 bits of accuracy in the weight space to approximate a point in the decision space.
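For concreteness, here's a minimal sketch in C of the storage half of that idea, assuming plain symmetric per-vector scaling; quantize_i8 and the sample weights are made up for illustration, not taken from any particular system:

    /* Sketch: symmetric 8-bit quantization of a weight vector.  Store one
       float scale per vector plus int8 entries, trading local precision
       for storage space and memory bandwidth. */
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    static void quantize_i8(const float *w, int8_t *q, float *scale, int n)
    {
        float max_abs = 0.0f;
        for (int i = 0; i < n; i++)
            if (fabsf(w[i]) > max_abs) max_abs = fabsf(w[i]);
        *scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
        for (int i = 0; i < n; i++)
            q[i] = (int8_t)lrintf(w[i] / *scale);  /* round to nearest int8 */
    }

    int main(void)
    {
        const float w[4] = { 0.81f, -0.03f, 0.25f, -1.10f };
        int8_t q[4];
        float scale;
        quantize_i8(w, q, &scale, 4);
        for (int i = 0; i < 4; i++)
            printf("%+.4f -> %4d (dequantized: %+.4f)\n", w[i], q[i], q[i] * scale);
        return 0;
    }

The arithmetic would then run on the int8 entries and fold the scale back in at the end; with round-to-nearest, each weight is off by at most half a quantization step, which is the "local rigidity" the high-dimensional weight space has to absorb.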
From a practical point of view, if you have Mathematica (which is proprietary software, unfortunately), you can do this easily for any function in a general way:
<a href="http://reference.wolfram.com/language/FunctionApproximations/tutorial/FunctionApproximations.html" rel="nofollow">http://reference.wolfram.com/language/FunctionApproximations...</a><p>(MiniMaxApproximation minimizes the relative maximum error)<p>For the case discussed in write up, it gives<p>MiniMaxApproximation[Log2[x + 1], {x, {1, 2}, 2, 0}]<p>log2(x+1) ~ 0.177392 + 0.943626 x - 0.120232 x^2<p>with a maximum error of -0.000786282.<p>In the 4th order (that is, (4,0)), it gives<p>0.0399284 + 1.27027 x - 0.391233 x^2 + 0.0909038 x^3 -
0.00986308 x^4<p>max error: -4.83031 10^-6.<p>However, if the division isn't too expensive, a (2,2) approximant also works for an even better precision:<p>(0.00383323 + 1.42412 x + 0.336161 x^2)/(1 + 0.704317 x +
0.0598015 x^2)<p>max error: -1.33902 10^-7<p>As for [0.75, 1.5) interval, I get<p>0.108715 + 1.05621 x - 0.165315 x^2<p>which gives a max error -0.000651425 and the max relative error is around 0.0006, which is again way better than what he gives in the write up, although the error is non zero at points 0.75 and 1.5 (which is not really a necessary constraint, but an artifact of the particular polynomial he chose). The overall plot for relative error in a wider range is smooth.<p>Bottom line:
1) Unless you are an expert in numerical methods, consult software, an article, or a book by an expert in the field (or the expert themselves, if you can) at some point.
2) While I appreciate his efforts and the write-up, I wouldn't recommend using his polynomials in practice.

(BTW, one can use frexp instead of bit-twiddling to make the C code friendlier, since not everyone is familiar with the bit layout of IEEE 754 numbers.)
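For concreteness, a minimal sketch of the frexp route in C; the polynomial step is left as a placeholder (it just calls the library log2 here so the sketch runs end to end), since which minimax fit you drop in is up to you:

    /* Sketch: frexp-based range reduction for a fast log2, as an
       alternative to reading the IEEE 754 bit layout directly. */
    #include <math.h>
    #include <stdio.h>

    /* Placeholder for a minimax polynomial approximating log2(m) on [1, 2).
       Replace the body with your fitted polynomial in Horner form. */
    static double log2_mantissa(double m)
    {
        return log2(m);
    }

    static double fast_log2(double x)
    {
        int e;
        double m = frexp(x, &e);  /* x = m * 2^e, with m in [0.5, 1) */
        m *= 2.0;                 /* rescale the mantissa to [1, 2) ... */
        e -= 1;                   /* ... and adjust the exponent to match */
        return (double)e + log2_mantissa(m);
    }

    int main(void)
    {
        const double tests[] = { 0.3, 1.0, 2.5, 1000.0 };
        for (int i = 0; i < 4; i++)
            printf("fast_log2(%g) = %.6f   log2(%g) = %.6f\n",
                   tests[i], fast_log2(tests[i]), tests[i], log2(tests[i]));
        return 0;
    }

frexp may be a touch slower than the raw bit tricks, but it sidesteps the type-punning and layout questions entirely.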
The author of this post, David Goldberg, is the same guy that wrote "What Every Computer Scientist Should Know About Floating-Point Arithmetic."

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
I've learned to be extra careful with fast log approximations. I was the tools programmer at a game studio, and a few years ago I was given the task of speeding things up a bit. There was a process that was summing neighbouring pixels, doing some filtering/pre-processing, then calculating a value through log. I profiled it, found that this was the slowdown, and decided to provide a faster log version. After some googling and testing, it seemed everything should work.

A few days later, black mipmaps started to appear. Turns out I was testing the wrong way: I thought I was comparing new to old, but I was actually comparing old to old output created by another machine (slightly different results were to be expected). All because I forgot to turn off caching.

And my approximation of log2 did not really work for most of the numbers involved. It was using floats, not doubles, to begin with, and was producing a lot of NaNs and infinities (hence the black colors).

Anyway, it was reverted, and we had to reconvert everything to get back to normal.
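Not the actual code from that project, but the kind of edge-case check that would have caught it looks roughly like this: compare the approximation against the library log2f on zeros, negatives, denormals, infinities, and NaN directly, instead of against cached output. Here crude_log2f is just the classic "exponent plus linearized mantissa" bit trick, standing in for whatever approximation is under test:

    /* Sketch: sanity-check a fast log2 against the library on the awkward
       inputs that an approximation tends to get silently wrong. */
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Crude log2: reinterpret the float's bits, so the result is roughly
       exponent + mantissa.  Fine-ish for positive normals, garbage for
       everything else -- which is exactly what the test should expose. */
    static float crude_log2f(float x)
    {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);
        return bits * (1.0f / (1 << 23)) - 127.0f;
    }

    int main(void)
    {
        const float tests[] = { 1.0f, 100.0f, 0.0f, -1.0f, 1e-40f, INFINITY, NAN };
        for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++)
            printf("x = %-12g  approx = %-12g  log2f = %g\n",
                   tests[i], crude_log2f(tests[i]), log2f(tests[i]));
        return 0;
    }

The point isn't this particular approximation; it's that comparing against log2f on the nasty inputs directly is what surfaces the NaNs, infinities, and silently wrong finite values before they turn into black mipmaps.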
Another interesting idea is ditching the format of a classic polynomial. Multiplication isn't the fastest thing ever.

I wonder what would happen if you ran the same minimax scoring scheme through a general instruction synthesizer for the same complexity cap?
This is a pretty nice introduction, and makes me want to return to my side project where I used to play with these approximations, but for SIMD math: https://github.com/boulos/syrah/blob/master/src/include/syrah/FixedVectorMath.h

I hope in a later post he mentions Sollya (http://sollya.gforge.inria.fr); I found it invaluable for playing around with different precision vs. speed trade-offs.
Why is eBay using the log function, and why is it used so heavily that it causes a performance bottleneck for them? Is this an inherited code base from their Hunch purchase?
Boost used to have a minimax approximation algorithm in its codebase. Unfortunately, nobody knows enough about how it works to maintain it, and you have to look in much older releases of Boost to get it.
But why is this important? Can't computers already do this? Even my 1978 TI pocket calculator can do logs very fast. It's not like this is an important unsolved mathematics puzzle.
I tried evolving a good approximation with Eureqa: https://i.imgur.com/1CTTjYD.png?1