Every element of a matrix multiplication is a dot product of two vectors. The dot product of two vectors quantifies their similarity -- in fact, we call it "dot-product similarity" in a nearest-neighbors context: if the dot product is > 0, the vectors point in similar directions; if < 0, they point in dissimilar directions; if = 0, they are orthogonal.

It's not too hard to imagine that it might be possible to learn representative K-means clusters of training vectors and then, at run time, find similar vectors using an efficient hashing scheme and do the equivalent of a table lookup to get the approximate dot-product similarity scores -- without having to perform any multiplications.

This isn't quite correct (I'm oversimplifying and ignoring many important details), but I think it's a helpful mental image.

Very clever stuff.
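A minimal sketch of that mental image in Python (toy sizes and helper names of my own, not the paper's actual MADDNESS algorithm): learn per-subspace K-means codebooks offline, then answer a dot product at run time with table lookups and additions only.

    # Rough sketch of the mental model above (not the paper's actual algorithm):
    # split vectors into subspaces, learn K-means centroids per subspace offline,
    # then approximate dot products at run time with table lookups and additions.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    D, K, n_sub = 64, 16, 8                  # dim, centroids per subspace, subspaces
    sub = D // n_sub
    train = rng.normal(size=(10_000, D))     # "training" vectors

    # Offline: one codebook per subspace
    codebooks = [KMeans(n_clusters=K, n_init=4).fit(train[:, i*sub:(i+1)*sub])
                 for i in range(n_sub)]

    def encode(x):
        # Map each subvector to the id of its nearest centroid (the "hash")
        return [cb.predict(x[i*sub:(i+1)*sub][None, :])[0]
                for i, cb in enumerate(codebooks)]

    def build_tables(q):
        # Precompute dot products of the query's subvectors with every centroid
        return [cb.cluster_centers_ @ q[i*sub:(i+1)*sub]
                for i, cb in enumerate(codebooks)]

    x, q = rng.normal(size=D), rng.normal(size=D)
    codes, tables = encode(x), build_tables(q)
    approx = sum(tables[i][codes[i]] for i in range(n_sub))  # lookups + adds only
    print(approx, x @ q)                                     # approximate vs exact

Building the query's tables still needs multiplications, but that cost is amortized once the same query is scored against many encoded vectors.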
Primary author here. Happy to answer questions!

Also, feel free to email me at the address in the paper if you're interested in talking about it in more detail. E.g., I've already heard from some hardware folks looking at expanding on this work.
An interesting piece of work, with some questions that remain to be addressed:
1. The paper only covers GEMM, with small-scale experiments (CIFAR-10/100); it does not cover convolution, nor the GEMM workloads in more popular networks such as Transformer/BERT.
2. It is still an approximate method, meaning potential accuracy loss. So I think it is less attractive for training acceleration, and more likely useful as a complementary method for inference acceleration.
3. No results are evaluated on GPUs equipped with Tensor Cores. I am a little curious: since modern AI accelerators (including NVIDIA GPUs) all incorporate Tensor Cores, which support GEMM acceleration by design, what added value does the approximation method in this paper bring?
> MADDNESS

I think this name really fits the concept. Replacing matrix multiplication with some hash lookups! (Warning: an overly simplified statement.)

This is a really interesting application of PQ (product quantization), which itself also requires learning (usually K-means). Paper: https://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf

Considering that neural networks have survived many approximations (e.g. lower precision, pruning), and that many of their applications are subjective anyway (e.g. image processing, recommendation), I think this can be highly useful in many cases.
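For reference, here is what plain PQ scoring against a whole encoded database looks like (a toy sketch with made-up sizes, in the spirit of the Jegou et al. paper rather than of MADDNESS itself): every database vector is stored as a few small codes, and a query is scored against all of them with lookups and additions.

    # Minimal product-quantization sketch: encode a database once, then score any
    # query against all of it with table lookups instead of full dot products.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    N, D, n_sub, K = 5_000, 32, 4, 256
    sub = D // n_sub
    db = rng.normal(size=(N, D))

    codebooks = [KMeans(n_clusters=K, n_init=1).fit(db[:, i*sub:(i+1)*sub])
                 for i in range(n_sub)]
    # Each database vector becomes n_sub small integer codes
    codes = np.stack([cb.predict(db[:, i*sub:(i+1)*sub])
                      for i, cb in enumerate(codebooks)], axis=1)

    def approx_scores(q):
        # Per-subspace lookup tables: centroid . query_subvector
        tables = [cb.cluster_centers_ @ q[i*sub:(i+1)*sub]
                  for i, cb in enumerate(codebooks)]
        # Approximate dot product of q with every database vector: lookups + adds
        return sum(tables[i][codes[:, i]] for i in range(n_sub))

    q = rng.normal(size=D)
    print(np.corrcoef(approx_scores(q), db @ q)[0, 1])  # should be close to 1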
Method 1: computer, use MUL to multiply X and Y.

Method 2: computer, remember this number. When I give you X and Y, tell me what that number was. This requires no multiplication! Let us tell SCIENCE about our discovery!
I wonder how much alternative number representations have been studied for use in matrices. Like, storing the log of values instead, so that multiplication can be done by just adding. Or something like Zech's logarithm. Or even taking the log of whole matrices (which is a thing, apparently [0]) and adding them to compute their product. I wonder if the logarithm of a matrix can be computed using ML.

[0]: https://en.wikipedia.org/wiki/Logarithm_of_a_matrix
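A quick SciPy check of the matrix-log idea (illustrative matrices of my own choosing): the scalar trick of "add the logs" generally fails unless the matrices commute, so it doesn't give a drop-in replacement for matrix multiplication.

    # expm(logm(A) + logm(B)) only recovers A @ B when A and B commute.
    import numpy as np
    from scipy.linalg import expm, logm

    rng = np.random.default_rng(0)
    A = np.eye(3) + 0.1 * rng.normal(size=(3, 3))   # near identity, so logm is well defined
    B = np.eye(3) + 0.1 * rng.normal(size=(3, 3))

    print(np.allclose(expm(logm(A) + logm(B)), A @ B))   # False: A and B don't commute
    C = expm(0.3 * logm(A))                              # a matrix that commutes with A
    print(np.allclose(expm(logm(A) + logm(C)), A @ C))   # True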
Lots of discussion here about higher-dimensional matrices. The argument is that the dot product is an operation on two arguments, therefore any multidimensional matrix multiplication can be broken down into multiple two-dimensional operations. The image offered is that two vectors define a plane, so any two-dimensional operation is valid. But how about curved space? Suppose I have two vectors in curved space -- they look like two droopy arrows. How does one calculate this dot product? I guess the local curvature has to be taken into account. In spherically shaped space, this is a constant, but what about irregularly bent space, like that near an amorphous blob of dark matter? Suppose you are trying to calculate electrostatic potentials or do anything with Maxwell's equations?

Currently, we infer the curvature of space from gravity or the deflection of light past massive objects. Is there a way to measure the curvature of space locally? Can the curvature of space alter the piezoelectric potential generated by a crystal, for example, and allow its measurement? I am thinking that a minuscule deformation of a large macroscopic object is multiplied many-fold in a property that is distributed at the atomic level and happening in parallel across all the atoms of a crystal.
Clever and logical. It reminds me of how John Carmack's early id Software engines (e.g. Doom) used a precomputed table of sine values for fast lookup, rather than running the actual function on the CPU.
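The same trick is easy to sketch (toy Python of my own, not the actual id Software code): precompute a table of sine samples once, then answer sin(x) with an index and a linear interpolation.

    # Toy sine lookup table: precompute N samples once, answer sin(x) with a lerp.
    import numpy as np

    N = 4096
    TABLE = np.sin(np.linspace(0.0, 2.0 * np.pi, N, endpoint=False))

    def fast_sin(x):
        # Map the angle to a fractional table index and linearly interpolate
        t = (x / (2.0 * np.pi)) * N
        i = int(np.floor(t)) % N
        frac = t - np.floor(t)
        return TABLE[i] * (1.0 - frac) + TABLE[(i + 1) % N] * frac

    print(fast_sin(1.0), np.sin(1.0))   # agree to roughly 1e-6 with N = 4096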
Shouldn't the solution to fast matrix multiplication be logarithms, similarly to fast scalar multiplication?

https://en.wikipedia.org/wiki/Logarithm_of_a_matrix
Had this exact same thought as an undergrad about 3 years ago! I kind of gave up due to the massive barrier to entry and the difficult financial burdens faced by PhD students. It feels nice to know I wasn't crazy.
For some reason they run all their tests single-threaded. Parallelism seems to be where all computing hardware is inevitably going. I also wish they had included runtime comparisons to more recent matrix sketching and multiplication methods such as Frequent Directions, newer FJLT implementations, and hashing-based RIP matrices.