Every element of a matrix multiplication is a dot product of two vectors. The dot product of two vectors quantifies their similarity -- in fact, we call it "dot-product similarity" in a nearest-neighbors context: if the dot product is > 0, the vectors point in similar directions; if < 0, they point in dissimilar directions; if = 0, they are orthogonal.

It's not too hard to imagine that it might be possible to learn representative K-means clusters of training vectors and then, at run time, find similar vectors using an efficient hashing scheme and do the equivalent of a table lookup to get the approximate dot-product similarity scores -- without having to perform any multiplications.

This isn't quite correct (I'm oversimplifying and ignoring many important details), but I think it's a helpful mental image.

Very clever stuff.
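A minimal sketch of that mental image in Python (toy sizes and helper names of my own, not the paper's actual MADDNESS algorithm): learn per-subspace K-means codebooks offline, then answer a dot product at run time with table lookups and additions only.

    # Rough sketch of the mental model above (not the paper's actual algorithm):
    # split vectors into subspaces, learn K-means centroids per subspace offline,
    # then approximate dot products at run time with table lookups and additions.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    D, K, n_sub = 64, 16, 8                  # dim, centroids per subspace, subspaces
    sub = D // n_sub
    train = rng.normal(size=(10_000, D))     # "training" vectors

    # Offline: one codebook per subspace
    codebooks = [KMeans(n_clusters=K, n_init=4).fit(train[:, i*sub:(i+1)*sub])
                 for i in range(n_sub)]

    def encode(x):
        # Map each subvector to the id of its nearest centroid (the "hash")
        return [cb.predict(x[i*sub:(i+1)*sub][None, :])[0]
                for i, cb in enumerate(codebooks)]

    def build_tables(q):
        # Precompute dot products of the query's subvectors with every centroid
        return [cb.cluster_centers_ @ q[i*sub:(i+1)*sub]
                for i, cb in enumerate(codebooks)]

    x, q = rng.normal(size=D), rng.normal(size=D)
    codes, tables = encode(x), build_tables(q)
    approx = sum(tables[i][codes[i]] for i in range(n_sub))  # lookups + adds only
    print(approx, x @ q)                                     # approximate vs exact

Building the query's tables still needs multiplications, but that cost is amortized once the same query is scored against many encoded vectors.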
Primary author here. Happy to answer questions!

Also, feel free to email me at the address in the paper if you're interested in talking about it in more detail. E.g., I've already heard from some hardware folks looking at expanding on this work.
An interesting piece of work, with some questions that remain to be addressed:
1. The paper only covers GEMM, with small-scale experiments (CIFAR-10/100); it does not cover convolution, nor the GEMM workloads in more popular networks such as Transformer/BERT.
2. It is still an approximate method, meaning potential accuracy loss. So I think it is less attractive for training acceleration, and more likely useful as a complementary method for inference acceleration.
3. No results are evaluated on GPUs equipped with Tensor Cores. I am a little curious: since modern AI accelerators (including NVIDIA GPUs) all incorporate Tensor Cores, which support GEMM acceleration by design, what added value does the approximation method in this paper bring?
> MADDNESS

I think this name really fits the concept. Replacing matrix multiplication with some hash lookups! (Warning: an overly simplified statement.)

This is a really interesting application of PQ (product quantization), which itself also requires learning (usually K-means). Paper: https://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf

Considering that neural networks have survived many approximations (e.g. lower precision, pruning), and that many of their applications are subjective anyway (e.g. image processing, recommendation), I think this can be highly useful in many cases.
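For reference, here is what plain PQ scoring against a whole encoded database looks like (a toy sketch with made-up sizes, in the spirit of the Jegou et al. paper rather than of MADDNESS itself): every database vector is stored as a few small codes, and a query is scored against all of them with lookups and additions.

    # Minimal product-quantization sketch: encode a database once, then score any
    # query against all of it with table lookups instead of full dot products.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    N, D, n_sub, K = 5_000, 32, 4, 256
    sub = D // n_sub
    db = rng.normal(size=(N, D))

    codebooks = [KMeans(n_clusters=K, n_init=1).fit(db[:, i*sub:(i+1)*sub])
                 for i in range(n_sub)]
    # Each database vector becomes n_sub small integer codes
    codes = np.stack([cb.predict(db[:, i*sub:(i+1)*sub])
                      for i, cb in enumerate(codebooks)], axis=1)

    def approx_scores(q):
        # Per-subspace lookup tables: centroid . query_subvector
        tables = [cb.cluster_centers_ @ q[i*sub:(i+1)*sub]
                  for i, cb in enumerate(codebooks)]
        # Approximate dot product of q with every database vector: lookups + adds
        return sum(tables[i][codes[:, i]] for i in range(n_sub))

    q = rng.normal(size=D)
    print(np.corrcoef(approx_scores(q), db @ q)[0, 1])  # should be close to 1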
Method 1: computer, use MUL to multiply X and Y.

Method 2: computer, remember this number. When I give you X and Y, tell me what that number was. This requires no multiplication! Let us tell SCIENCE about our discovery!
I wonder how much alternative number representations have been studied for use in matrices. Like, storing the log of values instead, so that multiplication can be done by just adding. Or something like Zech's logarithm. Or even taking the log of whole matrices (which is a thing, apparently [0]) and adding them to compute their product. I wonder if the logarithm of a matrix can be computed using ML.

[0]: https://en.wikipedia.org/wiki/Logarithm_of_a_matrix
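A quick SciPy check of the matrix-log idea (illustrative matrices of my own choosing): the scalar trick of "add the logs" generally fails unless the matrices commute, so it doesn't give a drop-in replacement for matrix multiplication.

    # expm(logm(A) + logm(B)) only recovers A @ B when A and B commute.
    import numpy as np
    from scipy.linalg import expm, logm

    rng = np.random.default_rng(0)
    A = np.eye(3) + 0.1 * rng.normal(size=(3, 3))   # near identity, so logm is well defined
    B = np.eye(3) + 0.1 * rng.normal(size=(3, 3))

    print(np.allclose(expm(logm(A) + logm(B)), A @ B))   # False: A and B don't commute
    C = expm(0.3 * logm(A))                              # a matrix that commutes with A
    print(np.allclose(expm(logm(A) + logm(C)), A @ C))   # True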
Lots of discussion here about higher-dimensional matrices. The argument is that the dot product is an operation on two arguments, therefore any multidimensional matrix multiplication can be broken down into multiple two-dimensional operations. The image offered is that two vectors define a plane, so any two-dimensional operation is valid. But how about curved space? Suppose I have two vectors in curved space -- they look like two droopy arrows. How does one calculate this dot product? I guess the local curvature has to be taken into account. In spherically shaped space, this is a constant, but what about irregularly bent space, like that near an amorphous blob of dark matter? Suppose you are trying to calculate electrostatic potentials or do anything with Maxwell's equations?

Currently, we infer the curvature of space from gravity or the deflection of light past massive objects. Is there a way to measure the curvature of space locally? Can the curvature of space alter the piezoelectric potential generated by a crystal, for example, and allow its measurement? I am thinking that a minuscule deformation of a large macroscopic object is multiplied many-fold in a property that is distributed at the atomic level and happening in parallel across all the atoms of a crystal.
Clever and logical. It reminds me of how John Carmack's early id Software engines (e.g. Doom) used a precomputed table of sine values for fast lookup, rather than running the actual function on the CPU.
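The same trick is easy to sketch (toy Python of my own, not the actual id Software code): precompute a table of sine samples once, then answer sin(x) with an index and a linear interpolation.

    # Toy sine lookup table: precompute N samples once, answer sin(x) with a lerp.
    import numpy as np

    N = 4096
    TABLE = np.sin(np.linspace(0.0, 2.0 * np.pi, N, endpoint=False))

    def fast_sin(x):
        # Map the angle to a fractional table index and linearly interpolate
        t = (x / (2.0 * np.pi)) * N
        i = int(np.floor(t)) % N
        frac = t - np.floor(t)
        return TABLE[i] * (1.0 - frac) + TABLE[(i + 1) % N] * frac

    print(fast_sin(1.0), np.sin(1.0))   # agree to roughly 1e-6 with N = 4096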
Shouldn't the solution to fast matrix multiplication be logarithms, similarly to fast scalar multiplication?

https://en.wikipedia.org/wiki/Logarithm_of_a_matrix
Had this exact same thought as an undergrad about 3 years ago! I kind of gave up due to the massive barrier to entry and the difficult financial burdens faced by PhD students. It feels nice to know I wasn't crazy.
For some reason they run all their tests single-threaded. Parallelism seems to be where all computing hardware is inevitably going. I also wish they had included runtime comparisons to more recent matrix sketching and multiplication methods such as Frequent Directions, newer FJLT implementations, and hashing-based RIP matrices.