This looks pretty cool! What's the catch? E.g., why isn't this already implemented in accelerators? Is it really just a forgotten algorithm, or does it have implications for the cost of building the accelerator, or something else?
Man, this reminds me of something similar I tried working on in 2018, but gave up on after all my PhD applications got rejected.

https://github.com/ixaxaar/pytorch-dni

The concept there goes a bit further and tries to replicate backprop with an external network, arguing that that's probably what the brain actually does.
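For anyone curious, the core of the synthetic-gradient (DNI) idea can be sketched in a few lines of plain PyTorch. This is a rough illustration of the concept, not the pytorch-dni API: a small side network learns to predict the gradient of the loss with respect to a layer's activations, so the layer can update without waiting for the true backward pass.

```python
import torch
import torch.nn as nn

layer = nn.Linear(128, 64)
synthesizer = nn.Linear(64, 64)          # predicts dL/d(activation)
opt_layer = torch.optim.SGD(layer.parameters(), lr=1e-2)
opt_synth = torch.optim.SGD(synthesizer.parameters(), lr=1e-2)

x = torch.randn(32, 128)
h = layer(x)

# 1) Update the layer immediately, using the *predicted* gradient.
synthetic_grad = synthesizer(h.detach()).detach()
opt_layer.zero_grad()
h.backward(synthetic_grad)
opt_layer.step()

# 2) Later, when the true gradient arrives from downstream, train the
#    synthesizer to match it (true_grad is a placeholder for that signal).
true_grad = torch.randn_like(h)
loss = ((synthesizer(h.detach()) - true_grad) ** 2).mean()
opt_synth.zero_grad()
loss.backward()
opt_synth.step()
```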
This is very cool and a really interesting read! For those in the comments confused about how this is better: the paper is about synthesizing matrix multiplication pipelines in hardware, like an FPGA or ASIC. On a CPU or GPU you won't notice, because adds and multiplies generally take the same amount of time, but multiplication units take up many more transistors, so if you can reduce the circuit complexity you can increase speed and parallel throughput while reducing power and routing complexity. This approach could be particularly useful for efficient sparse matrix multiplication accelerators.

Another cool way to eliminate multiplication in matrix multiplication is to use different semirings [1]. The tropical semiring [2], for example, substitutes addition for multiplication and min (or max) for addition. It's still matrix multiplication, just with substituted binary operations. Research in the relatively new field of tropical algebra [3] is quite active and rich right now; it's being used for all kinds of optimization problems and in research on optimizing neural networks [4].
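To make that concrete, here is a minimal sketch of min-plus (tropical) matrix multiplication; numpy is assumed, and the loops are kept naive for clarity:

```python
import numpy as np

def tropical_matmul(A, B):
    """Min-plus ("tropical") matrix product: * becomes +, + becomes min.
    No multipliers required, only adders and comparators."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.full((n, m), np.inf)
    for i in range(n):
        for j in range(m):
            # ordinary matmul: C[i, j] = sum(A[i, :] * B[:, j])
            # tropical matmul: C[i, j] = min(A[i, :] + B[:, j])
            C[i, j] = np.min(A[i, :] + B[:, j])
    return C
```

One well-known payoff: if A is the weighted adjacency matrix of a graph, repeatedly taking the tropical product of A with itself computes all-pairs shortest paths.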
This approach also lends itself to hardware synthesis, since most FPGA configurable logic blocks can add/min/max in one clock cycle, whereas efficient multiplication requires dedicated on-chip hardware multipliers.

Another way to efficiently remove multiplications, with a different but related semiring, is to use a log semiring [5]. If you have to multiply chains of probabilities (as in Markov chains), the numbers quickly become very small and floating point loses the accuracy to represent them. By first mapping the numbers into log space, multiplication becomes addition, and addition becomes x + log1p(exp(y - x)).

[1] https://en.wikipedia.org/wiki/Semiring

[2] https://en.wikipedia.org/wiki/Tropical_semiring

[3] https://en.wikipedia.org/wiki/Tropical_geometry

[4] https://proceedings.mlr.press/v80/zhang18i/zhang18i.pdf

[5] https://en.wikipedia.org/wiki/Log_semiring
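Here's a small sketch of those two log-space operations (again assuming numpy), plus a demonstration of why you'd bother:

```python
import numpy as np

def log_mul(x, y):
    # log(a * b) = log(a) + log(b): multiplication becomes addition
    return x + y

def log_add(x, y):
    # log(a + b) from log(a) and log(b), keeping the larger term
    # outside the exp for numerical stability
    hi, lo = max(x, y), min(x, y)
    return hi + np.log1p(np.exp(lo - hi))

# Chaining tiny probabilities underflows in linear space...
p = 1e-300 * 1e-250 * 1e-200                               # -> 0.0
# ...but stays perfectly representable in log space:
log_p = np.log(1e-300) + np.log(1e-250) + np.log(1e-200)   # ~ -1726.9
print(p, log_p)
```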
I'm surprised this actually works; usually, detecting whether to use multiplication or addition is slower than simply multiplying, especially when massive amounts of work are being done in parallel.
If you're interested in the mathematical theory behind sub-cubic algorithms for matrix multiplication, you can start from here: https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm#Sub-cubic_algorithms

I conjecture that for every j > 0, any two n x n matrices can be multiplied together in O(n^(2+j)) steps.

(Now proven for 2 + j = ω ≈ 2.3728596, i.e. for every j ≥ 0.3728596.)
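The first step down from cubic is Strassen's algorithm, which trades the 8 recursive block products of the naive divide-and-conquer scheme for 7. A minimal sketch, assuming numpy and square power-of-two matrix sizes for simplicity:

```python
import numpy as np

def strassen(A, B):
    """Strassen's sub-cubic matrix multiply: 7 recursive products
    instead of 8, giving O(n^log2(7)) ~= O(n^2.807)."""
    n = A.shape[0]
    if n <= 64:                      # cut over to the naive product
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

The asymptotically faster algorithms behind the 2.3728596 bound build on the same recursive idea with far more elaborate bilinear identities, though their constant factors make them impractical at real-world sizes.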
This readme does a really poor job of explaining what the improvement is or how they drop half the multiplications. What is the big-O runtime of this? Does it shift the known best bounds?

And the diagrams are chaotic and don't really explain anything about why this approach is fast or good. The result is that I'm reluctant to even click through to the PDF.

If you want to improve the project's credibility, please consider being honest and open about what is actually going on, giving clear explanations and illustrations rather than material that may as well be designed to hype people too busy to tell you that you are cranks. It's hard to tell whether this is incredibly groundbreaking or just a nothingburger. Sadly, I almost feel like that must be an intentional decision motivated by the poor merits of the work and a desire to exploit AI hype. The alternative, which I prefer to believe is the case, is that the author simply needs to revise and better contextualize.