
Bolt: Faster matrix and vector operations that run on compressed data

183 points by febin almost 3 years ago

9 comments

ffast-math · almost 3 years ago
Author here. Ask me anything -- happy to answer questions.

Also, if you like this kind of work, you might like what I've been building for the past year: Composer [1]. It speeds up neural net training by a lot (e.g., 7x faster for ResNet-50) [2] and, in contrast to Bolt/MADDNESS, is polished, documented code you can get working in <5 min.

[1] https://github.com/mosaicml/composer

[2] https://www.mosaicml.com/blog/mosaic-resnet
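
For readers wondering what "operations on compressed data" means in practice: the core idea behind Bolt (per the paper) is product quantization -- split each vector into subspaces, learn a tiny codebook per subspace, store only codebook indices, and answer dot-product queries with table lookups instead of multiplies. Below is a from-scratch numpy sketch of that idea; it is an illustration of the technique, not Bolt's actual API, and every function name in it is made up.

```python
import numpy as np

def fit_codebooks(X, n_subspaces=8, n_codes=16, n_iters=10, seed=0):
    """Learn a small k-means codebook in each subspace of X (N x D)."""
    rng = np.random.default_rng(seed)
    sub = X.shape[1] // n_subspaces
    codebooks = []
    for s in range(n_subspaces):
        Xs = X[:, s*sub:(s+1)*sub]
        C = Xs[rng.choice(len(Xs), n_codes, replace=False)]  # init: random rows
        for _ in range(n_iters):  # plain Lloyd's iterations
            d = ((Xs[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(1)
            for k in range(n_codes):
                pts = Xs[assign == k]
                if len(pts):
                    C[k] = pts.mean(0)
        codebooks.append(C)
    return codebooks

def encode(X, codebooks):
    """Replace each subvector with the index of its nearest codeword (lossy)."""
    sub = X.shape[1] // len(codebooks)
    codes = np.empty((len(X), len(codebooks)), dtype=np.uint8)
    for s, C in enumerate(codebooks):
        Xs = X[:, s*sub:(s+1)*sub]
        codes[:, s] = ((Xs[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
    return codes

def approx_dots(q, codes, codebooks):
    """Dot products of query q with all encoded rows, via table lookups only."""
    sub = len(q) // len(codebooks)
    # One lookup table per subspace: <q_s, codeword_k> for every codeword k.
    luts = np.stack([C @ q[s*sub:(s+1)*sub] for s, C in enumerate(codebooks)])
    return sum(luts[s][codes[:, s]] for s in range(len(codebooks)))

X = np.random.randn(10000, 64).astype(np.float32)
q = np.random.randn(64).astype(np.float32)
cb = fit_codebooks(X)
codes = encode(X, cb)               # 64 floats -> 8 bytes per row (32x smaller)
est = approx_dots(q, codes, cb)
print(np.corrcoef(est, X @ q)[0, 1])  # high but below 1: approximate, not exact
```

With 8 subspaces of 16 codes each, every 64-float row shrinks to 8 bytes, and each query costs 8 lookups and 7 additions per row instead of 64 multiply-adds -- which is where both the space and compute savings come from.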

cgreerrun · almost 3 years ago
MADDNESS is their more recent work and yields 100x speedups: https://arxiv.org/pdf/2106.10860.pdf

The code for MADDNESS is in the same GitHub repo if you search for "Mithral".

SIMD instructions can work wonders in the right context.
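
Part of where the extra speed comes from: MADDNESS also replaces the encoding step. Instead of a nearest-codeword search (which still needs multiplies), each subspace is hashed to a 4-bit code by a small tree of scalar comparisons, and 16-entry lookup tables are small enough to sit in SIMD registers, where a single shuffle instruction performs many lookups at once. A toy, heavily simplified sketch of the comparison-only encoding -- my illustration with crudely chosen split dimensions and thresholds, not the paper's learned trees:

```python
import numpy as np

def fit_hash_fn(Xs, depth=4, seed=0):
    """Pick one split dimension and threshold per level (crude: medians)."""
    rng = np.random.default_rng(seed)
    dims = rng.choice(Xs.shape[1], size=depth, replace=False)
    thresholds = [float(np.median(Xs[:, d])) for d in dims]
    return dims, thresholds

def hash_encode(Xs, dims, thresholds):
    """Map each row to a 4-bit bucket using only comparisons -- no multiplies."""
    codes = np.zeros(len(Xs), dtype=np.uint8)
    for level, (d, t) in enumerate(zip(dims, thresholds)):
        codes |= (Xs[:, d] > t).astype(np.uint8) << level
    return codes  # values in [0, 15]; small enough for register-resident LUTs

Xs = np.random.default_rng(1).standard_normal((1000, 8)).astype(np.float32)
codes = hash_encode(Xs, *fit_hash_fn(Xs))
print(np.bincount(codes, minlength=16))  # roughly balanced buckets
```

The real MADDNESS learns a balanced tree (different thresholds per node) and optimizes the lookup-table values to minimize overall error, but the multiply-free flavor is the same.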

Iv · almost 3 years ago
> If you have a large collection of mostly-dense vectors and can tolerate lossy compression, Bolt can probably save you 10-200x space and compute time.

Space. It can save space.

The main limitation of fast ML models nowadays is how many parameters you can load into your GPU memory, and these are usually matrices.

200x would allow me to run GPT-3 on my old GTX 1050.

Frameworks, please implement this NOW!

Iv · almost 3 years ago
This is actually from a paper published last year:

https://www.reddit.com/r/MachineLearning/comments/pffoo8/r_multiplying_matrices_without_multiplying/

A few questions:

- Do some ML frameworks implement it already?
- It promises up to 200x compression; is it reasonable to expect it to allow us to run GPT-3 on smaller mainstream GPUs?
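
A back-of-envelope check on the GPT-3 question (my arithmetic, not a claim from the paper):

```python
params = 175e9                # GPT-3 parameter count
fp16_bytes = params * 2       # ~350 GB of weights at 2 bytes each
best_case = fp16_bytes / 200  # 200x is the optimistic end of the quoted range
print(best_case / 1e9, "GB")  # ~1.75 GB -- nominally inside a 2-4 GB GTX 1050
```

So the weights would nominally fit, but with large caveats: 200x is the lossiest end of the range, activations and intermediate buffers need memory too, and Bolt compresses operands for approximate dot products rather than handing exact weights back -- fitting in memory is not the same as running GPT-3 faithfully.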

jansan · almost 3 years ago
This sounds and looks impressive, but this part struck me:

"If you ... and can tolerate lossy compression"

What does this mean? I wouldn't have thought that matrix operations can be lossy. Does anybody know to what extent they are lossy and where this would be acceptable?
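
The multiplication itself isn't lossy; the compression is. Each matrix is replaced by a quantized approximation of itself, so every product computed from it inherits that quantization error. A minimal numpy demonstration, using crude 4-bit scalar quantization as a stand-in for Bolt's learned codebooks (which are smarter, but equally lossy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((256, 512)).astype(np.float32)
B = rng.standard_normal((512, 64)).astype(np.float32)

# Round A's entries to a 16-level (4-bit) uniform grid.
grid = np.linspace(A.min(), A.max(), 16)
A_q = grid[np.abs(A[..., None] - grid).argmin(-1)]

exact = A @ B
approx = A_q @ B
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error: {rel_err:.3f}")  # noticeably nonzero: the product is approximate
```

This is acceptable wherever rough scores or rankings matter more than exact values -- nearest-neighbor search, recommendation scoring, parts of neural-net inference -- and unacceptable where exact arithmetic does, e.g. numerically sensitive linear solves.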

raxxorraxor · almost 3 years ago
This looks good. Why do the vectors have to be dense? Is it just because the overhead is highest and the speed gain lowest for sparse data? I'm just asking whether you could use it universally for all operations if I don't know the density.
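
My reading of the density requirement (an educated guess, not the author's answer): if the data is very sparse, sparse formats already skip the zeros exactly and cheaply, so a lossy quantizer has little left to win -- and codebooks trained on mostly-zero subvectors waste most of their entries. A quick look at the baseline any quantization scheme would have to beat on sparse data:

```python
import time
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 1024)).astype(np.float32)
X[rng.random(X.shape) < 0.99] = 0.0              # make X 99% zeros
q = rng.standard_normal(1024).astype(np.float32)
X_csr = sparse.csr_matrix(X)                     # stores only the nonzeros

t0 = time.perf_counter(); _ = X @ q;     t1 = time.perf_counter()
_ = X_csr @ q;                           t2 = time.perf_counter()
print(f"dense: {t1 - t0:.5f}s   sparse: {t2 - t1:.5f}s")  # sparse wins here
```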

bee_rider · almost 3 years ago
I guess the naive approach, if we wanted to do a quick lossy matrix multiply, would be to take the truncated SVD and use that. How does this library compare to the boring strategy, I wonder?
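
For comparison, here is the boring strategy as a runnable sketch (my example, not from the library): keep the top-k singular triples of A and multiply through the factors. The trade-off versus Bolt-style quantization: truncated SVD is the provably best rank-k approximation in Frobenius norm, but it only pays off when the spectrum decays, it still performs real multiplies, and it captures global low-rank structure rather than the local clustering that codebooks exploit.

```python
import numpy as np

rng = np.random.default_rng(0)
# A matrix with genuine low-rank structure plus noise; truncated SVD
# only helps when the spectrum decays like this.
A = (rng.standard_normal((1000, 32)) @ rng.standard_normal((32, 512))
     + 0.1 * rng.standard_normal((1000, 512))).astype(np.float32)
B = rng.standard_normal((512, 256)).astype(np.float32)

k = 64
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k]

exact = A @ B
approx = Uk @ (sk[:, None] * (Vtk @ B))  # factor by factor; never rebuilds A
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"rank-{k} relative error: {rel_err:.4f}")  # small here; near 1 for flat spectra
```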

nynx · almost 3 years ago
Wow, this is fascinating. I wonder if hardware could be designed to do this really efficiently.

a-dub · almost 3 years ago
Any thoughts on trying to build a sort of VQ-BLAS?