Bolt: Faster matrix and vector operations that run on compressed data

183 points, by febin, nearly 3 years ago

9 comments

ffast-math, nearly 3 years ago
Author here. Ask me anything--happy to answer questions.

Also, if you like this kind of work, you might like what I've been building for the past year: Composer [1]. It speeds up neural net training by a lot (e.g., 7x faster for ResNet-50) [2] and, in contrast to Bolt/MADDNESS, is polished, documented code you can get working in <5 min.

[1] https://github.com/mosaicml/composer

[2] https://www.mosaicml.com/blog/mosaic-resnet

cgreerrun, nearly 3 years ago
Maddness is their more recent work and yields 100x speedups: https://arxiv.org/pdf/2106.10860.pdf

The code for Maddness is in the same GitHub repo if you search for "Mithral".

SIMD instructions can work wonders in the right context.
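
For readers unfamiliar with the approach, here is a minimal numpy sketch of the lookup-table idea behind Bolt/MADDNESS: split each row into subvectors, replace each subvector with the index of its nearest prototype, and answer A @ x with table lookups and adds instead of per-row multiplies. This is not the library's API; the function names and the crude "sample rows as prototypes" training step are made up for illustration (the real method learns prototypes and quantizes the tables).

```python
# Minimal sketch of the lookup-table idea (illustrative only; not Bolt's API).
import numpy as np

def train_codebooks(A, n_subspaces=4, n_codes=16, seed=0):
    """Pick per-subspace prototypes. A real implementation would run k-means;
    sampling rows is just enough to show the mechanics."""
    rng = np.random.default_rng(seed)
    sub = A.shape[1] // n_subspaces           # assumes the width divides evenly
    return [A[rng.choice(len(A), n_codes, replace=False), s*sub:(s+1)*sub]
            for s in range(n_subspaces)]

def encode(A, codebooks):
    """Replace each subvector of each row by the index of its nearest prototype."""
    sub = codebooks[0].shape[1]
    codes = []
    for s, C in enumerate(codebooks):
        block = A[:, s*sub:(s+1)*sub]
        dists = ((block[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        codes.append(dists.argmin(axis=1))
    return np.stack(codes, axis=1)            # (n_rows, n_subspaces), small ints

def approx_matvec(codes, codebooks, x):
    """Approximate A @ x using only table lookups and adds per row."""
    sub = codebooks[0].shape[1]
    out = np.zeros(codes.shape[0])
    for s, C in enumerate(codebooks):
        table = C @ x[s*sub:(s+1)*sub]        # one small dot product per prototype
        out += table[codes[:, s]]             # gather + accumulate
    return out

rng = np.random.default_rng(1)
A = rng.normal(size=(10000, 64))
x = rng.normal(size=64)
cb = train_codebooks(A)
approx = approx_matvec(encode(A, cb), cb, x)
print(np.corrcoef(A @ x, approx)[0, 1])       # correlation with the exact result
```

With only 16 prototypes per subspace, the per-row work is a handful of small-table gathers and adds, which is the access pattern the SIMD shuffle instructions mentioned above handle very well.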

Iv, nearly 3 years ago
> If you have a large collection of mostly-dense vectors and can tolerate lossy compression, Bolt can probably save you 10-200x space and compute time.

Space. It can save space.

The main limitation of fast ML models nowadays is how many parameters you can fit in your GPU memory, and those are usually matrices.

200x would allow me to run GPT-3 on my old GTX 1050.

Frameworks, please implement this NOW!

Iv, nearly 3 years ago
This is actually from a paper published last year:

https://www.reddit.com/r/MachineLearning/comments/pffoo8/r_multiplying_matrices_without_multiplying/

A few questions:

- Do any ML frameworks implement it already?
- It promises up to 200x compression; is it reasonable to expect it to let us run GPT-3 on smaller mainstream GPUs?

jansan, nearly 3 years ago
This sounds and looks impressive, but this part struck me:

"If you ... and can tolerate lossy compression"

What does this mean? I wouldn't have thought that matrix operations can be lossy. Does anybody know to what extent they are lossy and where this would be acceptable?
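
As a toy illustration of what "lossy" means in this setting, you can quantize the matrix, do the product with the quantized copy, and measure the relative error of the result. The snippet below uses crude uniform 8-bit quantization rather than Bolt's learned vector quantization, so the numbers only make the trade-off concrete; they say nothing about Bolt's actual accuracy.

```python
# Toy illustration of "lossy": quantize A, multiply, and measure the error.
# Uniform 8-bit quantization here -- far cruder than Bolt's learned codebooks.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 256))
x = rng.normal(size=256)

scale = np.abs(A).max() / 127.0               # map A's range onto int8
A_q = np.round(A / scale).astype(np.int8)     # the lossy step

exact = A @ x
approx = (A_q.astype(np.float64) * scale) @ x

rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error: {rel_err:.4%}")       # around 1% for this setup
```

Whether percent-level error in dot products is acceptable depends on the application; uses like approximate nearest-neighbor search or ML inference often tolerate it, while anything needing exact arithmetic does not.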

raxxorraxor, nearly 3 years ago
This looks good. Why do the vectors have to be dense? Just because the overhead would otherwise outweigh the speed gain? Just asking whether you could use it universally for all operations when you don't know the density.

bee_rider, nearly 3 years ago
I guess the naive approach, if we wanted to do a quick lossy matrix multiply, would be to take the truncated SVD and use that. How does this library compare to that boring strategy, I wonder?
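
For reference, the "boring strategy" looks roughly like this (a numpy sketch for comparison, not part of Bolt): replace A with its rank-k truncated SVD and multiply with the factors, which turns an O(n*d*m) product into roughly O(k*m*(n+d)) work once the SVD has been precomputed.

```python
# The "boring strategy": a rank-k truncated SVD of A, reused for the multiply.
import numpy as np

def truncated_svd_matmul(A, B, k):
    """Approximate A @ B via a precomputed rank-k factorization of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # one-time offline cost
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k]
    return Uk @ (sk[:, None] * (Vtk @ B))              # cheap when k is small

rng = np.random.default_rng(0)
A = rng.normal(size=(512, 256))
B = rng.normal(size=(256, 64))
exact = A @ B
approx = truncated_svd_matmul(A, B, k=32)
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

One relevant difference: low rank only pays off when A's singular values decay quickly (for an unstructured Gaussian A like the one above, the rank-32 error stays large), whereas the quantization approach relies on the rows clustering well, so the two methods degrade under different kinds of structure.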

nynx, nearly 3 years ago
Wow, this is fascinating. I wonder if hardware could be designed to do this really efficiently.

a-dub, nearly 3 years ago
any thoughts on trying to build a sort of vq-blas?