An 80% size reduction is no joke, and the fact that the 1.58-bit version runs on dual H100s at 140 tokens/s is kind of mind-blowing. That said, I'm still skeptical about how practical this really is for most people. Like, yeah, you can run it on 24GB VRAM or even with just 20GB RAM, but "slow" is an understatement: those speeds would make even the most patient person throw their hands up.<p>And then there's the whole repetition issue. Infinite loops of "Pygame's Pygame's Pygame's" kind of defeat the point of quantization if you ask me. Sure, the authors have fixes like adjusting the KV cache or using min_p, but doesn't that just patch a symptom rather than solve the actual problem? A fried model is still fried, even if it stops repeating itself.<p>On the flip side, I love that they're making this accessible on Hugging Face, and the dynamic quantization approach is pretty brilliant. Using 1.58-bit for the MoE layers and leaving sensitive layers like down_proj at higher precision is super clever. Feels like they're squeezing every last drop of juice out of the architecture, which is awesome for smaller teams who can't afford OpenAI-scale hardware.<p>Still, "accessible" comes with an asterisk. I get that shared-memory architectures like a 192GB Mac Ultra are a big deal, but who's dropping $6,000+ on that setup? For that price, I'd rather build a rig with used 3090s and get way more bang for my buck (though, yeah, it'd be a power hog). Cool tech, no doubt, but the practicality is still up for debate. Guess we'll see if the next-gen models can address some of these trade-offs.
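For what it's worth, the min_p band-aid is at least trivial to apply. A minimal sketch with llama-cpp-python, assuming it exposes the min_p sampling parameter; the GGUF file name and the sampling values are placeholders, not the authors' exact recommendations:

```python
# Minimal sketch: applying min_p sampling to suppress the repetition loops.
# Model path and sampling values are placeholders -- see the blog post for
# the authors' actual recommendations.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-UD-IQ1_S.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,   # offload as many layers as fit in VRAM
    n_ctx=8192,
)

out = llm(
    "Write a Pygame script for Flappy Bird.",
    max_tokens=512,
    temperature=0.6,
    min_p=0.1,         # prune low-probability tokens that feed the "Pygame's Pygame's" loops
)
print(out["choices"][0]["text"])
```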
Random observation 1: I was running DeepSeek yesterday on my Linux box with an RTX 4090, and I noticed that the model really needs to fit into VRAM (24GB in my case) or it is simply slow. So the Apple shared-memory architecture has an advantage here: a 192GB Mx Ultra can load and process large models efficiently.<p>Random observation 2: It's time to cancel the OpenAI subscription.
Wow, an 80% reduction in size for DeepSeek-R1 is just amazing! It's fantastic to see such large models becoming more accessible to those of us who don't have access to top-tier hardware. This kind of optimization opens up so many possibilities for experimenting at home.<p>I'm impressed by the 140 tokens per second with the 1.58-bit quantization running on dual H100s. That kind of performance makes the model practical for small or mid-sized shops to use for local applications. This is a huge win for people working on agents that need the kind of low latency only local models can support.
> Unfortunately if you naively quantize all layers to 1.58bit, you will get infinite repetitions in seed 3407: “Colours with dark Colours with dark Colours with dark Colours with dark Colours with dark” or in seed 3408: “Set up the Pygame's Pygame display with a Pygame's Pygame's Pygame's Pygame's Pygame's Pygame's Pygame's Pygame's Pygame's”.<p>This is a really interesting insight (although other works cover this as well). I am particularly amused by the process by which the authors of this blog post arrived at these particular seeds. Good work nonetheless!
As someone who is out of the loop, what's the verdict on R1? Has anyone been able to reproduce the results yet? Is the claim that it only took $5M to train generally accepted?<p>It's a very bold claim that is really shaking up the markets, so I can't help but wonder whether it has even been verified at this point.
>For optimal performance, we recommend the sum of VRAM + RAM to be at least 80GB+.<p>Oh nice! So I can try it on my local "low power/low cost" server at home.<p>My home system runs a Ryzen 5500 + 64GB RAM + 7x RTX 3060 12GB.<p>So that's 64GB RAM plus 84GB VRAM.<p>I don't want to brag, just to point to solutions for us tinkerers with small budgets and high energy costs.<p>Such a system can be built for around 1,600 euros, and the power consumption is around 520 watts.<p>I started with an AM4 board (B450 chipset) and one used RTX 3060 12GB, which costs around 200 euros used if you are patient.<p>Every additional GPU is connected with a PCIe riser/extender to give the cards enough space.<p>After a while I replaced the individual risers with a single PCIe x4 to 6x PCIe x1 extender.<p>It runs pretty nicely. Awesome for learning and gaining experience.
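In case it helps anyone copying this kind of setup: splitting a model across several cards plus system RAM can be configured roughly like the sketch below with llama-cpp-python. The GGUF file name, layer count, and split values are just illustrative; tune them to whatever your VRAM holds.

```python
# Rough sketch of partial GPU offload across multiple cards with llama-cpp-python.
# File name, layer count, and split ratios are illustrative placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-UD-IQ1_S.gguf",  # hypothetical local GGUF file
    n_gpu_layers=40,                          # offload as many layers as the combined VRAM holds;
                                              # the remaining layers stay in system RAM
    tensor_split=[1, 1, 1, 1, 1, 1, 1],       # spread the offloaded layers evenly over 7 GPUs
    n_ctx=4096,                               # context window
)
print(llm("Hello!", max_tokens=32)["choices"][0]["text"])
```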
It would be great if the next generation of base models were designed to be run, 8-bit quantized, within 128GB of memory (which would fit the consumer hardware class).<p>For example, I imagine a strong MoE base with 16 billion active parameters and 6 or 7 experts would keep good performance while still being possible to run on 128GB RAM MacBooks.
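Back-of-the-envelope, the numbers roughly work out. A quick napkin-math sketch (my own arithmetic; the bandwidth figure is a ballpark assumption and KV cache/activations are ignored):

```python
# Napkin math for an 8-bit quantized model in a 128GB memory budget.
# Bandwidth figure is a rough assumption; KV cache and activations are ignored.
budget_bytes = 128e9            # 128GB of (unified) memory
bytes_per_param = 1.0           # 8-bit quantization ~ 1 byte per parameter
max_total_params = budget_bytes / bytes_per_param   # ~128B total parameters fit in the budget

active_params = 16e9            # 16B active parameters per token, as proposed above
bandwidth = 800e9               # bytes/s, ballpark for a high-end unified-memory machine
tokens_per_sec = bandwidth / (active_params * bytes_per_param)

print(f"total parameter budget: ~{max_total_params / 1e9:.0f}B")
print(f"decode upper bound: ~{tokens_per_sec:.0f} tokens/s (memory-bandwidth bound)")
```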
Danielhanchen, your work is continually impressive. Unsloth is great, and I'm repeatedly amazed at your ability to get up to speed on a new model within hours of its release, and often fix bugs in the default implementation. At this point, I think serious labs should give you a few hours' head start just to iron out their kinks!
The size reduction while keeping the model coherent is incredible, but I'm skeptical of how much effectiveness was retained. Flappy Bird is well known and the kind of thing a non-reasoning model could get right. A better test would be something off the beaten path that R1 and o1 get right but that other models don't.
The size reduction is impressive, but unless I missed it, they don't list any standard benchmarks for comparison, so we have no way to tell how it compares to the full-size model.
> DeepSeek-R1 has been making waves recently by rivaling OpenAI's O1 reasoning model while being fully open-source.<p>Do we finally have a model with access to the training architecture and training data set, or are we still calling non-reproducible binary blobs without their source form "open source"?
If I invested in a 100x machine because I needed 100 of x to run it, and somebody shows how 10x can work, haven't I just become the holder of ten 10x machines, and therefore already made the capex needed to exploit this new market?<p>I can't understand why "OpenAI is dead" has legs: repurpose the hardware and data, and it can run multiple instances of the more efficient model.
Has it been tried on a 128GB M4 MacBook Pro? I'm gonna try it, but I guess it will be too slow to be usable.<p>I love the original DeepSeek model, but the distilled versions are usually too dumb. I'm excited to try my own queries on this one.
Is there any good quick summary of what's special about DeepSeek? I know it's OSS and incredibly efficient, but laymen in the news are saying it's trained purely on AI-generated info instead of a corpus of tagged data... which, I assume, means it's somehow extracting weights or metadata or something from other AIs. Is that it?
Is this actually 1.58 bits (log base 2 of 3)? I heard of another "1.58-bit" model that actually used 2 bits instead. "1.6 bit" is easy enough: you can pack five 3-state values into a byte by using values 0-242. Unpacking is easy too: you divide and modulo by 3 up to five times (or use a lookup table).
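For the curious, a minimal sketch of that packing scheme (my own illustration of the arithmetic, not what any particular quantization library does). Eight bits for five values is 1.6 bits per value, just above the theoretical log2(3) ≈ 1.585:

```python
# Pack five ternary values (0, 1, 2) into one byte: 3^5 = 243 combinations, values 0-242.

def pack5(trits):
    """Pack exactly five base-3 digits into one byte (0-242)."""
    assert len(trits) == 5 and all(t in (0, 1, 2) for t in trits)
    value = 0
    for t in reversed(trits):   # treat the list as little-endian base 3
        value = value * 3 + t
    return value

def unpack5(byte):
    """Recover the five base-3 digits by repeated divide/modulo."""
    trits = []
    for _ in range(5):
        trits.append(byte % 3)
        byte //= 3
    return trits

assert pack5([2, 2, 2, 2, 2]) == 242                       # the maximum packed value
assert unpack5(pack5([2, 0, 1, 1, 2])) == [2, 0, 1, 1, 2]  # round-trips correctly
```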
Is this akin to the quants already being done to various models when you download a GGUF at 4 bits, for example, or is this variable per-layer compression something new that could also make existing smaller models smaller, so we can fit more into, say, 12 or 16GB of VRAM?
Big fan of Unsloth, they have huge potential, but they could definitely use some experienced GTM people, IMO. The pricing page and the messaging there are really not good.
It is going to be truly fucking revolutionary if open-source models are, and continue to be, able to challenge the state of the art. My big philosophical concern is that AI locks Capital into an absolutely supreme and insurmountable lead over Labour and concentrates power in the hands of oligarchs, so the possibility of a future where that's not the case feels amazing. It pleases me greatly that this has Trump riled up too, because I think it means he's much less likely to allow existing US model-makers to build moats: even as a man who I don't think believes in very much, he seems absolutely unwilling to let the Chinese get the drop on him over this.
Hi, small comment: please remember that in China many things are sponsored or subsidized by the government. "We [China] can do it for less..." or "it's cheaper in China..." often just means the government provided a pile of cash and help to get there.<p>I 100% expect some downvotes from the CCP.