I'm enthusiastic about BitNet and the potential of low-bit LLMs - the papers report perplexity on par with full-precision models while drastically reducing compute and memory requirements. What's puzzling is that no major provider has announced plans to leverage this for their flagship models, despite efficiency gains that could theoretically enable much larger architectures. I suspect there are hidden engineering challenges around specialized hardware requirements or training stability that the academic results don't fully capture, but I'd love insights from anyone closer to production deployment of these techniques.
Sorry for a stupid question, but to clarify: even though it is a 1-bit model, is it supposed to work with any type of embeddings, even ones taken from larger LLMs (in their example, they use HF1BitLLM/Llama3-8B-1.58-100B-tokens)? I.e., it doesn't have an embedding layer built in and relies on embeddings provided separately?
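Concretely, what I'm unsure about is whether something like the sketch below is the intended usage. This is just the standard transformers loading path with the checkpoint id from their example; the inputs_embeds part is my own guess at the alternative, not something I found in the repo:

    # Minimal sketch of what I'm asking -- checkpoint id from their example,
    # everything else is my assumption, not taken from the repo docs.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "HF1BitLLM/Llama3-8B-1.58-100B-tokens"
    )

    # Does the model ship its own embedding layer...
    print(model.get_input_embeddings())

    # ...or am I expected to pass embeddings computed elsewhere, e.g.:
    # outputs = model(inputs_embeds=embeddings_from_some_larger_llm)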
Can anyone help me understand how this works without hardware specialized for BitNet's precision? Is special hardware simply unnecessary? Does it fall short of BitNet's full potential without it, or does it get there with some fancy tricks? Thanks!
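To make the question concrete, my mental model is that with ternary weights the matmuls collapse into additions and subtractions, which any CPU can do. A toy numpy sketch of that idea is below - nothing from the actual bitnet.cpp kernels, which I assume pack the weights into a few bits each and use SIMD instead:

    # Toy illustration only: ternary weights mean no multiplications are needed.
    import numpy as np

    def ternary_matvec(W, x):
        """y = W @ x where W holds only -1, 0, +1: just adds and subtracts."""
        y = np.zeros(W.shape[0], dtype=x.dtype)
        for i in range(W.shape[0]):
            y[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()
        return y

    rng = np.random.default_rng(0)
    W = rng.integers(-1, 2, size=(4, 8))          # ternary weight matrix
    x = rng.standard_normal(8).astype(np.float32)  # activations
    print(np.allclose(ternary_matvec(W, x), W @ x))  # True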
I'm glad Microsoft uses Bash in the example instead of their own Windows shells. As a user, I would like to have something like "Git Bash" built into Windows as the default shell.
Neat. Would anyone know where the SDPA kernel equivalent is? I poked around the repo, but only saw some form of quantization code with vectorized intrinsics.
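For reference, by SDPA I mean plain scaled dot-product attention, as in this rough PyTorch sketch (my own reference code, not from the repo); I couldn't tell whether there's a dedicated kernel for this or whether attention is handled some other way:

    # Reference-only sketch of scaled dot-product attention.
    import math
    import torch

    def sdpa(q, k, v):
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        return torch.softmax(scores, dim=-1) @ v

    q = k = v = torch.randn(1, 4, 16)
    assert torch.allclose(
        sdpa(q, k, v),
        torch.nn.functional.scaled_dot_product_attention(q, k, v),
        atol=1e-5,
    )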