Microsoft BitNet: inference framework for 1-bit LLMs

173 points · by galeos · 7 months ago

11 comments

newfocogi · 7 months ago
I'm enthusiastic about BitNet and the potential of low-bit LLMs - the papers show impressive perplexity scores matching full-precision models while drastically reducing compute and memory requirements. What's puzzling is we're not seeing any major providers announce plans to leverage this for their flagship models, despite the clear efficiency gains that could theoretically enable much larger architectures. I suspect there might be some hidden engineering challenges around specialized hardware requirements or training stability that aren't fully captured in the academic results, but would love insights from anyone closer to production deployment of these techniques.
zamadatix · 7 months ago
For anyone who hasn't read the previous papers: the "1.58-bit" figure comes from using 3 values (-1, 0, 1), and log2(3) = 1.58...
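A quick sketch of that arithmetic (plain Python, nothing BitNet-specific): the information content of a weight drawn from 3 possible values is log2(3) bits.

```python
import math

# A ternary weight takes one of 3 values {-1, 0, +1},
# so its information content is log2(3) bits per weight.
bits_per_weight = math.log2(3)
print(f"{bits_per_weight:.3f}")  # prints 1.585
```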
trebligdivad · 7 months ago
Has someone made an FPGA or ASIC implementation yet? It *feels* like it should be easy (and people would snap it up for inference).
alkh · 7 months ago
Sorry for a stupid question, but to clarify: even though it is a 1-bit model, it is supposed to work with any type of embeddings, even ones taken from larger LLMs (in their example, they use HF1BitLLM/Llama3-8B-1.58-100B-tokens)? I.e. it doesn't have an embedding layer built in and relies on embeddings provided separately?
wwwtyro · 7 months ago
Can anyone help me understand how this works without special bitnet precision-specific hardware? Is special hardware unnecessary? Maybe it just doesn't reach the full bitnet potential without it? Or maybe it does, with some fancy tricks? Thanks!
faragon · 7 months ago
I'm glad Microsoft uses Bash in the example, instead of their own Windows shells. As a user I would like to have something like "Git Bash" built into Windows as the default shell.
Scene_Cast2 · 7 months ago
Neat. Would anyone know where the SDPA kernel equivalent is? I poked around the repo, but only saw some form of quantization code with vectorized intrinsics.
delegate · 7 months ago
I assume it is not as powerful at some tasks as a full-sized model, so what would one use this model for?
sheerun · 7 months ago
When will AIs learn not to blah blah blah by default?
lostmsu · 7 months ago
No GPU inference support?
ein0p · 7 months ago
1.58bpw is not “1 bit”.
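Right: log2(3) ≈ 1.58 is an information-theoretic lower bound, and practical storage lands a bit above it. One common packing scheme (illustrative; not necessarily what bitnet.cpp uses) stores 5 ternary values per byte via base-3 encoding, since 3^5 = 243 ≤ 256, which works out to 8/5 = 1.6 bits per weight:

```python
def pack_trits(trits):
    """Pack 5 ternary weights {-1, 0, +1} into a single byte
    via base-3 encoding (3**5 = 243 <= 256)."""
    assert len(trits) == 5
    val = 0
    for t in reversed(trits):
        val = val * 3 + (t + 1)  # map {-1,0,1} -> digits {0,1,2}
    return val

def unpack_trits(byte):
    """Recover the 5 ternary weights from one packed byte."""
    trits = []
    for _ in range(5):
        trits.append(byte % 3 - 1)
        byte //= 3
    return trits

ws = [-1, 0, 1, 1, -1]
b = pack_trits(ws)
assert 0 <= b < 256
assert unpack_trits(b) == ws
print(8 / 5)  # prints 1.6 -- actual bits per weight with this packing
```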