I think almost all the free LLMs (not AI) that you find on hf can 'run on CPUs'.<p>The claim here seems to be that it runs <i>usefully fast</i> on CPU.<p>It's hard to judge how accurate that claim is, since we don't know how fast this model runs on a GPU:<p><pre><code> > Absent from the list of supported chips are GPUs [...]
</code></pre>
</code></pre>
And TFA doesn't really quantify anything; it just offers:<p><pre><code> > Perhaps more impressively, BitNet b1.58 2B4T is speedier than other models of its size — in some cases, twice the speed — while using a fraction of the memory.
</code></pre>
The model they link to is just over 1GB in size, and there are plenty of existing 1-2GB models that are quite serviceable on even a mildly-modern CPU-only rig.
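<p>To be fair, the "fraction of the memory" part is at least plausible on paper. Some back-of-envelope arithmetic of my own (not from TFA) for the weights alone, comparing 1.58-bit ternary storage against conventional fp16 for a 2B-parameter model:<p>

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB (ignores activations, KV cache, etc.)."""
    return n_params * bits_per_weight / 8 / 2**30

# Rough figures for a 2B-parameter model; real files add headers,
# embeddings kept at higher precision, etc.
ternary = weight_gib(2e9, 1.58)   # BitNet-style ternary weights
fp16    = weight_gib(2e9, 16)     # conventional half-precision weights

print(f"1.58-bit: {ternary:.2f} GiB")      # ~0.37 GiB
print(f"fp16:     {fp16:.2f} GiB")         # ~3.73 GiB
print(f"ratio:    {fp16 / ternary:.1f}x")  # ~10.1x
```

<p>Which is roughly consistent with the ~1GB download once you account for the non-ternary parts of the model, but it says nothing about <i>speed</i>, which is the claim that actually needed numbers.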