I built it at home this morning and tried it. Perhaps my expectations were high, but I wasn't terribly impressed. I asked it for a list of ten types of data I might show on a home info display panel. It gave me three. I clarified that I wanted ten; it gave me six. Every request after that just returned the same six things.

I know it's not GPT-4, but I've tried other very small models that run on CPU only and had better results.
They give some description of how the weights are stored: they pack four weights into an int8, which suggests the storage format isn't optimal (2 bits per weight instead of the theoretical ~1.58 bits for a ternary value). But I don't know enough about LLM internals to know how material this is.

Could anyone break down the steps further?
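To make the numbers concrete, here's a minimal sketch of what 2-bits-per-weight packing looks like, assuming a straightforward little-endian layout with four ternary weights per byte; the actual kernel layout in their code may well differ.

    # Sketch: pack 4 ternary weights {-1, 0, +1} into one byte, 2 bits each.
    # Layout is an assumption for illustration, not the project's real format.
    import math

    ENCODE = {-1: 0, 0: 1, +1: 2}          # ternary value -> 2-bit code
    DECODE = {v: k for k, v in ENCODE.items()}

    def pack4(weights):
        """Pack exactly 4 ternary weights into a single byte."""
        assert len(weights) == 4
        b = 0
        for i, w in enumerate(weights):
            b |= ENCODE[w] << (2 * i)
        return b

    def unpack4(byte):
        """Recover the 4 ternary weights from a packed byte."""
        return [DECODE[(byte >> (2 * i)) & 0b11] for i in range(4)]

    ws = [-1, 0, +1, +1]
    assert unpack4(pack4(ws)) == ws
    # A ternary symbol only carries log2(3) ~= 1.585 bits of information,
    # so 2 bits/weight wastes roughly 0.4 bits (~21%) versus the entropy limit.
    print(f"bits used per weight: 2.0, entropy limit: {math.log2(3):.3f}")

As for how material it is: you could in principle pack five ternary values per byte (3^5 = 243 <= 256) and get down to 1.6 bits per weight, but decoding that needs divisions/modulo instead of cheap shifts and masks, which is presumably why a 2-bit layout is the common choice. The cost is about 25% more memory than a denser encoding, on top of whatever the compute kernels need.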
Is there a reason why the 1.58-bit models are always quite small? I think I've seen an 8B one, but that's about it.

Is there a technical reason for it, or just research convenience?
You can try out the model in a demo they have set up: https://bitnet-demo.azurewebsites.net/
I guess B1FF@BITNET posts are gonna come from an LLM now.

Context: https://web.archive.org/web/20030830105202/http://www.catb.org/esr/jargon/html/B/B1FF.html
Does anyone have a good understanding of how 2B models can be useful in production? What tasks are you using them for? I wonder what tasks you could fine-tune them on to get 95-99% results (if anything).
Not to be confused with BITNET:

https://en.m.wikipedia.org/wiki/BITNET
I asked about the last French election, and the #1 sentence was:

> Marine Le Pen, a prominent figure in France, won the 2017 presidential election despite not championing neoliberalism. Several factors contributed to her success: (…)

What data did they train their model on?