I built it at home this morning and tried it. Perhaps my expectations were high, but I wasn't terribly impressed. I asked it for a list of ten types of data I might show on a home info display panel. It gave me three. I clarified that I wanted ten; it gave me six. Every request after that just returned the same six things.

I know it's not GPT-4, but I've tried other very small models that run on CPU only and had better results.
They give some description of how the weights are stored: they pack four weights into an int8, which suggests the storage format isn't optimal (2 bits per weight instead of the theoretical ~1.58 bits for a ternary value). But I don't know enough about LLM internals to know how material this is.

Could anyone break down the steps further?
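To make the numbers concrete, here's a minimal sketch of what 2-bits-per-weight packing looks like, assuming a straightforward little-endian layout with four ternary weights per byte; the actual kernel layout in their code may well differ.

    # Sketch: pack 4 ternary weights {-1, 0, +1} into one byte, 2 bits each.
    # Layout is an assumption for illustration, not the project's real format.
    import math

    ENCODE = {-1: 0, 0: 1, +1: 2}          # ternary value -> 2-bit code
    DECODE = {v: k for k, v in ENCODE.items()}

    def pack4(weights):
        """Pack exactly 4 ternary weights into a single byte."""
        assert len(weights) == 4
        b = 0
        for i, w in enumerate(weights):
            b |= ENCODE[w] << (2 * i)
        return b

    def unpack4(byte):
        """Recover the 4 ternary weights from a packed byte."""
        return [DECODE[(byte >> (2 * i)) & 0b11] for i in range(4)]

    ws = [-1, 0, +1, +1]
    assert unpack4(pack4(ws)) == ws
    # A ternary symbol only carries log2(3) ~= 1.585 bits of information,
    # so 2 bits/weight wastes roughly 0.4 bits (~21%) versus the entropy limit.
    print(f"bits used per weight: 2.0, entropy limit: {math.log2(3):.3f}")

As for how material it is: you could in principle pack five ternary values per byte (3^5 = 243 <= 256) and get down to 1.6 bits per weight, but decoding that needs divisions/modulo instead of cheap shifts and masks, which is presumably why a 2-bit layout is the common choice. The cost is about 25% more memory than a denser encoding, on top of whatever the compute kernels need.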
Is there a reason why the 1.58-bit models are always quite small? I think I've seen an 8B one, but that's about it.

Is there a technical reason for it, or just research convenience?
You can try out the model in a demo they have set up: https://bitnet-demo.azurewebsites.net/
I guess B1FF@BITNET posts are gonna come from an LLM now.

Context: https://web.archive.org/web/20030830105202/http://www.catb.org/esr/jargon/html/B/B1FF.html
Does anyone have a good understanding of how 2B models can be useful in production? What tasks are you using them for? I wonder what tasks you could fine-tune them on to get 95-99% results (if anything).
Not to be confused with BITNET:

https://en.m.wikipedia.org/wiki/BITNET
I asked about the last French election, and the #1 sentence was:

> Marine Le Pen, a prominent figure in France, won the 2017 presidential election despite not championing neoliberalism. Several factors contributed to her success: (…)

What data did they train their model on?