I think almost all the free LLMs (not AI) that you find on hf can 'run on CPUs'.<p>The claim here seems to be that it runs <i>usefully fast</i> on CPU.<p>It's hard to judge how accurate that claim is, since we don't know how fast this model runs on a GPU:<p><pre><code> > Absent from the list of supported chips are GPUs [...]
</code></pre>
</code></pre>
And TFA doesn't really quantify anything; it just offers:<p><pre><code> > Perhaps more impressively, BitNet b1.58 2B4T is speedier than other models of its size — in some cases, twice the speed — while using a fraction of the memory.
</code></pre>
The model they link to is just over 1GB in size, and there are plenty of existing 1-2GB models that are quite serviceable on even a mildly-modern CPU-only rig.
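<p>To be fair, the "fraction of the memory" part is at least plausible on paper. Some back-of-envelope arithmetic of my own (not from TFA) for the weights alone, comparing 1.58-bit ternary storage against conventional fp16 for a 2B-parameter model:<p>

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB (ignores activations, KV cache, etc.)."""
    return n_params * bits_per_weight / 8 / 2**30

# Rough figures for a 2B-parameter model; real files add headers,
# embeddings kept at higher precision, etc.
ternary = weight_gib(2e9, 1.58)   # BitNet-style ternary weights
fp16    = weight_gib(2e9, 16)     # conventional half-precision weights

print(f"1.58-bit: {ternary:.2f} GiB")      # ~0.37 GiB
print(f"fp16:     {fp16:.2f} GiB")         # ~3.73 GiB
print(f"ratio:    {fp16 / ternary:.1f}x")  # ~10.1x
```

<p>Which is roughly consistent with the ~1GB download once you account for the non-ternary parts of the model, but it says nothing about <i>speed</i>, which is the claim that actually needed numbers.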