bloomz.cpp allows running inference of BLOOM-like models in pure C/C++ (inspired by llama.cpp). It supports all BLOOM models that can be loaded in transformers.

As an example, you can run BLOOMZ-7B1 on your Mac or Pixel! On an M1 Pro, you can achieve 16 tokens/sec.
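
To illustrate which checkpoints qualify, here is a minimal sketch (assuming the publicly available bigscience/bloomz-560m checkpoint as an example) of loading a BLOOM-family model through the transformers API; any checkpoint that loads this way is the kind of model bloomz.cpp targets, after conversion to its own weight format:

    # Minimal sketch: verify a BLOOM-family checkpoint loads in transformers.
    # The model name "bigscience/bloomz-560m" is just an illustrative choice.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    name = "bigscience/bloomz-560m"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    # Quick generation check to confirm the weights work end to end.
    inputs = tokenizer("Translate to English: Je t'aime.", return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output[0]))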