Phi-2 and TinyLlama are seriously impressive for < 3B parameter models. They can run on a phone, and are pretty snappy.<p>Benchmarks: <a href="https://github.com/ggerganov/llama.cpp/discussions/4508">https://github.com/ggerganov/llama.cpp/discussions/4508</a><p>I don't see them taking over general-purpose chat/query use cases, but fine-tuned for a specific use case and embedded into mobile apps, they might be how we see LLMs jump from cool tech demos to something that's present in most products.