> The ability to run generative AI models like Llama 2 on devices such as smartphones, PCs, <i>VR/AR headsets</i><p>Maybe it's an upcoming feature for the Quest 3?<p>To that end, I've been pretty amazed by how far quantization has come. Some early Llama 2 quantizations [0] have gotten down to ~2.8 GB, though I haven't yet tested how they perform. Still, we're now talking about models that can comfortably run on pretty low-end hardware. It will be interesting to see where Llama crops up with so many options for inference hardware.<p>[0] <a href="https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/main" rel="nofollow noreferrer">https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/ma...</a>
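For a rough sense of why a 7B model lands near that ~2.8 GB figure, here's a back-of-envelope sketch. The bits-per-weight values are my approximate averages for GGML quant formats (they include scale/metadata overhead), not exact numbers from the files:

```python
def quant_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size (GB) of a model quantized to a given average bit width."""
    return n_params * bits_per_weight / 8 / 1e9

# Rough average bits/weight for a few GGML quant levels (assumed, includes overhead)
for name, bits in [("f16", 16.0), ("q8_0", 8.5), ("q4_0", 4.5), ("q2_K", 3.35)]:
    print(f"{name}: ~{quant_size_gb(7e9, bits):.1f} GB")
```

At roughly 3.35 bits per weight, 7B parameters works out to just under 3 GB, which lines up with the smallest files in that repo.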