Meanwhile, Apple stubbornly insists on doing things its own way and has been awkwardly silent during this whole AI revolution. I gave up on Siri years ago because of its glaring stupidity compared with Google Assistant and Alexa.

While Apple keeps making money from overpriced hardware, its competitors are working on actually being pioneers in AI. It makes me sad to see so much computational power in my iPhone and iPad wasted on silly, subpar iOS apps.
If you have a modern iPhone you can try running an LLM directly on it today using the MLC iPhone app: https://mlc.ai/mlc-llm/#iphone

It can run Vicuna-7B, which is a pretty impressive model.

(They have an Android app too, but I haven't tried that yet.)
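If you'd rather poke at the same stack from a laptop first, MLC also ships Python bindings. Here's a rough sketch of what that looks like, assuming the `mlc_chat` package and a prebuilt, quantized Vicuna-7B weight bundle; the package name and model id below are my guesses and depend on what you've actually downloaded:

```python
# pip install mlc-chat-nightly   # package name is an assumption; check MLC's install docs
from mlc_chat import ChatModule

# Assumes a prebuilt quantized Vicuna-7B bundle is already on disk;
# the model id here is illustrative, not canonical.
cm = ChatModule(model="vicuna-v1-7b-q3f16_0")

# Generate a single completion and print it.
print(cm.generate(prompt="Summarize what quantization does to an LLM."))
```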
> The ability to run generative AI models like Llama 2 on devices such as smartphones, PCs, <i>VR/AR headsets</i><p>Maybe it's an upcoming feature for the Quest 3?<p>To that end, I've been pretty amazed by how far quantization has come. Some early llama-2 quantizations[0] have gotten down to ~2.8gb, though I haven't tested it to see how it performs yet. Still though, we're now talking about models that can comfortably run on pretty low-end hardware. It will be interesting to see where llama crops up with so many options for inferencing hardware.<p>[0] <a href="https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/main" rel="nofollow noreferrer">https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/ma...</a>
ChatGPT 3.5 is the baseline people expect from an LLM; it will probably take 2-3 generations of hardware (3-4 years) before we can reach that level on-device. Anything below that is just going to get bad reviews.
Hmm, I wonder what the privacy impact would be of having Meta (or Google et al.) AI running on chips in your phone, when the parent company already has so much info on you and blatantly flouts privacy laws.
How would you get a model that currently needs 8 GB+ of VRAM into some chiplet form factor? Is there an obvious way of translating this more directly to hardware?
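My rough understanding is that it mostly comes down to quantization plus enough on-package memory: weight size scales roughly linearly with bits per weight, so the "8 GB of VRAM" figure is largely a precision choice rather than a hard floor. A back-of-the-envelope sketch (weights only; it ignores activations, the KV cache, and the fact that real quantized files keep some tensors at higher precision, so they come out somewhat larger):

```python
# Approximate weight memory for a 7B-parameter model at different precisions.
params = 7e9

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    gib = params * bits / 8 / 2**30   # bytes -> GiB
    print(f"{name:>5}: {gib:5.1f} GiB")

# fp16 ≈ 13.0 GiB, int8 ≈ 6.5 GiB, int4 ≈ 3.3 GiB
```

So a 4-bit 7B model already fits in the unified memory of a mid-range phone or a small dedicated accelerator; the harder part for a chiplet is providing enough memory bandwidth to stream those weights every token.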