For people interested in AI research, there's nothing new here.<p>IMO they should do a better job of referencing existing papers and techniques. The way they wrote about "adapters" can make it seem like it's something novel, but it's actually just reiterating vanilla LoRA. It was enough to convince one of the top-voted HackerNews comments that this was a "huge development".<p>Benchmarks are nice though.
Halfway down, the article contains some great charts with comparisons to other relevant models, like Mistral-7B for the on-device models, and both GPT-3.5 and 4 for the server-side models.<p>They include data about the ratio of which outputs human graders preferred (for server-side it's better than 3.5, worse than 4).<p>BUT, the interesting chart to me is "Human Evaluation of Output Harmfulness", which is much, much "better" than the other models. Both on-device and server-side.<p>I wonder if that's part of wanting to have GPT as the "level 3": making their own models much more cautious, and using OpenAI's models in a way that makes it clear "it was ChatGPT that said this, not us".<p>Instruction-following accuracy seems to be really good as well.
I hope this means Apple will push the baseline of ALL Macs to have more than 8GB of memory. While I wish we'd all get 16GB as the M4 baseline, Apple being Apple may only give us 12GB and charge an extra $100 for the 16GB option.<p>It will still be a lot better than 8GB though.
Absolutely awesome amount of content in these two pages. This was not expected. It is appreciated. I can’t wait to use the server model on a Mac to spin up my own cloud optimized for the Apple stack.
I think we as tech people lost the forest for the trees.<p>Apple (unwisely, I think) is allowing UIs to just generate responses.<p>The wow-neat! experience will wear off quickly. Then, even at a miss rate of 0.1%, there will be thousands - millions - of cringe-worthy examples that sully the Apple brand for quality.<p>It will be impossible to create a quality filter good enough, and there will be no way to back these features out of the OS.<p>For targeted use cases (like coding and editing), this will be useful. But these features may be what finally makes contempt for Apple go mainstream, and that would be a shame.<p>Internally at Apple, they likely discussed how much to limit the rollout and control usage. I think they decided to bake it into APIs more to maintain developer mindshare than to keep users happy.<p>The one feature that could flip that script is interacting with Siri/AI in order to get things done. The frustration of knowing what you want but not how or whether it can be done drives a lot of tech angst. If this only meant ordinary people could use their existing phones to their full extent, it would be a huge win.
> <i>Our foundation models are trained on Apple's AXLearn framework, an open-source project we released in 2023. It builds on top of JAX and XLA, and allows us to train the models with high efficiency and scalability on various training hardware and cloud platforms, including TPUs and both cloud and on-premise GPUs.</i><p>Interesting that they’re using TPUs for training, in addition to GPUs. Is it both a technical decision (JAX and XLA) and a hedge against Nvidia?
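To illustrate why the JAX/XLA choice makes the TPU-vs-GPU question largely moot (a rough sketch of my own, not anything from AXLearn itself): the same jitted training step compiles through XLA to whichever backend is available.
<pre><code>
# Hedged toy example of a hardware-agnostic training step in JAX; AXLearn's real
# training loop is far more involved, but the portability argument is the same.
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]          # toy linear model
    return jnp.mean((pred - y) ** 2)

@jax.jit                                          # XLA compiles this for TPU, GPU, or CPU
def train_step(params, x, y):
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)

params = {"w": jnp.zeros((4, 1)), "b": jnp.zeros((1,))}
x, y = jnp.ones((8, 4)), jnp.ones((8, 1))
params = train_step(params, x, y)
print(jax.devices())                              # e.g. [TpuDevice(...)] or a CUDA/CPU device
</code></pre>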
It's interesting that a sub-ChatGPT-3.5-class model can do a lot of things on-device if you marry it with a good platform and feed it personal context. GPT-4o, living in the browser, is not as compelling a product as what Apple Intelligence can do on the iPhone with a less capable model.
> For on-device inference, we use low-bit palletization, a critical optimization technique that achieves the necessary memory, power, and performance requirements.<p>Did they go over the entire text with a thesaurus? I've never seen "palletization" used as a viable synonym for "quantization" before, and I've read quite a few papers on LLM quantization.
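For what it's worth, "palettization" (as Core ML Tools spells it) usually means weight clustering with a small lookup table - a palette - rather than plain linear quantization, so it isn't just a thesaurus swap. A rough sketch of the idea, using 1-D k-means to build a 16-entry palette so each weight is stored as a 4-bit index (my own toy code, not Apple's pipeline):
<pre><code>
# Hedged sketch of weight palettization: cluster weights into a small palette and
# store 4-bit indices plus the palette instead of full-precision values.
import numpy as np

def palettize(weights, n_colors=16, iters=20):
    flat = weights.reshape(-1)
    palette = np.quantile(flat, np.linspace(0, 1, n_colors))   # initial palette from quantiles
    for _ in range(iters):                                     # plain 1-D k-means
        idx = np.abs(flat[:, None] - palette[None, :]).argmin(axis=1)
        for c in range(n_colors):
            if np.any(idx == c):
                palette[c] = flat[idx == c].mean()
    return palette, idx.reshape(weights.shape).astype(np.uint8)

def depalettize(palette, idx):
    return palette[idx]                                        # look indices back up in the palette

w = np.random.randn(256, 256).astype(np.float32)
palette, idx = palettize(w)                                    # 16 floats + one 4-bit index per weight
print("mean reconstruction error:", np.abs(w - depalettize(palette, idx)).mean())
</code></pre>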
“We utilize adapters, small neural network modules that can be plugged into various layers of the pre-trained model, to fine-tune our models for specific tasks.”<p>This is huuuuge. I don’t see announcement of 3rd party training support yet, but I imagine/hope it’s planned.<p>One of the hard things about local+private ML is I don’t want every app I download to need GBs of weights, and don’t want a delay when I open a new app and all the memory swap happens. As an app developer I want the best model that runs on each HW model, not one lowest common denominator model for slowest HW I support. Apple has the chance to make this smooth: great models tuned to each chip, adapters for each use case, new use cases only have a few MB of weights (for a set of current base models), and base models can get better over time (new HW and improved models). Basically app thinning for models.<p>Even if the base models aren’t SOTA to start, the developer experience is great and they can iterate.<p>Server side is so much easier, but look forward to local+private taking over for a lot of use cases.
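A sketch of what "app thinning for models" could look like on the developer side (the paths and function names here are entirely made up, not an Apple API): one shared base model stays resident, and a few-megabyte adapter per use case is loaded and cached on demand.
<pre><code>
# Hypothetical adapter-swapping sketch; BASE_MODEL_DIR, ADAPTER_DIR and the file
# layout are illustrative assumptions, not anything Apple has announced.
from functools import lru_cache
from pathlib import Path

BASE_MODEL_DIR = Path("/shared/base-model")     # ~3B-parameter model, shipped once with the OS
ADAPTER_DIR = Path("/app/adapters")             # tens of MB per task, shipped per use case

@lru_cache(maxsize=4)                           # keep a few adapters hot, evict the rest
def load_adapter(task):
    return (ADAPTER_DIR / f"{task}.rank16.bin").read_bytes()

def run(task, prompt):
    adapter = load_adapter(task)                # cheap after the first call for this task
    # the base weights never move; only the small adapter changes between tasks
    return f"[base:{BASE_MODEL_DIR.name} + {task} adapter, {len(adapter)} bytes] {prompt}"
</code></pre>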
> Our foundation models are fine-tuned for users’ everyday activities, and can dynamically specialize themselves on-the-fly for the task at hand. We utilize adapters, small neural network modules that can be plugged into various layers of the pre-trained model, to fine-tune our models for specific tasks. For our models we adapt the attention matrices, the attention projection matrix, and the fully connected layers in the point-wise feedforward networks for a suitable set of the decoding layers of the transformer architecture.<p>> We represent the values of the adapter parameters using 16 bits, and for the ~3 billion parameter on-device model, the parameters for a rank 16 adapter typically require 10s of megabytes. The adapter models can be dynamically loaded, temporarily cached in memory, and swapped — giving our foundation model the ability to specialize itself on the fly for the task at hand while efficiently managing memory and guaranteeing the operating system's responsiveness.<p>This kind of sounds like LoRAs...
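It is essentially the LoRA recipe (Hu et al., 2021): freeze the base weights and learn a low-rank update on selected projection matrices. A minimal PyTorch-style sketch of the idea (my own toy code, not Apple's), plus a back-of-envelope size check showing how rank 16 at 16-bit lands in the tens of megabytes; the hidden sizes and layer count below are guesses, not Apple's numbers.
<pre><code>
# Toy LoRA-style adapter: W stays frozen, only the low-rank factors A and B train.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, rank=16, alpha=32):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)           # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero-init: adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Back-of-envelope adapter size with guessed dimensions (hidden=2048, FFN=8192, 26 layers):
rank, hidden, ffn, layers = 16, 2048, 8192, 26
per_layer = 4 * rank * (hidden + hidden) + 2 * rank * (hidden + ffn)  # attn Q/K/V/O + two FFN mats
print(f"~{per_layer * layers * 2 / 1e6:.0f} MB at 16 bits per parameter")  # roughly 31 MB
</code></pre>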
It would be interesting to see how these models impact battery life. I've tried a few local LLMs on my iPhone 15 Pro via the PrivateLLM app, and the battery charge plummets after just a few minutes of usage.
The model is not open source. Also, now we are stuck with a walled garden for models that are deeply integrated at the OS or browser level.
1. Apple's models are not open, so we cannot run them on Android, nor on desktop Chrome or Edge.
2. Microsoft Phi-3 can run inside iOS, but on Android only as an app, not at the OS level, and with no supported APIs. It can run on desktop Edge but not Chrome.
3. Google Gemini Nano can only run inside Android and desktop Chrome, not Edge, and not on iOS, as the weights are not open.<p>So we cannot get a similar answer from an LLM across ecosystems, since each one uses a different model and none of them work everywhere.
<i>> 2. Represent our users: We build deeply personal products with the goal of representing users around the globe authentically. We work continuously to avoid perpetuating stereotypes and systemic biases across our AI tools and models.</i><p>How do they represent users around the globe authentically while being located in Cupertino, CA? (more of a rhetorical question really)
As someone who has been dabbling with prompt engineering and is now fine-tuning some models (working on a use case where we may have to fine-tune one of Mistral's 7B instruct models), I want to know what kind of skillset I really need so that I can join this team (or a similar team building these sorts of things).
The "Human Evaluation of Output Harmfulness" section confirms what I've perceived: Mistral-7B is the best of the small models in terms of minimizing false positive refusals. With the refusal vector abliteration stuff this is less of an issue but a good base is still important.
>By fine-tuning only the adapter layers, the original parameters of the base pre-trained model remain unchanged, preserving the general knowledge of the model while tailoring the adapter layers to support specific tasks.<p>From an ML noob's (my) understanding of this: does this mean that the final matrix is regularly fine-tuned instead of fine-tuning the main model? Is this similar to how ChatGPT now remembers memory[1]?<p>[1] <a href="https://help.openai.com/en/articles/8590148-memory-faq" rel="nofollow">https://help.openai.com/en/articles/8590148-memory-faq</a>
I haven't seen anything indicating whether these features can be disabled. I'm not interested in adding a further invasion of privacy to my phone. I don't want some elaborate parlor trick helping me write. I've spent some time with ChatGPT and while it was somewhat novel, I wasn't overly impressed. Much of it was rudimentary and often wrong. And I wasn't overly impressed with some of the code that it generated. Reliance on such tools reminds me of an Asimov SF tale.
Easel on iMessage has had this experience plus more for a while, including multiplayer, where you can have two people in one scene together with photorealistic imagery: <a href="https://apps.apple.com/us/app/easel-ai/id6448734086" rel="nofollow">https://apps.apple.com/us/app/easel-ai/id6448734086</a>
It would have been nice if they allowed you to build your own Apple AI system (I refuse to redefine Apple's AI as just AI :-p) using clusters of Mac minis and Mac Pros.
But of course they still want that data for themselves, like Google does. It's secure against everyone but Apple, and probably the NSA, lol.
Has anybody here improved their day-to-day workflow with any kind of "implicit" generative AI rather than explicitly talking to an LLM?<p>So far all attempts seem to be building a universal Clippy. In my experience, all kinds of forced autocomplete and other suggestions have been worse than useless.
The benchmarks are very interesting. Unfortunately, the writing benchmarks seem to be poorly constructed. It looks like there are tasks no model can achieve and others that almost all models pass, i.e. every model gets around 9.0.
Is it just me, or is Apple really moving fast? I don't think it is easy for a company of this size to concisely put forward a vision of AI in these short and crazy AI times.<p>BTW, not an Apple fan, but an Apple user.
The WWDC show got on my nerves with the corpspeak, but this is pretty cool stuff.<p>I've been trying to make smaller, more efficient models in my own work. I hope Apple publishes some actual papers.
> With this set of optimizations, on iPhone 15 Pro we are able to reach time-to-first-token latency of about 0.6 millisecond per prompt token, and a generation rate of 30 tokens per second. Notably, this performance is attained before employing token speculation techniques, from which we see further enhancement on the token generation rate.<p>This seems impressive. Is it, really? I don’t know enough about the subject to judge.
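For a rough sense of scale (my own back-of-envelope arithmetic using made-up request sizes, not figures from the article), the quoted numbers put prefill well under a second for a typical prompt, with generation dominating the total time:
<pre><code>
# Back-of-envelope latency from the quoted figures: 0.6 ms per prompt token, 30 tokens/s generation.
prompt_tokens, response_tokens = 800, 120      # example request, not Apple's numbers

prefill_s = prompt_tokens * 0.6e-3             # time to first token: ~0.48 s for an 800-token prompt
decode_s = response_tokens / 30                # ~4 s to generate a 120-token reply
print(f"prefill ~{prefill_s:.2f}s, decode ~{decode_s:.2f}s, total ~{prefill_s + decode_s:.2f}s")
</code></pre>
For comparison, 30 tokens per second is well above typical reading speed, so on long prompts the perceived wait is mostly the prefill before the first token.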
This is great; however, Apple needs to be explicit about what is, and what isn't, relayed to third-party services, and provide the ability to opt out if desired. It's one thing to run inference on-device, and another to send your data through OpenAI's APIs. The partnership details are not entirely clear to me as a user.
I'm disappointed that they make the fundamental claim that their cloud service is private with respect to user inputs passed through it, yet don't talk even a little bit about how that's accomplished. Even just an explanation of what guarantees they make and how would be much more interesting than explanations of their flavor of RLHF or whatever nonsense. I read the GAZELLE* paper when it came out and wondered what it would look like if a large-scale organization tried to deploy something like it.<p>Of course, Apple will never give adequate details about security mechanisms or privacy guarantees. They are in the business of selling you security as something that must be handled by them and them alone, with the implication that knowing how they do it would somehow make it less secure (this is the opposite of how it actually works, but Apple loves doublespeak, and 1984 allusions have been their brand since at least 1984). I view that, like any claim by a tech company that they are keeping your data secure in any context, as security theater. Vague promises are no promises at all. Put up or shut up.<p>* <a href="https://arxiv.org/pdf/1801.05507" rel="nofollow">https://arxiv.org/pdf/1801.05507</a>
They use synthetic data in pretraining and teacher models in RLHF. That means they use models trained on copyrighted data to make derivative models. Is that sitting OK with copyright owners?
> We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control.<p>And, of course, nobody knew to opt out by blocking Applebot-Extended until after the announcement, by which point they've already pirated shittons of data.<p>In completely unrelated news, I just trained a new OS development AI on every OS Apple has ever written. Don't worry, there's an opt-out; Apple just needed to know to put these magic words in their installer image years ago. I'm sure Apple legal will be OK with this.
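For anyone who wants to opt out going forward (it won't undo anything already crawled), the mechanism is a robots.txt rule for the separate Applebot-Extended user agent. A quick, hedged sketch for checking whether a site already disallows it:
<pre><code>
# Check whether a site's robots.txt disallows Applebot-Extended, the token Apple says
# controls use of crawled content for training (separate from the Applebot search crawler).
from urllib.robotparser import RobotFileParser

def allows_apple_training(site):
    rp = RobotFileParser(f"{site.rstrip('/')}/robots.txt")
    rp.read()
    return rp.can_fetch("Applebot-Extended", f"{site.rstrip('/')}/")

print(allows_apple_training("https://example.com"))   # True unless the site explicitly disallows it
</code></pre>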
I need to resurrect my tiny old Motorola flip phone without an internet connection.
Maybe a phone should be just a phone.
I don't need AI in my pants.
Quite interesting that this was released right after multiple rants from Elon sparked debates on X.<p>"If Apple integrates OpenAI at the OS level, then Apple devices will be banned at my companies. That is an unacceptable security violation."<p>Replying to Tim Cook:
"Don’t want it.
Either stop this creepy spyware or all Apple devices will be banned from the premises of my companies."<p>"It’s patently absurd that Apple isn’t smart enough to make their own AI, yet is somehow capable of ensuring that OpenAI will protect your security & privacy!<p>Apple has no clue what’s actually going on once they hand your data over to OpenAI. They’re selling you down the river."<p><a href="https://x.com/elonmusk/status/1800269249912381773" rel="nofollow">https://x.com/elonmusk/status/1800269249912381773</a>
<a href="https://x.com/elonmusk/status/1800266437677768765" rel="nofollow">https://x.com/elonmusk/status/1800266437677768765</a>
<a href="https://x.com/elonmusk/status/1800265431078551973" rel="nofollow">https://x.com/elonmusk/status/1800265431078551973</a>