For people interested in AI research, there's nothing new here.<p>IMO they should do a better job of referencing existing papers and techniques. The way they wrote about "adapters" can make it seem like it's something novel, but it's actually just reiterating vanilla LoRA. It was enough to convince one of the top-voted HackerNews comments that this was a "huge development".<p>Benchmarks are nice though.
Halfway down, the article contains some great charts with comparisons to other relevant models, like Mistral-7B for the on-device models, and both GPT-3.5 and 4 for the server-side models.<p>They include data about the ratio of which outputs human graders preferred (for server-side it's better than 3.5, worse than 4).<p>BUT, the interesting chart to me is "Human Evaluation of Output Harmfulness", which is much, much "better" than the other models. Both on-device and server-side.<p>I wonder if that's part of wanting to have GPT as the "level 3": making their own models much more cautious, and using OpenAI's models in a way that makes it clear "it was ChatGPT that said this, not us".<p>Instruction-following accuracy seems to be really good as well.
I hope this means Apple will push the baseline of ALL Macs to have more than 8GB of memory. While I wish we'd all get 16GB as the M4 baseline, Apple being Apple may only give us 12GB and charge an extra $100 for the 16GB option.<p>It will still be a lot better than 8GB though.
Absolutely awesome amount of content in these two pages. This was not expected. It is appreciated. I can’t wait to use the server model on a Mac to spin up my own cloud optimized for the Apple stack.
I think we as tech people lost the forest for the trees.<p>Apple (unwisely, I think) is allowing UIs to just generate responses.<p>The wow-neat! experience will wear off quickly. Then, even at a miss rate of 0.1%, there will be thousands - millions - of cringe-worthy examples that sully the Apple brand for quality.<p>It will be impossible to create a quality filter good enough, and there will be no way to back these features out of the OS.<p>For targeted use cases (like coding and editing), this will be useful. But these features may be what finally makes contempt for Apple go mainstream, and that would be a shame.<p>Internally at Apple, they likely discussed how much to limit the rollout and control usage. I think they decided to bake it into APIs more to maintain developer mindshare than to keep users happy.<p>The one feature that could flip that script is interacting with Siri/AI in order to get things done. The frustration of knowing what you want but not how or whether it can be done drives a lot of tech angst. If this only meant ordinary people could use their existing phones to their full extent, it would be a huge win.
> <i>Our foundation models are trained on Apple's AXLearn framework, an open-source project we released in 2023. It builds on top of JAX and XLA, and allows us to train the models with high efficiency and scalability on various training hardware and cloud platforms, including TPUs and both cloud and on-premise GPUs.</i><p>Interesting that they’re using TPUs for training, in addition to GPUs. Is it both a technical decision (JAX and XLA) and a hedge against Nvidia?
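To illustrate why the JAX/XLA choice makes the TPU-vs-GPU question largely moot (a rough sketch of my own, not anything from AXLearn itself): the same jitted training step compiles through XLA to whichever backend is available.
<pre><code>
# Hedged toy example of a hardware-agnostic training step in JAX; AXLearn's real
# training loop is far more involved, but the portability argument is the same.
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]          # toy linear model
    return jnp.mean((pred - y) ** 2)

@jax.jit                                          # XLA compiles this for TPU, GPU, or CPU
def train_step(params, x, y):
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)

params = {"w": jnp.zeros((4, 1)), "b": jnp.zeros((1,))}
x, y = jnp.ones((8, 4)), jnp.ones((8, 1))
params = train_step(params, x, y)
print(jax.devices())                              # e.g. [TpuDevice(...)] or a CUDA/CPU device
</code></pre>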
It's interesting that a sub-ChatGPT-3.5-class model can do a lot of things on-device if you marry it with a good platform and feed it personal context. GPT-4o, living in the browser, is not as compelling a product as what Apple Intelligence can do on the iPhone with a less capable model.
> For on-device inference, we use low-bit palletization, a critical optimization technique that achieves the necessary memory, power, and performance requirements.<p>Did they go over the entire text with a thesaurus? I've never seen "palletization" used as a viable synonym for "quantization" before, and I've read quite a few papers on LLM quantization.
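For what it's worth, "palettization" (as Core ML Tools spells it) usually means weight clustering with a small lookup table - a palette - rather than plain linear quantization, so it isn't just a thesaurus swap. A rough sketch of the idea, using 1-D k-means to build a 16-entry palette so each weight is stored as a 4-bit index (my own toy code, not Apple's pipeline):
<pre><code>
# Hedged sketch of weight palettization: cluster weights into a small palette and
# store 4-bit indices plus the palette instead of full-precision values.
import numpy as np

def palettize(weights, n_colors=16, iters=20):
    flat = weights.reshape(-1)
    palette = np.quantile(flat, np.linspace(0, 1, n_colors))   # initial palette from quantiles
    for _ in range(iters):                                     # plain 1-D k-means
        idx = np.abs(flat[:, None] - palette[None, :]).argmin(axis=1)
        for c in range(n_colors):
            if np.any(idx == c):
                palette[c] = flat[idx == c].mean()
    return palette, idx.reshape(weights.shape).astype(np.uint8)

def depalettize(palette, idx):
    return palette[idx]                                        # look indices back up in the palette

w = np.random.randn(256, 256).astype(np.float32)
palette, idx = palettize(w)                                    # 16 floats + one 4-bit index per weight
print("mean reconstruction error:", np.abs(w - depalettize(palette, idx)).mean())
</code></pre>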
“We utilize adapters, small neural network modules that can be plugged into various layers of the pre-trained model, to fine-tune our models for specific tasks.”<p>This is huuuuge. I don’t see announcement of 3rd party training support yet, but I imagine/hope it’s planned.<p>One of the hard things about local+private ML is I don’t want every app I download to need GBs of weights, and don’t want a delay when I open a new app and all the memory swap happens. As an app developer I want the best model that runs on each HW model, not one lowest common denominator model for slowest HW I support. Apple has the chance to make this smooth: great models tuned to each chip, adapters for each use case, new use cases only have a few MB of weights (for a set of current base models), and base models can get better over time (new HW and improved models). Basically app thinning for models.<p>Even if the base models aren’t SOTA to start, the developer experience is great and they can iterate.<p>Server side is so much easier, but look forward to local+private taking over for a lot of use cases.
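A sketch of what "app thinning for models" could look like on the developer side (the paths and function names here are entirely made up, not an Apple API): one shared base model stays resident, and a few-megabyte adapter per use case is loaded and cached on demand.
<pre><code>
# Hypothetical adapter-swapping sketch; BASE_MODEL_DIR, ADAPTER_DIR and the file
# layout are illustrative assumptions, not anything Apple has announced.
from functools import lru_cache
from pathlib import Path

BASE_MODEL_DIR = Path("/shared/base-model")     # ~3B-parameter model, shipped once with the OS
ADAPTER_DIR = Path("/app/adapters")             # tens of MB per task, shipped per use case

@lru_cache(maxsize=4)                           # keep a few adapters hot, evict the rest
def load_adapter(task):
    return (ADAPTER_DIR / f"{task}.rank16.bin").read_bytes()

def run(task, prompt):
    adapter = load_adapter(task)                # cheap after the first call for this task
    # the base weights never move; only the small adapter changes between tasks
    return f"[base:{BASE_MODEL_DIR.name} + {task} adapter, {len(adapter)} bytes] {prompt}"
</code></pre>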
> Our foundation models are fine-tuned for users’ everyday activities, and can dynamically specialize themselves on-the-fly for the task at hand. We utilize adapters, small neural network modules that can be plugged into various layers of the pre-trained model, to fine-tune our models for specific tasks. For our models we adapt the attention matrices, the attention projection matrix, and the fully connected layers in the point-wise feedforward networks for a suitable set of the decoding layers of the transformer architecture.<p>> We represent the values of the adapter parameters using 16 bits, and for the ~3 billion parameter on-device model, the parameters for a rank 16 adapter typically require 10s of megabytes. The adapter models can be dynamically loaded, temporarily cached in memory, and swapped — giving our foundation model the ability to specialize itself on the fly for the task at hand while efficiently managing memory and guaranteeing the operating system's responsiveness.<p>This kind of sounds like LoRAs...
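It is essentially the LoRA recipe (Hu et al., 2021): freeze the base weights and learn a low-rank update on selected projection matrices. A minimal PyTorch-style sketch of the idea (my own toy code, not Apple's), plus a back-of-envelope size check showing how rank 16 at 16-bit lands in the tens of megabytes; the hidden sizes and layer count below are guesses, not Apple's numbers.
<pre><code>
# Toy LoRA-style adapter: W stays frozen, only the low-rank factors A and B train.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, rank=16, alpha=32):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)           # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero-init: adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Back-of-envelope adapter size with guessed dimensions (hidden=2048, FFN=8192, 26 layers):
rank, hidden, ffn, layers = 16, 2048, 8192, 26
per_layer = 4 * rank * (hidden + hidden) + 2 * rank * (hidden + ffn)  # attn Q/K/V/O + two FFN mats
print(f"~{per_layer * layers * 2 / 1e6:.0f} MB at 16 bits per parameter")  # roughly 31 MB
</code></pre>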
It would be interesting to see how these models impact battery life. I've tried a few local LLMs on my iPhone 15 Pro via the PrivateLLM app, and the battery charge plummets after just a few minutes of usage.
The model is not open source. Also, now we are stuck with a walled garden for models that are deeply integrated at the OS or browser level.
1. Apple's models are not open, so we cannot run them on Android, nor on desktop Chrome or Edge.
2. Microsoft Phi-3 can run inside iOS, but on Android only as an app, not at the OS level, and with no supported APIs. It can run on desktop Edge but not Chrome.
3. Google Gemini Nano can only run inside Android and desktop Chrome, not Edge, and not on iOS, as the weights are not open.<p>So we cannot get a similar answer from an LLM across ecosystems, since each one uses a different model and none of them work everywhere.
<i>> 2. Represent our users: We build deeply personal products with the goal of representing users around the globe authentically. We work continuously to avoid perpetuating stereotypes and systemic biases across our AI tools and models.</i><p>How do they represent users around the globe authentically while being located in Cupertino, CA? (more of a rhetorical question really)
As someone who has been dabbling with prompt engineering and is now fine-tuning some models (working on a use case where we may have to fine-tune one of Mistral's 7B instruct models), I want to know what kind of skillset I really need so that I can join this team (or a similar team building these sorts of things).
The "Human Evaluation of Output Harmfulness" section confirms what I've perceived: Mistral-7B is the best of the small models in terms of minimizing false positive refusals. With the refusal vector abliteration stuff this is less of an issue but a good base is still important.
>By fine-tuning only the adapter layers, the original parameters of the base pre-trained model remain unchanged, preserving the general knowledge of the model while tailoring the adapter layers to support specific tasks.<p>From an ML noob's (my) understanding of this: does this mean that the final matrix is regularly fine-tuned instead of fine-tuning the main model? Is this similar to how ChatGPT now remembers memory[1]?<p>[1] <a href="https://help.openai.com/en/articles/8590148-memory-faq" rel="nofollow">https://help.openai.com/en/articles/8590148-memory-faq</a>
I haven't seen anything indicating whether these features can be disabled. I'm not interested in adding a further invasion of privacy to my phone. I don't want some elaborate parlor trick helping me write. I've spent some time with ChatGPT and while it was somewhat novel, I wasn't overly impressed. Much of it was rudimentary and often wrong. And I wasn't overly impressed with some of the code that it generated. Reliance on such tools reminds me of an Asimov SF tale.
Easel on iMessage has had this experience plus more for a while, including multiplayer, where you can have two people in one scene together with photorealistic imagery: <a href="https://apps.apple.com/us/app/easel-ai/id6448734086" rel="nofollow">https://apps.apple.com/us/app/easel-ai/id6448734086</a>
It would have been nice if they allowed you to build your own Apple AI system (I refuse to redefine Apple's AI as just AI :-p) using clusters of Mac minis and Mac Pros.
But of course they still want that data for themselves, like Google does. It's secure against everyone but Apple, and probably the NSA, lol.
Has anybody here improved their day-to-day workflow with any kind of "implicit" generative AI rather than explicitly talking to an LLM?<p>So far all attempts seem to be building a universal Clippy. In my experience, all kinds of forced autocomplete and other suggestions have been worse than useless.
The benchmarks are very interesting. Unfortunately, the writing benchmarks seem to be poorly constructed. It looks like there are tasks no model can achieve and others that almost all models pass, i.e. every model gets around 9.0.
Is it just me, or is Apple really moving fast? I don't think it is easy for a company of this size to concisely put forward a vision of AI in these short and crazy AI times.<p>BTW, not an Apple fan, but an Apple user.
The WWDC show got on my nerves with the corpspeak, but this is pretty cool stuff.<p>I've been trying to make smaller, more efficient models in my own work. I hope Apple publishes some actual papers.
> With this set of optimizations, on iPhone 15 Pro we are able to reach time-to-first-token latency of about 0.6 millisecond per prompt token, and a generation rate of 30 tokens per second. Notably, this performance is attained before employing token speculation techniques, from which we see further enhancement on the token generation rate.<p>This seems impressive. Is it, really? I don’t know enough about the subject to judge.
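For a rough sense of scale (my own back-of-envelope arithmetic using made-up request sizes, not figures from the article), the quoted numbers put prefill well under a second for a typical prompt, with generation dominating the total time:
<pre><code>
# Back-of-envelope latency from the quoted figures: 0.6 ms per prompt token, 30 tokens/s generation.
prompt_tokens, response_tokens = 800, 120      # example request, not Apple's numbers

prefill_s = prompt_tokens * 0.6e-3             # time to first token: ~0.48 s for an 800-token prompt
decode_s = response_tokens / 30                # ~4 s to generate a 120-token reply
print(f"prefill ~{prefill_s:.2f}s, decode ~{decode_s:.2f}s, total ~{prefill_s + decode_s:.2f}s")
</code></pre>
For comparison, 30 tokens per second is well above typical reading speed, so on long prompts the perceived wait is mostly the prefill before the first token.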
This is great; however, Apple needs to be explicit about what is, and what isn't, relayed to third-party services, and provide the ability to opt out if desired. It's one thing to run inference on-device, and another to send your data through OpenAI's APIs. The partnership details are not entirely clear to me as a user.
I'm disappointed that they make the fundamental claim that their cloud service is private with respect to user inputs passed through it, yet don't talk even a little bit about how that's accomplished. Even just an explanation of what guarantees they make and how would be much more interesting than explanations of their flavor of RLHF or whatever nonsense. I read the GAZELLE* paper when it came out and wondered what it would look like if a large-scale organization tried to deploy something like it.<p>Of course, Apple will never give adequate details about security mechanisms or privacy guarantees. They are in the business of selling you security as something that must be handled by them and them alone, with the implication that knowing how they do it would somehow make it less secure (this is the opposite of how it actually works, but Apple loves doublespeak, and 1984 allusions have been their brand since at least 1984). I view that, like any claim by a tech company that they are keeping your data secure in any context, as security theater. Vague promises are no promises at all. Put up or shut up.<p>* <a href="https://arxiv.org/pdf/1801.05507" rel="nofollow">https://arxiv.org/pdf/1801.05507</a>
They use synthetic data in pretraining and teacher models in RLHF. That means they use models trained on copyrighted data to make derivative models. Is that sitting OK with copyright owners?
> We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control.<p>And, of course, nobody knew to opt out by blocking Applebot-Extended until after the announcement, by which point they've already pirated shittons of data.<p>In completely unrelated news, I just trained a new OS development AI on every OS Apple has ever written. Don't worry, there's an opt-out; Apple just needed to know to put these magic words in their installer image years ago. I'm sure Apple legal will be OK with this.
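For anyone who wants to opt out going forward (it won't undo anything already crawled), the mechanism is a robots.txt rule for the separate Applebot-Extended user agent. A quick, hedged sketch for checking whether a site already disallows it:
<pre><code>
# Check whether a site's robots.txt disallows Applebot-Extended, the token Apple says
# controls use of crawled content for training (separate from the Applebot search crawler).
from urllib.robotparser import RobotFileParser

def allows_apple_training(site):
    rp = RobotFileParser(f"{site.rstrip('/')}/robots.txt")
    rp.read()
    return rp.can_fetch("Applebot-Extended", f"{site.rstrip('/')}/")

print(allows_apple_training("https://example.com"))   # True unless the site explicitly disallows it
</code></pre>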
I need to resurrect my tiny old Motorola flip phone without an internet connection.
Maybe a phone should be just a phone.
I don't need AI in my pants.
Quite interesting that this was released right after multiple rants from Elon sparked debates on X.<p>"If Apple integrates OpenAI at the OS level, then Apple devices will be banned at my companies. That is an unacceptable security violation."<p>Replying to Tim Cook:
"Don’t want it.
Either stop this creepy spyware or all Apple devices will be banned from the premises of my companies."<p>"It’s patently absurd that Apple isn’t smart enough to make their own AI, yet is somehow capable of ensuring that OpenAI will protect your security & privacy!<p>Apple has no clue what’s actually going on once they hand your data over to OpenAI. They’re selling you down the river."<p><a href="https://x.com/elonmusk/status/1800269249912381773" rel="nofollow">https://x.com/elonmusk/status/1800269249912381773</a>
<a href="https://x.com/elonmusk/status/1800266437677768765" rel="nofollow">https://x.com/elonmusk/status/1800266437677768765</a>
<a href="https://x.com/elonmusk/status/1800265431078551973" rel="nofollow">https://x.com/elonmusk/status/1800265431078551973</a>