My first impression is that this should enable approximately what Apple is doing with their AI strategy (local on-device first, then falling back to a first-party API, and finally something like ChatGPT), but for web users. Having it native in the browser <i>could</i> be really positive for a lot of use cases depending on whether the local version can do things like RAG using locally stored data, and generate structured information like JSON.<p>I don't think this is a terrible idea. LLM-powered apps are here to stay, so browsers making them better is a good thing. Using a local model so queries aren't flying around to random third parties is better for privacy and security. If Google can make this work well it could be really interesting.
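Roughly what I mean for the structured-output case, using the window.ai calls shown further down the thread (whether Gemini Nano reliably emits valid JSON is an open question, and the prompt wording here is only illustrative):<p><pre><code> const session = await window.ai.createTextSession();
const raw = await session.prompt(
  'Return ONLY a JSON object of the form {"names": ["..."]} with 3 names for a pet pelican.'
);
let parsed = null;
try {
  parsed = JSON.parse(raw); // small local models may wrap JSON in prose, so this can fail
} catch {}
console.log(parsed);
</code></pre>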
> The code below is all you need to stream text with Chrome AI and the Vercel AI SDK. ... `chromeai` implements a Provider that uses `window.ai` under the hood<p>Leave it to Vercel to announce `window.ai` on Google's behalf by showing off their own abstraction but not the actual Chrome API.<p>Here's a blog post from a few days ago that shows how the actual `window.ai` API works [0]. The code is extremely simple and really shouldn't need a wrapper:<p><pre><code> const model = await window.ai.createTextSession();
const result = await model.prompt("What do you think is the meaning of life?");
</code></pre>
[0] <a href="https://afficone.com/blog/window-ai-new-chrome-feature-api/" rel="nofollow">https://afficone.com/blog/window-ai-new-chrome-feature-api/</a>
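For streaming, the wrapper appears to call a promptStreaming method on the same session; treat the method name and the stream behavior as assumptions until the API is documented, but the raw version looks roughly like this:<p><pre><code> const model = await window.ai.createTextSession();
const stream = model.promptStreaming("What do you think is the meaning of life?");
for await (const chunk of stream) {
  console.log(chunk); // chunks are reportedly cumulative text, not deltas
}
</code></pre>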
If this is the API that Google are going with here:<p><pre><code> const model = await window.ai.createTextSession();
const result = await model.prompt("3 names for a pet pelican");
</code></pre>
There's a VERY obvious flaw: is there really no way to specify the model to use?<p>Are we expecting that Gemini Nano will be the one true model, forever supported by this API baked into the world's most popular browser?<p>Given the rate at which models are improving that would be ludicrous. But... if the browser model is being invisibly upgraded, how are we supposed to test out prompts and expect them to continue working without modifications against whatever future versions of the bundled model show up?<p>Something like this would at least give us a fighting chance:<p><pre><code> const supportedModels = await window.ai.getSupportedModels();
if (supportedModels.includes("gemini-nano:0.4")) {
  const model = await window.ai.createTextSession("gemini-nano:0.4");
  // ...</code></pre>
So they don't standardize things anymore?<p>Look at WebNN [1]. It's from Microsoft and is basically DirectML, but they at least pretend to make it a Web thing.<p>The posture matters. Apple tried to expose Metal through WebGPU [2], then silently abandoned it. But they had the posture, and other vendors picked it up and made it real.<p>That won't happen with window.ai until they stop sleepwalking.<p>[1] <a href="https://www.w3.org/TR/webnn/" rel="nofollow">https://www.w3.org/TR/webnn/</a><p>[2] <a href="https://www.w3.org/TR/webgpu/" rel="nofollow">https://www.w3.org/TR/webgpu/</a>
If we thought websites mining Monero in ads was bad, wait until every site sells its users’ CPU cycles on a gray market for distributed LLM processing!
See<p><a href="https://developer.chrome.com/docs/ai/built-in" rel="nofollow">https://developer.chrome.com/docs/ai/built-in</a><p><a href="https://github.com/jeasonstudio/chrome-ai">https://github.com/jeasonstudio/chrome-ai</a><p>I can’t seem to find public documentation for the API with a cursory search, so <a href="https://github.com/jeasonstudio/chrome-ai/blob/ec9e334253713e08314bb2dcde3aa124f4f40a68/src/language-model.ts#L75">https://github.com/jeasonstudio/chrome-ai/blob/ec9e334253713...</a> might be the best documentation (other than directly inspecting the window.ai object in console) at the moment.<p>It’s not really clear if the Gemini Nano here is Nano-1 (1.8B) or Nano-2 (3.25B) or selected based on device.
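Absent docs, a quick console probe gets you most of the way; the canCreateTextSession call and its return values below are assumptions lifted from the wrapper source, not documented API:<p><pre><code> console.dir(window.ai); // see what's actually exposed
const availability = await window.ai.canCreateTextSession?.(); // reportedly "readily", "after-download", or "no"
console.log(availability);
</code></pre>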
Today I finally clicked on that "Create theme with AI" on Chrome's default page. I'm really having a hard time differentiating it from selecting any random theme.<p>At this point I'm going to create an image generator that's just an API returning random images from Pixabay. pix.ai (open source, of course)
So it's loading an instruct model for inference? That seems a fair bit less useful than a base model, at least for more advanced use cases.<p>What about running LoRAs, adjusting temperature, configuring prompt templates, etc? It seems pretty early to build something like this into the browser. The technology is still changing so rapidly, it might look completely different in 5 years.<p>I'm a huge fan of local AI, and of empowering web browsers as a platform, but I'm feeling pretty stumped by this one. Is this a good inclusion at this time? Or is the Chrome team following the Google-wide directive to integrate AI _everywhere_, and we're getting a weird JS API as a result?<p>At the very least, I hope to see the model decoupled from the interface. In the same way that font-family loads locally installed fonts, it should be pluggable for other local models.
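For what it's worth, the wrapper source suggests the only knobs today are sampling parameters; the defaultTextSessionOptions call and option names below should be read as assumptions rather than documented API, and there's no sign of LoRA or prompt-template hooks:<p><pre><code> const defaults = await window.ai.defaultTextSessionOptions?.(); // e.g. { temperature, topK }
const session = await window.ai.createTextSession({ ...defaults, temperature: 0.2 });
const result = await session.prompt("Summarize this paragraph in one sentence: ...");
</code></pre>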
Can someone specialized in applied machine learning explain how this is useful? In my opinion, general-purpose models are only useful if they're large, as they are more capable and produce more accurate outputs for certain tasks. For on-device models, fine-tuned ones for specific tasks have greater precision with the same size.
YES!!! Back when Opera was adding a local AI to their browser UI, I had explained how I wanted it to be exposed as an API, as it seems like one of the few ACTUAL good uses for a user agent API: letting me choose which model I am using and where my data is going, rather than the website I am using (which inherently will require standardizing an API surface in the browser websites can use instead of trying to compete for scant memory resources by bringing their own local model or shipping my data off to some remote API).<p><a href="https://news.ycombinator.com/item?id=39920803">https://news.ycombinator.com/item?id=39920803</a><p>> So while I am usually the person who would much rather the browser do almost nothing that isn't a hardware interface, requiring all software (including rendering) to be distributed as code by the website via the end-to-end principle--making the browser easy to implement and easy to secure / sandbox, as it is simply too important of an attack surface to have a billion file format parsing algorithms embedded within it--I actually would love (and I realize this isn't what Opera is doing, at least yet) to have the browser provide a way to get access to a user-selected LLM: the API surface for them--opaque text streaming in both directions--is sufficiently universal that I don't feel bad about the semantic lock-in and I just don't see any reasonable way to do this via the end-to-end principle that preserves user control over tradeoffs in privacy, functionality, and cost... if I go to a website that uses an LLM <i>I</i> should be the one choosing <i>which</i> LLM it is using, NOT the website!!, and if I want it to use some local model or the world's most powerful cloud model, I 1) should be in control of that selection and 2) pretty much have to be for local models to be feasible at all as I can't sit around downloading and caching gigabytes of data, separately, from every service that might make use of an LLM. (edit: Ok, in thinking about it a lot more maybe it makes more sense for this to be a separate daemon run next to the web browser--even if it comes with the web browser--which merely provides a localhost HTTP interface to the LLM, so it can also be shared by native apps... though, I am then unsure how web applications would be able to access them securely due to all of the security restrictions on cross-origin insecure port access.)
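To make the daemon idea concrete, here's a hypothetical sketch (the port, route, and response shape are made up; the point is only that the user-selected model sits behind a local HTTP interface the site can't override):<p><pre><code> const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "user-selected", // the user, not the site, picked this
    messages: [{ role: "user", content: "Summarize this page for me." }],
  }),
});
// (the cross-origin/localhost security restrictions mentioned above would still need solving)
const data = await res.json();
console.log(data.choices?.[0]?.message?.content);
</code></pre>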
This doesn't seem useful unless it's something standardized across browsers. Otherwise I'd still need to use a plugin to support Safari, etc.<p>It seems like it could be nice for something like a bookmarklet or a one-off script, but I don't think it'll really reduce friction in engaging with Gemini for serious web apps.
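The friction in practice looks something like this (the /api/llm fallback route is hypothetical): every serious app still needs its own server-side path for browsers without window.ai:<p><pre><code> async function ask(prompt) {
  if (window.ai?.createTextSession) {
    const session = await window.ai.createTextSession();
    return session.prompt(prompt);
  }
  // Fallback for Safari, Firefox, older Chrome: call your own backend instead.
  const res = await fetch("/api/llm", { method: "POST", body: JSON.stringify({ prompt }) });
  return (await res.json()).text;
}
</code></pre>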
Going a touch further: make it a pluggable local model. Browser fetches the first 10 links from Google in the background, watches the YouTube video, hides the Google ads, presents you with the results.<p>Now not only can Google front the web pages that feed them the content they make summaries from, but the browser can front Google.<p>“Your honour, this is just what Google has been saying is a good thing. We just moved it to the edge. The users win, no?”
I hope we can use LLMs in the browser to censor all ads, clickbait, and annoyances forever.<p>An in-browser LLM will be the ultimate attention preserver if we task it to identify content we don't like and to remove it.
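As a rough sketch of what that could look like with the session API mentioned elsewhere in the thread (the selector and prompt wording are purely illustrative, and per-element latency would be a real problem):<p><pre><code> const session = await window.ai.createTextSession();
for (const el of document.querySelectorAll("article, aside, .promo")) {
  const text = el.innerText.slice(0, 500);
  const verdict = await session.prompt(
    "Answer YES or NO: is the following text an ad or clickbait?\n\n" + text
  );
  if (verdict.trim().toUpperCase().startsWith("YES")) el.style.display = "none";
}
</code></pre>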
If they're going to cram a small LLM into the browser, they might as well start cramming in a small image-generating diffusion model + a new image format to go along with it.<p>I believe we can start compressing the amount of data going over the wire 100x this way...
This seems like an interesting move, but I wonder what specific needs remain for a 'browser-level' local LLM, given that device-level local LLMs are coming in the near future. If we're not connected to the internet, a device-level LLM would serve just as well; and when we open the browser, we're connected to the internet anyway. I know browser-level LLMs can offer benefits like speed, privacy protection, and cost-effectiveness, but those are already covered by internet-based LLM APIs or device-level LLM APIs.
I mostly think it's an interesting concept that can allow many interesting user experiences.<p>At the same time, it is a major risk for browser compatibility. Despite many articles claiming otherwise, I think we mostly avoided repeating the "works only on IE6" situation with Chrome. Google did kinda try at times, but most things didn't catch on. This, I think, has the potential to do some damage on that front.
You can run a local Gemini Nano LLM in any browser: just download the weights from HuggingFace and run them through MediaPipe using WebGPU: <a href="https://x.com/niu_tech/status/1807073666888266157" rel="nofollow">https://x.com/niu_tech/status/1807073666888266157</a>
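A sketch of that setup with MediaPipe's LLM Inference task; the CDN path and model path are assumptions (you'd host the converted Gemini Nano weights yourself), so check the current tasks-genai docs for exact names:<p><pre><code> import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
);
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "/models/gemini-nano.bin" }, // hypothetical path to converted weights
});
console.log(await llm.generateResponse("3 names for a pet pelican"));
</code></pre>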
I built an app to play with it: <a href="https://chrome-ai-example.vercel.app/" rel="nofollow">https://chrome-ai-example.vercel.app/</a>
I wish AI came for my Android keyboard.<p>I regularly type in English, Czech, and Polish, and Gboard doesn't even know some of the basic words or word forms.
I would honestly love this, so that users don't even have to think about AI.<p>- Massively broaden the input for forms because the AI can accept or validate inputs better<p>- Prefill forms from other known data, at the application level<p>- Understand files/docs/images before they even go up, if they go up at all<p>- Provide free text instructions to interact with complex screens/domain models<p>Using the word AI everywhere is marketing, not dev
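A minimal sketch of the first point, assuming the window.ai session API shown above (the form IDs, prompt wording, and YES/NO convention are all made up for illustration):<p><pre><code> const session = await window.ai.createTextSession();
document.querySelector("#signup")?.addEventListener("submit", async (e) => {
  e.preventDefault();
  const address = document.querySelector("#address").value;
  const answer = await session.prompt(
    "Does this look like a complete postal address? Answer YES or NO.\n\n" + address
  );
  if (answer.trim().toUpperCase().startsWith("YES")) e.target.submit();
  else alert("That address looks incomplete.");
});
</code></pre>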
Wow, no need to even send your data to the server for mining and analysis. Just use your local CPU/GPU and your power to do comprehensive ad analytics of all your data. No need to maintain expensive server farms!
Sigh, don't make me tap the sign*.<p>I used to hold Google Chrome in high esteem due to its security posture. Shoehorning AI into it has deleted any respect I held for Chrome or the team that develops it.<p>Trust arrives on foot and leaves on horseback.<p>* The sign: <a href="https://ludic.mataroa.blog/blog/i-will-fucking-piledrive-you-if-you-mention-ai-again/" rel="nofollow">https://ludic.mataroa.blog/blog/i-will-fucking-piledrive-you...</a>