My first impression is that this should enable approximately what Apple is doing with their AI strategy (local on-device first, then falling back to a first-party API, and finally something like ChatGPT), but for web users. Having it native in the browser <i>could</i> be really positive for a lot of use cases depending on whether the local version can do things like RAG using locally stored data, and generate structured information like JSON.<p>I don't think this is a terrible idea. LLM-powered apps are here to stay, so browsers making them better is a good thing. Using a local model so queries aren't flying around to random third parties is better for privacy and security. If Google can make this work well it could be really interesting.
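Roughly what I mean for the structured-output case, using the window.ai calls shown further down the thread (whether Gemini Nano reliably emits valid JSON is an open question, and the prompt wording here is only illustrative):<p><pre><code> const session = await window.ai.createTextSession();
const raw = await session.prompt(
  'Return ONLY a JSON object of the form {"names": ["..."]} with 3 names for a pet pelican.'
);
let parsed = null;
try {
  parsed = JSON.parse(raw); // small local models may wrap JSON in prose, so this can fail
} catch {}
console.log(parsed);
</code></pre>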
> The code below is all you need to stream text with Chrome AI and the Vercel AI SDK. ... `chromeai` implements a Provider that uses `window.ai` under the hood<p>Leave it to Vercel to announce `window.ai` on Google's behalf by showing off their own abstraction but not the actual Chrome API.<p>Here's a blog post from a few days ago that shows how the actual `window.ai` API works [0]. The code is extremely simple and really shouldn't need a wrapper:<p><pre><code> const model = await window.ai.createTextSession();
const result = await model.prompt("What do you think is the meaning of life?");
</code></pre>
[0] <a href="https://afficone.com/blog/window-ai-new-chrome-feature-api/" rel="nofollow">https://afficone.com/blog/window-ai-new-chrome-feature-api/</a>
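For streaming, the wrapper appears to call a promptStreaming method on the same session; treat the method name and the stream behavior as assumptions until the API is documented, but the raw version looks roughly like this:<p><pre><code> const model = await window.ai.createTextSession();
const stream = model.promptStreaming("What do you think is the meaning of life?");
for await (const chunk of stream) {
  console.log(chunk); // chunks are reportedly cumulative text, not deltas
}
</code></pre>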
If this is the API that Google are going with here:<p><pre><code> const model = await window.ai.createTextSession();
const result = await model.prompt("3 names for a pet pelican");
</code></pre>
There's a VERY obvious flaw: is there really no way to specify the model to use?<p>Are we expecting that Gemini Nano will be the one true model, forever supported by this API baked into the world's most popular browser?<p>Given the rate at which models are improving that would be ludicrous. But... if the browser model is being invisibly upgraded, how are we supposed to test out prompts and expect them to continue working without modifications against whatever future versions of the bundled model show up?<p>Something like this would at least give us a fighting chance:<p><pre><code> const supportedModels = await window.ai.getSupportedModels();
if (supportedModels.includes("gemini-nano:0.4")) {
  const model = await window.ai.createTextSession("gemini-nano:0.4");
  // ...</code></pre>
So they don't standardize things anymore?<p>Look at WebNN [1]. It's from Microsoft and is basically DirectML, but they at least pretend to make it a Web thing.<p>The posture matters. Apple tried to expose Metal through WebGPU [2], then silently abandoned it. But they had the posture, and other vendors picked it up and made it real.<p>That won't happen with window.ai until they stop sleepwalking.<p>[1] <a href="https://www.w3.org/TR/webnn/" rel="nofollow">https://www.w3.org/TR/webnn/</a><p>[2] <a href="https://www.w3.org/TR/webgpu/" rel="nofollow">https://www.w3.org/TR/webgpu/</a>
If we thought websites mining Monero in ads was bad, wait until every site sells its users’ CPU cycles on a gray market for distributed LLM processing!
See<p><a href="https://developer.chrome.com/docs/ai/built-in" rel="nofollow">https://developer.chrome.com/docs/ai/built-in</a><p><a href="https://github.com/jeasonstudio/chrome-ai">https://github.com/jeasonstudio/chrome-ai</a><p>I can’t seem to find public documentation for the API with a cursory search, so <a href="https://github.com/jeasonstudio/chrome-ai/blob/ec9e334253713e08314bb2dcde3aa124f4f40a68/src/language-model.ts#L75">https://github.com/jeasonstudio/chrome-ai/blob/ec9e334253713...</a> might be the best documentation (other than directly inspecting the window.ai object in console) at the moment.<p>It’s not really clear if the Gemini Nano here is Nano-1 (1.8B) or Nano-2 (3.25B) or selected based on device.
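Absent docs, a quick console probe gets you most of the way; the canCreateTextSession call and its return values below are assumptions lifted from the wrapper source, not documented API:<p><pre><code> console.dir(window.ai); // see what's actually exposed
const availability = await window.ai.canCreateTextSession?.(); // reportedly "readily", "after-download", or "no"
console.log(availability);
</code></pre>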
Today I finally clicked on that "Create theme with AI" on Chrome's default page. I'm really having a hard time differentiating it from selecting any random theme.<p>At this point I'm going to create an image generator that's just an API returning random images from Pixabay. pix.ai (open source, of course)
So it's loading an instruct model for inference? That seems a fair bit less useful than a base model, at least for more advanced use cases.<p>What about running LoRAs, adjusting temperature, configuring prompt templates, etc? It seems pretty early to build something like this into the browser. The technology is still changing so rapidly, it might look completely different in 5 years.<p>I'm a huge fan of local AI, and of empowering web browsers as a platform, but I'm feeling pretty stumped by this one. Is this a good inclusion at this time? Or is the Chrome team following the Google-wide directive to integrate AI _everywhere_, and we're getting a weird JS API as a result?<p>At the very least, I hope to see the model decoupled from the interface. In the same way that font-family loads locally installed fonts, it should be pluggable for other local models.
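For what it's worth, the wrapper source suggests the only knobs today are sampling parameters; the defaultTextSessionOptions call and option names below should be read as assumptions rather than documented API, and there's no sign of LoRA or prompt-template hooks:<p><pre><code> const defaults = await window.ai.defaultTextSessionOptions?.(); // e.g. { temperature, topK }
const session = await window.ai.createTextSession({ ...defaults, temperature: 0.2 });
const result = await session.prompt("Summarize this paragraph in one sentence: ...");
</code></pre>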
Can someone specialized in applied machine learning explain how this is useful? In my opinion, general-purpose models are only useful if they're large, as they are more capable and produce more accurate outputs for certain tasks. For on-device models, fine-tuned ones for specific tasks have greater precision with the same size.
YES!!! Back when Opera was adding a local AI to their browser UI, I had explained how I wanted it to be exposed as an API, as it seems like one of the few ACTUAL good uses for a user agent API: letting me choose which model I am using and where my data is going, rather than the website I am using (which inherently will require standardizing an API surface in the browser websites can use instead of trying to compete for scant memory resources by bringing their own local model or shipping my data off to some remote API).<p><a href="https://news.ycombinator.com/item?id=39920803">https://news.ycombinator.com/item?id=39920803</a><p>> So while I am usually the person who would much rather the browser do almost nothing that isn't a hardware interface, requiring all software (including rendering) to be distributed as code by the website via the end-to-end principle--making the browser easy to implement and easy to secure / sandbox, as it is simply too important of an attack surface to have a billion file format parsing algorithms embedded within it--I actually would love (and I realize this isn't what Opera is doing, at least yet) to have the browser provide a way to get access to a user-selected LLM: the API surface for them--opaque text streaming in both directions--is sufficiently universal that I don't feel bad about the semantic lock-in and I just don't see any reasonable way to do this via the end-to-end principle that preserves user control over tradeoffs in privacy, functionality, and cost... if I go to a website that uses an LLM <i>I</i> should be the one choosing <i>which</i> LLM it is using, NOT the website!!, and if I want it to use some local model or the world's most powerful cloud model, I 1) should be in control of that selection and 2) pretty much have to be for local models to be feasible at all as I can't sit around downloading and caching gigabytes of data, separately, from every service that might make use of an LLM. (edit: Ok, in thinking about it a lot more maybe it makes more sense for this to be a separate daemon run next to the web browser--even if it comes with the web browser--which merely provides a localhost HTTP interface to the LLM, so it can also be shared by native apps... though, I am then unsure how web applications would be able to access them securely due to all of the security restrictions on cross-origin insecure port access.)
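To make the daemon idea concrete, here's a hypothetical sketch (the port, route, and response shape are made up; the point is only that the user-selected model sits behind a local HTTP interface the site can't override):<p><pre><code> const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "user-selected", // the user, not the site, picked this
    messages: [{ role: "user", content: "Summarize this page for me." }],
  }),
});
// (the cross-origin/localhost security restrictions mentioned above would still need solving)
const data = await res.json();
console.log(data.choices?.[0]?.message?.content);
</code></pre>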
This doesn't seem useful unless it's something standardized across browsers. Otherwise I'd still need to use a plugin to support Safari, etc.<p>It seems like it could be nice for something like a bookmarklet or a one-off script, but I don't think it'll really reduce friction in engaging with Gemini for serious web apps.
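The friction in practice looks something like this (the /api/llm fallback route is hypothetical): every serious app still needs its own server-side path for browsers without window.ai:<p><pre><code> async function ask(prompt) {
  if (window.ai?.createTextSession) {
    const session = await window.ai.createTextSession();
    return session.prompt(prompt);
  }
  // Fallback for Safari, Firefox, older Chrome: call your own backend instead.
  const res = await fetch("/api/llm", { method: "POST", body: JSON.stringify({ prompt }) });
  return (await res.json()).text;
}
</code></pre>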
Going a touch further: make it a pluggable local model. Browser fetches the first 10 links from Google in the background, watches the YouTube video, hides the Google ads, presents you with the results.<p>Now not only can Google front the web pages that feed them the content they make summaries from, but the browser can front Google.<p>“Your honour, this is just what Google has been saying is a good thing. We just moved it to the edge. The users win, no?”
I hope we can use LLMs in the browser to censor all ads, clickbait, and annoyances forever.<p>An in-browser LLM will be the ultimate attention preserver if we task it to identify content we don't like and to remove it.
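As a rough sketch of what that could look like with the session API mentioned elsewhere in the thread (the selector and prompt wording are purely illustrative, and per-element latency would be a real problem):<p><pre><code> const session = await window.ai.createTextSession();
for (const el of document.querySelectorAll("article, aside, .promo")) {
  const text = el.innerText.slice(0, 500);
  const verdict = await session.prompt(
    "Answer YES or NO: is the following text an ad or clickbait?\n\n" + text
  );
  if (verdict.trim().toUpperCase().startsWith("YES")) el.style.display = "none";
}
</code></pre>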
If they're going to cram a small LLM into the browser, they might as well start cramming in a small image-generating diffusion model + a new image format to go along with it.<p>I believe we can start compressing the amount of data going over the wire 100x this way...
This seems like an interesting move, but I wonder what specific needs remain for a 'browser-level' local LLM, given that device-level local LLMs are coming in the near future. If we're not connected to the internet, a device-level LLM would serve just as well; and when we open the browser, we're connected to the internet anyway. I know browser-level LLMs can offer benefits like speed, privacy protection, and cost-effectiveness, but those are already covered by internet-based LLM APIs or device-level LLM APIs.
I mostly think it's an interesting concept that can allow many interesting user experiences.<p>At the same time, it is a major risk for browser compatibility. Despite many articles claiming otherwise, I think we mostly avoided repeating the "works only on IE6" situation with Chrome. Google did kinda try at times, but most things didn't catch on. This, I think, has the potential to do some damage on that front.
You can run a local Gemini Nano LLM in any browser: just download the weights from HuggingFace and run them through MediaPipe using WebGPU: <a href="https://x.com/niu_tech/status/1807073666888266157" rel="nofollow">https://x.com/niu_tech/status/1807073666888266157</a>
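A sketch of that setup with MediaPipe's LLM Inference task; the CDN path and model path are assumptions (you'd host the converted Gemini Nano weights yourself), so check the current tasks-genai docs for exact names:<p><pre><code> import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
);
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "/models/gemini-nano.bin" }, // hypothetical path to converted weights
});
console.log(await llm.generateResponse("3 names for a pet pelican"));
</code></pre>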
I built an app to play with it: <a href="https://chrome-ai-example.vercel.app/" rel="nofollow">https://chrome-ai-example.vercel.app/</a>
I wish AI came for my Android keyboard.<p>I regularly type in English, Czech, and Polish, and Gboard doesn't even know some of the basic words or word forms.
I would honestly love this, so that users don't even have to think about AI.<p>- Massively broaden the input for forms because the AI can accept or validate inputs better<p>- Prefill forms from other known data, at the application level<p>- Understand files/docs/images before they even go up, if they go up at all<p>- Provide free text instructions to interact with complex screens/domain models<p>Using the word AI everywhere is marketing, not dev
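A minimal sketch of the first point, assuming the window.ai session API shown above (the form IDs, prompt wording, and YES/NO convention are all made up for illustration):<p><pre><code> const session = await window.ai.createTextSession();
document.querySelector("#signup")?.addEventListener("submit", async (e) => {
  e.preventDefault();
  const address = document.querySelector("#address").value;
  const answer = await session.prompt(
    "Does this look like a complete postal address? Answer YES or NO.\n\n" + address
  );
  if (answer.trim().toUpperCase().startsWith("YES")) e.target.submit();
  else alert("That address looks incomplete.");
});
</code></pre>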
Wow, no need to even send your data to the server for mining and analysis. Just use your local CPU/GPU and your power to do comprehensive ad analytics of all your data. No need to maintain expensive server farms!
Sigh, don't make me tap the sign*.<p>I used to hold Google Chrome in high esteem due to its security posture. Shoehorning AI into it has deleted any respect I held for Chrome or the team that develops it.<p>Trust arrives on foot and leaves on horseback.<p>* The sign: <a href="https://ludic.mataroa.blog/blog/i-will-fucking-piledrive-you-if-you-mention-ai-again/" rel="nofollow">https://ludic.mataroa.blog/blog/i-will-fucking-piledrive-you...</a>