Hey HN,

I'm excited to share BrowserBee, a privacy-first AI assistant in your browser that lets you run and automate tasks using your LLM of choice (currently supports Anthropic, OpenAI, Gemini, and Ollama). Short demo here: https://github.com/user-attachments/assets/209c7042-6d54-4fce-92a7-ddf8519156c6

Inspired by projects like Browser Use and Playwright MCP, its main advantage is the browser extension form factor, which makes it more convenient for day-to-day use, especially for less technical users. It's also a bit less cumbersome to use on websites that require you to be logged in, since it attaches to the same browser instance you already use. (On privacy: the only data that leaves your browser is the communication with the LLM; there is no tracking or data collection of any sort.)

Some of its core features:

- a memory feature that lets users save common and useful pathways, making the next repetition of those tasks faster and cheaper
- real-time token counting and cost tracking (inspired by Cline)
- an approval flow for critical tasks such as posting content or making payments (also inspired by Cline)
- tab management allowing the agent to execute tasks across multiple tabs
- a range of browser tools for navigation, tab management, interactions, etc., broadly in line with Playwright MCP

I'm actively developing BrowserBee and would love to hear any thoughts, comments, or feedback.

Feel free to reach out via email: parsa.ghaffari [at] gmail [dot] com

-Parsa
You might be able to reduce the amount of information sent to the LLM by 100-fold if you use a stacking context. Here is an example of one made available on GitHub (not mine) [0]. Moreover, you will be able to parse the DOM, or have strategies that parse the DOM. For example, if you are only concerned with video, find all the videos and send only that information. Perhaps parse a page once, find the structure, and cache it so that the next time only the required data is used. (I see you are storing tool sequences, but I didn't find an example of storing a DOM structure so that requests to subsequent pages are optimized.)

If someone visits a website that I control using your Chrome extension, I will 100% be able to find a way to drain all their accounts, probably in the background without them even knowing. Here are some ideas about how to mitigate that.

The problem with Playwright is that it requires the Chrome DevTools Protocol (CDP), which opens massive security problems for a browser that people use for their banking and for managing anything that involves credit cards or sensitive accounts. At one point, I took the injected folder out of Playwright and injected it into a Chrome extension because I thought I needed its tools; however, I quickly abandoned it, as it was easy to create the workflows from scratch. You get a lot of stuff immediately by using Playwright, but you will likely find it much lighter and safer to implement that functionality yourself.

The only benefit of CDP for normal use is allowing automation of any action in a Chrome extension that requires trusted events, e.g. playing sound, going fullscreen, or banking websites that require a trusted event to transfer money. In my opinion, people just want a large part of the workflow automated and don't mind being prompted to click a button when trusted events are required. Since it doesn't matter which button is clicked, you can inject a big button that says "Continue" (or whatever is required) after prompting the user; a sketch of that idea follows below. Trusted events are there for a reason.

[0] https://github.com/andreadev-it/stacking-contexts-inspector
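A rough content-script sketch of the "prompt for one real click" idea (the names here are illustrative, not taken from BrowserBee):

```typescript
// Content-script sketch: instead of synthesizing a trusted event over CDP,
// pause the workflow and ask the user for one real click.
// All names here are illustrative, not taken from BrowserBee.

function waitForTrustedContinue(label: string): Promise<void> {
  return new Promise((resolve) => {
    const button = document.createElement("button");
    button.textContent = label; // e.g. "Continue: confirm transfer"
    Object.assign(button.style, {
      position: "fixed",
      bottom: "24px",
      right: "24px",
      zIndex: "2147483647",
      padding: "16px 24px",
      fontSize: "18px",
    });
    button.addEventListener("click", (event) => {
      // event.isTrusted is only true for a real user gesture; the click also
      // grants transient user activation, which fullscreen, audio playback,
      // and similar gated actions require.
      if (event.isTrusted) {
        button.remove();
        resolve();
      }
    });
    document.body.appendChild(button);
  });
}

// Usage: the agent pauses here, the user clicks, and the step that needs
// a user gesture runs inside (or right after) that click handler.
// await waitForTrustedContinue("Continue: go fullscreen");
```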
Looks amazing, love it. And I see that the top item on your roadmap is saving/replaying sessions.

Related to that, I'd suggest also adding the ability to "templify" sessions, i.e. turn sessions into something like email templates, with placeholder tags or something of the kind, that either ask the user for input or can be fed input from somewhere else (like a mail merge).

So for example, if I need to get certain data from 10 different websites, either have the macro/session ask me 10 times for a new website (or until I stop it), or allow me to just feed it a list. One possible shape for such templates is sketched below.

Anyway, great work! Oh also, if you want to be truly privacy-first, you could add support for local LLMs via Ollama.
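For instance, something along these lines (purely illustrative; this isn't a format BrowserBee defines today):

```typescript
// Illustrative sketch of a "templified" session: a saved session whose steps
// contain placeholder tags filled from user input or from a list.
// This format is hypothetical, not something BrowserBee actually defines.

interface SessionTemplate {
  name: string;
  placeholders: string[]; // e.g. ["website"]
  steps: string[];        // natural-language steps with {{tags}}
}

function instantiate(template: SessionTemplate, values: Record<string, string>): string[] {
  return template.steps.map((step) =>
    step.replace(/\{\{(\w+)\}\}/g, (_, key) => values[key] ?? `{{${key}}}`)
  );
}

const collectData: SessionTemplate = {
  name: "collect-data",
  placeholders: ["website"],
  steps: ["Navigate to {{website}}", "Extract the pricing table", "Save it to my notes"],
};

// "Mail merge" style: feed a list of websites instead of being prompted each time.
const runs = ["https://example.com", "https://example.org"].map((site) =>
  instantiate(collectData, { website: site })
);
```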
Looks awesome.
Over the last couple of months, I've built a similar Chrome extension: https://overlay.one/en

I also started with a conversational mode and an interactive mode, but later removed the interactive mode to keep its feature set simple.
> Since BrowserBee runs entirely within your browser (with the exception of the LLM), it can safely interact with logged-in websites, like your social media accounts or email, without compromising security or requiring backend infrastructure.

Does it send the content of the website to the LLM?
Thanks for building this!

It struggled with the tasks I asked for (e.g. download the March and April invoices for my GitHub org "myorg"): it got errors parsing the DOM and eventually gave up. I recommend taking a look at the browser-use approach, and specifically their buildDOMTree.js script. Their strategy of turning the DOM into an LLM-parsable list of interactive elements, and visually tagging them for vision models, is unreasonably effective. I don't know if they were the first to come up with it, but it's genius, and extracting it for my own browser-using agents has hugely increased their effectiveness.
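For context, a stripped-down sketch of that strategy (the general idea only, not browser-use's actual buildDOMTree.js): enumerate visible interactive elements, give each one an index the model can reference, and draw a numbered badge over it for vision models.

```typescript
// Stripped-down sketch of the interactive-element indexing idea
// (the general approach, not browser-use's actual buildDOMTree.js).

interface IndexedElement {
  index: number;
  tag: string;
  text: string;
  element: Element;
}

function collectInteractiveElements(): IndexedElement[] {
  const selector = "a, button, input, select, textarea, [role='button'], [onclick]";
  const results: IndexedElement[] = [];
  document.querySelectorAll(selector).forEach((element) => {
    const rect = element.getBoundingClientRect();
    if (rect.width === 0 || rect.height === 0) return; // skip invisible elements
    const index = results.length;
    results.push({
      index,
      tag: element.tagName.toLowerCase(),
      text: (element.textContent ?? "").trim().slice(0, 80),
      element,
    });
    // Visual tag for vision models: a numbered badge drawn over the element.
    const badge = document.createElement("div");
    badge.textContent = String(index);
    Object.assign(badge.style, {
      position: "fixed",
      left: `${rect.left}px`,
      top: `${rect.top}px`,
      background: "red",
      color: "white",
      fontSize: "12px",
      zIndex: "2147483647",
    });
    document.body.appendChild(badge);
  });
  return results;
}

// The LLM then sees a compact list like "[3] <button> Download invoice"
// and replies with the index of the element to click.
```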
Good work. I did the same thing: https://chromewebstore.google.com/detail/auto-browse/ngnikmgmglbjbhdphbikalohholhalfe?hl=en-US&utm_source=ext_sidebar
and https://github.com/auto-browse/auto-browse-agent
I tried playwright-crx, but it increased the size of the extension and sometimes the browser got stuck, so I moved to using Puppeteer instead.
To save tokens, I have not enabled screenshots, relying on the DOM instead.
This looks fun, thanks for sharing. Will definitely give it a shot soon.

I read over the repo docs and was amazed at how clean and thorough it all looks. Can you share your development story for this project? How long did it take you to get here? How much did you lean on AI agents to write this?

Also, any plans for monetization? Are you taking donations? :)
Very nice. I tried it with Ollama and it works well.

The biggest issue is having the Ollama models hardcoded to Qwen3 and Llama 3.1. I imagine most Ollama users have their favorites, and they probably vary quite a bit. My main model is usually Gemma 3 12B, which does support images.

It would be a nice feature to have a custom model config on the Ollama settings page, save it to Chrome storage, and use it in the 'getAvailableModels' method along with the hardcoded models.
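Something along these lines, perhaps (a sketch against the standard chrome.storage API; getAvailableModels is the method named above, and the other shapes are assumptions about the extension's internals):

```typescript
// Sketch: merge user-defined Ollama models (saved from a settings page)
// with the hardcoded defaults. Everything other than chrome.storage itself
// is an assumption about how BrowserBee structures this.

interface OllamaModelConfig {
  name: string;          // e.g. "gemma3:12b"
  supportsImages: boolean;
}

const HARDCODED_MODELS: OllamaModelConfig[] = [
  { name: "qwen3", supportsImages: false },
  { name: "llama3.1", supportsImages: false },
];

async function saveCustomModels(models: OllamaModelConfig[]): Promise<void> {
  await chrome.storage.sync.set({ ollamaCustomModels: models });
}

async function getAvailableModels(): Promise<OllamaModelConfig[]> {
  const { ollamaCustomModels = [] } = await chrome.storage.sync.get("ollamaCustomModels");
  // User-defined models first, then the hardcoded defaults they don't override.
  const customNames = new Set(ollamaCustomModels.map((m: OllamaModelConfig) => m.name));
  return [
    ...ollamaCustomModels,
    ...HARDCODED_MODELS.filter((m) => !customNames.has(m.name)),
  ];
}
```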
Can it perform DOM manipulation as well, like filling forms, or would the LLM response need to be structured for each specific site it's used on? And would an LLM be able to perform such a task?
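For illustration, a fill action doesn't necessarily need per-site structure: the LLM can return a generic selector-to-value mapping and one handler can apply it on any page (a hypothetical sketch, not BrowserBee's actual tool set):

```typescript
// Hypothetical sketch: a site-agnostic "fill form" action. The LLM returns
// selectors plus values; a single handler applies them on any page.

interface FillInstruction {
  selector: string;  // e.g. "input[name='email']"
  value: string;
}

function fillForm(instructions: FillInstruction[]): void {
  for (const { selector, value } of instructions) {
    const field = document.querySelector<HTMLInputElement | HTMLTextAreaElement>(selector);
    if (!field) continue;
    field.value = value;
    // Fire input/change so client-side frameworks notice the update.
    field.dispatchEvent(new Event("input", { bubbles: true }));
    field.dispatchEvent(new Event("change", { bubbles: true }));
  }
}

// Example: the LLM might respond with
// [{ "selector": "input[name='email']", "value": "jane@example.com" }]
// and the same handler works regardless of the site.
```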
Aren't browsers starting to ship with built-in LLMs? I don't know much about this, but if so, surely your extension won't need to send queries to external LLM APIs?
I keep getting "Error: Failed to stream response from [Gemini | OpenAI] API. Please try again." I tried valid new keys from both Google and OpenAI.
Looks good.

I've been disappointed that Chrome doesn't have this built in. I don't want to give full access to my browsing to a random extension (no offense to this specific one; it's just general security hygiene, since there are so many scammy extensions out there). Chrome (or the browser of your choice) already has that trust, good or bad. Please use that trust in a good way. It's table stakes at this point.