Hi HN! I was mesmerized by the Claude Computer Use reveal last week and was specifically impressed by how well it navigated websites. This motivated me to create Cerebellum, a library that lets an LLM take control of a browser.<p>Here is a demo of Cerebellum in action, performing the goal “Find a USB C to C cable that is 10 feet long and add it to cart” on amazon.com:<p><a href="https://youtu.be/xaZbuaWtVkA?si=Tq9lE6BXv9wjZ-qC" rel="nofollow">https://youtu.be/xaZbuaWtVkA?si=Tq9lE6BXv9wjZ-qC</a><p>Currently, it uses Claude 3.5 Sonnet’s newly released computer use ability, but the ultimate goal is to crowdsource a high quality set of browser sessions to train an open source local model.<p>Check out the MIT-licensed repo on GitHub (<a href="https://github.com/theredsix/cerebellum">https://github.com/theredsix/cerebellum</a>) or install the library from npm (<a href="https://www.npmjs.com/package/cerebellum-ai" rel="nofollow">https://www.npmjs.com/package/cerebellum-ai</a>).<p>I'm looking for feedback from the HN community, especially on one question: what browser tasks would you use an LLM to complete? Thanks again for taking a look!
> but the ultimate goal is to crowdsource a high quality set of browser sessions to train an open source local model.<p>Could you say more on this? I see that it's an open-source implementation of PLAN with Selenium and Claude's Cursor, but where will the "successes" of browser sessions be stored? Also, will it include an anonymization feature to remove PII from authenticated use cases?
You don't need an LLM.<p>Build an interface for building a knowledge graph.<p>Nodes contain words: verbs are actions, nouns are past verbs. An action is movement in space.