I'm really excited about this project and I think it could be really disruptive. It is organized by LAION, the same folks who curated the dataset used to train Stable Diffusion.<p>My understanding of the plan is to fine-tune an existing large language model, trained with self-supervised learning on a very large corpus of data, using reinforcement learning from human feedback, which is the same method used in ChatGPT. Once the dataset they are creating is available, though, perhaps better methods can be rapidly developed, as it will democratize the ability to do basic research in this space. I'm curious how much more limited the systems they are planning to build will be compared to ChatGPT, since they are planning to make models with far fewer parameters, to deploy them on much more modest hardware than ChatGPT uses.<p>As an AI researcher in academia, it is frustrating to be blocked from doing a
lot of research in this space due to computational constraints and a lack of the required data. I'm teaching a class this semester on self-supervised and generative AI methods, and it will be fun to let students play around with this in the future.<p>Here is a video about the Open Assistant effort: <a href="https://www.youtube.com/watch?v=64Izfm24FKA">https://www.youtube.com/watch?v=64Izfm24FKA</a>
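The RLHF recipe the comment describes starts with a reward model trained on human preference comparisons. A toy sketch of the pairwise objective commonly used for that stage (this is an illustrative formula in pure Python, not LAION's or OpenAI's actual training code):

```python
import math

def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Reward-model loss for one human comparison: penalized unless the
    human-preferred response scores above the rejected one.
    loss = -log(sigmoid(r_chosen - r_rejected))"""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# As the reward model learns to separate a pair, the loss shrinks.
small_margin_loss = pairwise_preference_loss(0.1, 0.0)
large_margin_loss = pairwise_preference_loss(3.0, 0.0)
```

Once a reward model like this exists, the policy (the chat model itself) is tuned to maximize the reward, which is the expensive, data-hungry part that the crowdsourced dataset is meant to unlock.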
The power in ChatGPT isn't that it's a chat bot, but its ability to do semantic analysis. It's already well established that you need high-quality, semi-curated data plus a high parameter count, and that at a certain critical point these models start comprehending and understanding language. All the smart people in the room at Google, Facebook, etc. are absolutely pouring resources into this; I promise they know what they're doing.<p>We don't need yet-another-GUI. We need someone with a warehouse of GPUs to train a model with the parameter count of GPT-3. Once that's done you'll have thousands of people cranking out tools with the capabilities of ChatGPT.
Is anyone working on an Ender's Game style "Jane" assistant that just listens via an earbud and responds? That seems totally within the realm of current tech but I haven't seen anything.
This is wonderful, no doubt about it, but the bigger problem is making this usable on commodity hardware. Stable Diffusion only needs 4 GB of RAM to run inference, but all of these large language models are too large to run on commodity hardware. BLOOM from Hugging Face is already out and no one is able to use it. If ChatGPT were given to the open source community, we couldn't even run it…
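Back-of-the-envelope arithmetic shows why: inference memory scales roughly with parameter count times bytes per parameter. A sketch of that estimate (weights only, ignoring activations and the attention cache):

```python
def model_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Rough weight-memory footprint of a model, ignoring activations
    and KV cache, which add more on top."""
    return n_params * bytes_per_param / 1e9

# BLOOM has ~176B parameters; at fp16 (2 bytes each) the weights alone
# need ~352 GB, far beyond any consumer GPU. A 7B-parameter model at
# fp16 fits in ~14 GB, which is why smaller models matter for FOSS use.
bloom_fp16 = model_memory_gb(176e9, 2)
small_fp16 = model_memory_gb(7e9, 2)
```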
Great-looking project here. We absolutely need a local/FOSS option. There have been a number of open-source libraries for LLMs lately that simply call into paid/closed models via APIs. Not exactly the spirit of open source.<p>There are already great local/FOSS options such as FLAN-T5 (<a href="https://huggingface.co/google/flan-t5-base" rel="nofollow">https://huggingface.co/google/flan-t5-base</a>). Would be great to see a local model like that trained specifically for chat.
In the not-too-distant future we may see integrations with always-on recording devices (yes, I know, shudder) that transcribe our every conversation and interaction, feeding that text in place of the current custom-corpus-style addenda to LLMs. That would give a truly personal and social skew to current capabilities, in the form of automatically compiled memories to draw on.
The other thread has more comments: <a href="https://news.ycombinator.com/item?id=34654937" rel="nofollow">https://news.ycombinator.com/item?id=34654937</a>
Given how nerfed ChatGPT is (which is likely nothing compared to what large risk-averse companies like Microsoft/Google will do), I'm heavily anticipating a Stable Diffusion-style model that is more free, or at least configurable to have stronger opinions.
What if we use ChatGPT responses as contributions? I don't see a legal issue here, unless OpenAI can claim ownership of any of their input/output material. It would also be a good way to contribute for those disillusioned by the "openness" of that company.
Playing the "training game" is very interesting and kind of addictive.<p>The "reply as robot" task in particular is really enlightening. If you try to give it any sense of personality or humanity, your comments will be downvoted and flagged by other players.<p>It's like everybody, without instruction, has this pre-assumption that these assistants should have a deeply subservient, inhuman, corporate affectation.
Great. If I can use this to interactively search inside (OCR'd) documents, files, emails and so on, it would be huge: like asking when my passport expires, or what my grades were in high school, and so on.
My understanding is that OpenAI more or less created a supercomputer to train their model. How do we replicate that here?<p>Is it possible to use a “SETI at Home” style approach to parcel out training?
I think we are right around the corner from actual AI personal assistants, which is pretty exciting.
We have great tooling for speech to text, text to speech, and LLMs with memory for “talking” to the AI. Combining those with both an index of the internet (for up to date data, likely a big part of the Microsoft/open ai partnership) and an index of your own content/life data, and this could all actually work together soon.
I’m an iPhone guy, but I would imagine all of this could be combined together on an android phone (due to it being way more flexible) then combining that with a wireless earbud and then rather than it being a “normal” phone, it’s just a pocketable smart assistant.
Crazy times we live in. I’m 35, so have basically lived through the world being “broken” by tech a few times now: the internet, social media, and smart phones all fundamentally reshaped society. Seems like AI that we are living through right now is about to break the world again.<p>EDIT: everything I wrote above is going to immediately run into a legal hellscape, I get that. If everyone has devices in their pockets recording and processing everything spoken around them in order to assist their owner, real life starts getting extra dicey quickly. Will be interesting to see how it plays out.
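The assistant loop described above is, at its core, three stages wired together with a running memory. A minimal sketch (the `transcribe`/`generate_reply`/`speak` functions are stand-ins for a speech-to-text model, an LLM call, and a text-to-speech engine, not any real API):

```python
def transcribe(audio: bytes) -> str:
    """Stand-in for a speech-to-text model (e.g., something Whisper-like)."""
    return "what's on my calendar today"

def generate_reply(prompt: str, memory: list[str]) -> str:
    """Stand-in for an LLM call conditioned on conversation memory."""
    memory.append(prompt)
    return f"You asked: {prompt}"

def speak(text: str) -> bytes:
    """Stand-in for a text-to-speech engine returning audio bytes."""
    return text.encode("utf-8")

def assistant_turn(audio: bytes, memory: list[str]) -> bytes:
    """One earbud round trip: hear, think, answer."""
    return speak(generate_reply(transcribe(audio), memory))

memory: list[str] = []
reply_audio = assistant_turn(b"<mic input>", memory)
```

The hard parts are everything the stubs hide: latency, on-device vs. cloud inference, and indexing your own life data for the LLM to draw on.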
<a href="https://github.com/LAION-AI/Open-Assistant/issues/1110">https://github.com/LAION-AI/Open-Assistant/issues/1110</a><p>> <a href="https://www.gutenberg.org/" rel="nofollow">https://www.gutenberg.org/</a> has an extensive collection of ebooks in multiple languages and formats that would make great training data<p>…<p>> There is detailed legal information on which books are under public domain and which ones are copyrighted, it would be great if someone would go through these and decide which books are okay to crawl and use as training data (my understanding is
that it is okay to scrape the contents as they are publicly available in a browser, but just to be sure)<p>Yup, sure are the same folk who put together that dataset they used to train stable diffusion.<p>Data? Yeah, just take everything. It’s all good.
I've been excited about the notion of this for a while, but it's unclear to me how this would succeed where numerous well-resourced companies have failed.<p>Are there some advantages that Open Assistant has that Google/Amazon/Apple lack that would allow them to succeed?
Though it's interesting to see the capabilities of "conversational user interfaces" improve, the current implementations are too verbose and slow for many real-world tasks, and more importantly, context still has to be provided manually. I believe the next big leap will be low-latency dedicated assistants which are focused on specific tasks, with normalized and predictable results from prompts.<p>It may be interesting to see how a creative task like image or text generation changes when rewording your request slightly, after a minute's wait, but if I'm giving directions to my autonomous vehicle, ambiguity and delay are completely unacceptable.
Hi All - this is Huu (gh: @ontocord) - one of the founders of the OA project (along with Andreas, Christoph and of course Yannick). I just discovered this discussion while googling... please join our discord: <a href="https://discord.com/invite/H769HxZyb5" rel="nofollow">https://discord.com/invite/H769HxZyb5</a><p>Shout-out to lucidrains! I'm a big fan!
This seems similar to a project I've been working on: <a href="https://browserdaemon.com" rel="nofollow">https://browserdaemon.com</a>. In regards to your crowd sourced data collection, perhaps you should have some hidden percentage of prompts where you know the correct completion to them already, to catch bad actors.
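That hidden known-answer idea can be sketched as a simple honeypot check (the names and structure here are hypothetical; a real system would also normalize text more carefully and score contributors across many prompts rather than flagging on one miss):

```python
# Prompts whose correct completion is secretly known to the moderators.
KNOWN_ANSWERS = {
    "What is 2 + 2?": "4",
}

def looks_like_bad_actor(prompt: str, submission: str) -> bool:
    """Flag a contributor whose answer to a secretly-known prompt is wrong.
    Returns False when we have no ground truth for the prompt."""
    expected = KNOWN_ANSWERS.get(prompt)
    if expected is None:
        return False
    return submission.strip().lower() != expected.strip().lower()
```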
This sounds like cheating to me. Human training will get good results, like ChatGPT, and this has value, but we all want the AI to do all the work, don't we? I ask as an almost complete ignorant regarding the subject, and might as well be wrong.
OpenReplacement is probably a more fitting name for the future. Don't want to be stuck with an outmoded name when the project evolves into something else.<p>Sure it can start out as an assistant, in 10 years it will replace you at your job.
This has a similar impact potential to Wikipedia's: people from all around the world providing feedback and curating input data. Also, now I can just deploy it within my org and customize it. Awesome!
re: running on your own hardware… how?<p>I know very little about ML, but I had assumed models typically(?) ran on GPUs because of the heavy compute needed over large sets of in-memory data.<p>Moving to something cheaper, à la general CPU plus RAM/disk, would make it prohibitively slow with the standard methodology.<p>How would we change this to run on users' standard hardware? Presuming standard hardware is cheaper, why isn't ChatGPT also running on it?<p>Are there significant downsides to using lesser hardware, or is this some novel approach?<p>Super curious!
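One common answer is quantization: storing weights in 8 bits instead of 32 cuts memory roughly 4x, trading a small rounding error for the ability to fit (and run) a model on cheaper hardware. A toy sketch of symmetric per-tensor int8 quantization in pure Python (real systems like bitsandbytes do this per-block on GPU, but the idea is the same):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats into the int range [-127, 127] using one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats; error is at most half a quantization step."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each stored weight now costs 1 byte instead of 4, which is why quantized LLMs can run on a laptop CPU at all, if still slowly compared to a GPU.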
Amazing project, but can it even compete against GPT right now? Usually open source leads the innovation that closed source then follows (Linux to Windows), but in this case it's the contrary.
Cool project.<p>One thing I noticed about the website, however, is it is written using Next and doesn't work w/ JavaScript turned off in the browser. I thought that Next was geared for server-side rendered React where you could turn off JS in the browser.<p>Seems like this would improve the SEO factor, and in doing so, might help spread the word more.<p><a href="https://github.com/LAION-AI/laion.ai">https://github.com/LAION-AI/laion.ai</a>
It sounds like you can train this assistant on your own corpus of data. Am I right? What are the hardware and time requirements for that? The readme sounds a bit futuristic, has anyone actually used this, or is this just the vision of what's to come?
Can these ChatGPT like systems trace their answers back to the source material?<p>To me this seems like the missing link to make Google search and the like dead
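Retrieval-augmented setups aim at exactly that: fetch passages first, then answer while keeping the passage IDs for citation. A toy keyword-overlap sketch (real systems use dense embeddings for retrieval, but the source-citation plumbing is the same idea; the documents here are made up for illustration):

```python
# A tiny corpus standing in for an index of the web or your own files.
DOCS = {
    "doc-1": "The Eiffel Tower is in Paris and was completed in 1889.",
    "doc-2": "Mount Everest is the highest mountain above sea level.",
}

def retrieve(query: str) -> tuple[str, str]:
    """Return the best-matching document ID and text, by word overlap."""
    q_words = set(query.lower().split())
    best_id = max(
        DOCS,
        key=lambda d: len(q_words & set(DOCS[d].lower().split())),
    )
    return best_id, DOCS[best_id]

def answer_with_source(query: str) -> str:
    doc_id, passage = retrieve(query)
    # An LLM would condition on `passage` to phrase an answer;
    # here we just return it with its citation attached.
    return f"{passage} [source: {doc_id}]"
```

Because the answer is grounded in a retrieved passage rather than generated purely from model weights, the source link comes for free, which is the piece plain ChatGPT-style generation lacks.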
TLDR: OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.<p>________<p>Related video by one of the contributors on how to help:<p>- <a href="https://youtube.com/watch?v=64Izfm24FKA">https://youtube.com/watch?v=64Izfm24FKA</a><p>Source Code:<p>- <a href="https://github.com/LAION-AI/Open-Assistant">https://github.com/LAION-AI/Open-Assistant</a><p>Roadmap:<p>- <a href="https://docs.google.com/presentation/d/1n7IrAOVOqwdYgiYrXc8Sj0He8krn5MVZO_iLkCjTtu0/edit?usp=sharing" rel="nofollow">https://docs.google.com/presentation/d/1n7IrAOVOqwdYgiYrXc8S...</a><p>How you can help / contribute:<p>- <a href="https://github.com/LAION-AI/Open-Assistant#how-can-you-help">https://github.com/LAION-AI/Open-Assistant#how-can-you-help</a>
that same laion that scraped the web for images, ignored their licenses and copyrights, and thought that'd do just fine? the one that chose to not implement systems that would detect licenses, and to not have license fields in their datasets? the one that knowingly points to copyrighted works in their datasets, yet also pretends like they're not doing anything at all? that same group?<p>really trustworthy.
It's funny because the moment this is available to run on your machine you realize how useless it is. It might be fun to test its conversational limits, but only Siri can actually set an alarm or a timer or run a shortcut, while this thing can only blabber
I was very excited about Stable Diffusion, and I still am. A great yet relatively harmless contribution.<p>LLMs however, not so much. The avenues of misuse are just too great.<p>I started this whole thing somewhat railing against the un-openness of OpenAI. But once I began using ChatGPT, I realized that having centralized control of a tool like this in the hands of reasonable people is not the worst possible outcome for civilization.<p>While I support FOSS in most realms, in some I do not. Reality has taught me to stop being rigidly religious about these things. Just because something is freely available does not magically make it "good."<p>In the interest of curiosity and discussion, can someone give me some actual real-world examples of what a FOSS ChatGPT will enable that OpenAI's tool will not? And, please be specific, not just "no censorship." Please give examples of that censorship.