I'm really excited about this project and I think it could be really disruptive. It is organized by LAION, the same folks who curated the dataset used to train Stable Diffusion.<p>My understanding of the plan is to fine-tune an existing large language model, trained with self-supervised learning on a very large corpus of data, using reinforcement learning from human feedback, which is the same method used in ChatGPT. Once the dataset they are creating is available, though, perhaps better methods can be rapidly developed, as it will democratize the ability to do basic research in this space. I'm curious how much more limited the systems they are planning to build will be compared to ChatGPT, since they are planning to make models with far fewer parameters, to deploy them on much more modest hardware than ChatGPT uses.<p>As an AI researcher in academia, it is frustrating to be blocked from doing a
lot of research in this space due to computational constraints and a lack of the required data. I'm teaching a class this semester on self-supervised and generative AI methods, and it will be fun to let students play around with this in the future.<p>Here is a video about the Open Assistant effort: <a href="https://www.youtube.com/watch?v=64Izfm24FKA">https://www.youtube.com/watch?v=64Izfm24FKA</a>
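The RLHF recipe the comment describes starts with a reward model trained on human preference comparisons. A toy sketch of the pairwise objective commonly used for that stage (this is an illustrative formula in pure Python, not LAION's or OpenAI's actual training code):

```python
import math

def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Reward-model loss for one human comparison: penalized unless the
    human-preferred response scores above the rejected one.
    loss = -log(sigmoid(r_chosen - r_rejected))"""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# As the reward model learns to separate a pair, the loss shrinks.
small_margin_loss = pairwise_preference_loss(0.1, 0.0)
large_margin_loss = pairwise_preference_loss(3.0, 0.0)
```

Once a reward model like this exists, the policy (the chat model itself) is tuned to maximize the reward, which is the expensive, data-hungry part that the crowdsourced dataset is meant to unlock.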
The power in ChatGPT isn't that it's a chat bot, but its ability to do semantic analysis. It's already well established that you need high-quality, semi-curated data plus a high parameter count, and that at a certain critical point these models start comprehending and understanding language. All the smart people in the room at Google, Facebook, etc. are absolutely pouring resources into this; I promise they know what they're doing.<p>We don't need yet-another-GUI. We need someone with a warehouse of GPUs to train a model with the parameter count of GPT-3. Once that's done you'll have thousands of people cranking out tools with the capabilities of ChatGPT.
Is anyone working on an Ender's Game style "Jane" assistant that just listens via an earbud and responds? That seems totally within the realm of current tech but I haven't seen anything.
This is wonderful, no doubt about it, but the bigger problem is making this usable on commodity hardware. Stable Diffusion only needs 4 GB of RAM to run inference, but all of these large language models are too large to run on commodity hardware. BLOOM from Hugging Face is already out and no one is able to use it. If ChatGPT were given to the open source community, we couldn't even run it…
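Back-of-the-envelope arithmetic shows why: inference memory scales roughly with parameter count times bytes per parameter. A sketch of that estimate (weights only, ignoring activations and the attention cache):

```python
def model_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Rough weight-memory footprint of a model, ignoring activations
    and KV cache, which add more on top."""
    return n_params * bytes_per_param / 1e9

# BLOOM has ~176B parameters; at fp16 (2 bytes each) the weights alone
# need ~352 GB, far beyond any consumer GPU. A 7B-parameter model at
# fp16 fits in ~14 GB, which is why smaller models matter for FOSS use.
bloom_fp16 = model_memory_gb(176e9, 2)
small_fp16 = model_memory_gb(7e9, 2)
```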
Great-looking project here. We absolutely need a local/FOSS option. There have been a number of open-source libraries for LLMs lately that simply call into paid/closed models via APIs. Not exactly the spirit of open source.<p>There are already great local/FOSS options such as FLAN-T5 (<a href="https://huggingface.co/google/flan-t5-base" rel="nofollow">https://huggingface.co/google/flan-t5-base</a>). Would be great to see a local model like that trained specifically for chat.
In the not-too-distant future we may see integrations with always-on recording devices (yes, I know, shudder) that transcribe our every conversation and interaction, feeding that text in place of the current custom-corpus-style addenda to LLMs. That would give a truly personal and social skew to current capabilities, in the form of automatically compiled memories to draw on.
The other thread has more comments: <a href="https://news.ycombinator.com/item?id=34654937" rel="nofollow">https://news.ycombinator.com/item?id=34654937</a>
Given how nerfed ChatGPT is (which is likely nothing compared to what large risk-averse companies like Microsoft/Google will do), I'm heavily anticipating a Stable Diffusion-style model that is more free, or at least configurable to have stronger opinions.
What if we use ChatGPT responses as contributions? I don't see a legal issue here, unless OpenAI can claim ownership of any of their input/output material. It would also be a good way to contribute for those disillusioned by the "openness" of that company.
Playing the "training game" is very interesting and kind of addictive.<p>The "reply as robot" task in particular is really enlightening. If you try to give it any sense of personality or humanity, your comments will be downvoted and flagged by other players.<p>It's like everybody, without instruction, has this pre-assumption that these assistants should have a deeply subservient, inhuman, corporate affectation.
Great. If I can use this to interactively search inside (OCR'd) documents, files, emails and so on, it would be huge: like asking when my passport expires, or what my grades were in high school, and so on.
My understanding is that OpenAI more or less created a supercomputer to train their model. How do we replicate that here?<p>Is it possible to use a “SETI at Home” style approach to parcel out training?
I think we are right around the corner from actual AI personal assistants, which is pretty exciting.
We have great tooling for speech to text, text to speech, and LLMs with memory for “talking” to the AI. Combining those with both an index of the internet (for up to date data, likely a big part of the Microsoft/open ai partnership) and an index of your own content/life data, and this could all actually work together soon.
I’m an iPhone guy, but I would imagine all of this could be combined together on an android phone (due to it being way more flexible) then combining that with a wireless earbud and then rather than it being a “normal” phone, it’s just a pocketable smart assistant.
Crazy times we live in. I’m 35, so have basically lived through the world being “broken” by tech a few times now: the internet, social media, and smart phones all fundamentally reshaped society. Seems like AI that we are living through right now is about to break the world again.<p>EDIT: everything I wrote above is going to immediately run into a legal hellscape, I get that. If everyone has devices in their pockets recording and processing everything spoken around them in order to assist their owner, real life starts getting extra dicey quickly. Will be interesting to see how it plays out.
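The assistant loop described above is, at its core, three stages wired together with a running memory. A minimal sketch (the `transcribe`/`generate_reply`/`speak` functions are stand-ins for a speech-to-text model, an LLM call, and a text-to-speech engine, not any real API):

```python
def transcribe(audio: bytes) -> str:
    """Stand-in for a speech-to-text model (e.g., something Whisper-like)."""
    return "what's on my calendar today"

def generate_reply(prompt: str, memory: list[str]) -> str:
    """Stand-in for an LLM call conditioned on conversation memory."""
    memory.append(prompt)
    return f"You asked: {prompt}"

def speak(text: str) -> bytes:
    """Stand-in for a text-to-speech engine returning audio bytes."""
    return text.encode("utf-8")

def assistant_turn(audio: bytes, memory: list[str]) -> bytes:
    """One earbud round trip: hear, think, answer."""
    return speak(generate_reply(transcribe(audio), memory))

memory: list[str] = []
reply_audio = assistant_turn(b"<mic input>", memory)
```

The hard parts are everything the stubs hide: latency, on-device vs. cloud inference, and indexing your own life data for the LLM to draw on.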
<a href="https://github.com/LAION-AI/Open-Assistant/issues/1110">https://github.com/LAION-AI/Open-Assistant/issues/1110</a><p>> <a href="https://www.gutenberg.org/" rel="nofollow">https://www.gutenberg.org/</a> has an extensive collection of ebooks in multiple languages and formats that would make great training data<p>…<p>> There is detailed legal information on which books are under public domain and which ones are copyrighted, it would be great if someone would go through these and decide which books are okay to crawl and use as training data (my understanding is
that it is okay to scrape the contents as they are publicly available in a browser, but just to be sure)<p>Yup, sure are the same folk who put together that dataset they used to train stable diffusion.<p>Data? Yeah, just take everything. It’s all good.
I've been excited about the notion of this for a while, but it's unclear to me how this would succeed where numerous well-resourced companies have failed.<p>Are there some advantages that Open Assistant has that Google/Amazon/Apple lack that would allow them to succeed?
Though it's interesting to see the capabilities of "conversational user interfaces" improve, the current implementations are too verbose and slow for many real-world tasks, and more importantly, context still has to be provided manually. I believe the next big leap will be low-latency dedicated assistants which are focused on specific tasks, with normalized and predictable results from prompts.<p>It may be interesting to see how a creative task like image or text generation changes when rewording your request slightly, after a minute's wait, but if I'm giving directions to my autonomous vehicle, ambiguity and delay are completely unacceptable.
Hi All - this is Huu (gh: @ontocord) - one of the founders of the OA project (along with Andreas, Christoph and of course Yannick). I just discovered this discussion while googling... please join our discord: <a href="https://discord.com/invite/H769HxZyb5" rel="nofollow">https://discord.com/invite/H769HxZyb5</a><p>Shout-out to lucidrains! I'm a big fan!
This seems similar to a project I've been working on: <a href="https://browserdaemon.com" rel="nofollow">https://browserdaemon.com</a>. In regards to your crowd sourced data collection, perhaps you should have some hidden percentage of prompts where you know the correct completion to them already, to catch bad actors.
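That hidden known-answer idea can be sketched as a simple honeypot check (the names and structure here are hypothetical; a real system would also normalize text more carefully and score contributors across many prompts rather than flagging on one miss):

```python
# Prompts whose correct completion is secretly known to the moderators.
KNOWN_ANSWERS = {
    "What is 2 + 2?": "4",
}

def looks_like_bad_actor(prompt: str, submission: str) -> bool:
    """Flag a contributor whose answer to a secretly-known prompt is wrong.
    Returns False when we have no ground truth for the prompt."""
    expected = KNOWN_ANSWERS.get(prompt)
    if expected is None:
        return False
    return submission.strip().lower() != expected.strip().lower()
```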
This sounds like cheating to me. Human training will get good results, like ChatGPT, and this has value, but we all want the AI to do all the work, don't we? I ask as an almost complete ignorant regarding the subject, and might as well be wrong.
OpenReplacement is probably a more fitting name for the future. Don't want to be stuck with an outmoded name when the project evolves into something else.<p>Sure it can start out as an assistant, in 10 years it will replace you at your job.
This has a similar impact potential to Wikipedia's: people from all around the world providing feedback and curating input data. Also, now I can just deploy it within my org and customize it. Awesome!
re: running on your own hardware… how?<p>I know very little about ML, but I had assumed models typically(?) ran on GPUs because of the heavy compute needed over large sets of in-memory data.<p>Moving to something cheaper, à la general CPU plus RAM/disk, would make it prohibitively slow with the standard methodology.<p>How would we change this to run on users' standard hardware? Presuming standard hardware is cheaper, why isn't ChatGPT also running on it?<p>Are there significant downsides to using lesser hardware, or is this some novel approach?<p>Super curious!
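One common answer is quantization: storing weights in 8 bits instead of 32 cuts memory roughly 4x, trading a small rounding error for the ability to fit (and run) a model on cheaper hardware. A toy sketch of symmetric per-tensor int8 quantization in pure Python (real systems like bitsandbytes do this per-block on GPU, but the idea is the same):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats into the int range [-127, 127] using one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats; error is at most half a quantization step."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each stored weight now costs 1 byte instead of 4, which is why quantized LLMs can run on a laptop CPU at all, if still slowly compared to a GPU.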
Amazing project, but can it even compete against GPT right now? Usually open source leads the innovation that closed source then follows (Linux to Windows), but in this case it's the contrary.
Cool project.<p>One thing I noticed about the website, however, is it is written using Next and doesn't work w/ JavaScript turned off in the browser. I thought that Next was geared for server-side rendered React where you could turn off JS in the browser.<p>Seems like this would improve the SEO factor, and in doing so, might help spread the word more.<p><a href="https://github.com/LAION-AI/laion.ai">https://github.com/LAION-AI/laion.ai</a>
It sounds like you can train this assistant on your own corpus of data. Am I right? What are the hardware and time requirements for that? The readme sounds a bit futuristic, has anyone actually used this, or is this just the vision of what's to come?
Can these ChatGPT like systems trace their answers back to the source material?<p>To me this seems like the missing link to make Google search and the like dead
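Retrieval-augmented setups aim at exactly that: fetch passages first, then answer while keeping the passage IDs for citation. A toy keyword-overlap sketch (real systems use dense embeddings for retrieval, but the source-citation plumbing is the same idea; the documents here are made up for illustration):

```python
# A tiny corpus standing in for an index of the web or your own files.
DOCS = {
    "doc-1": "The Eiffel Tower is in Paris and was completed in 1889.",
    "doc-2": "Mount Everest is the highest mountain above sea level.",
}

def retrieve(query: str) -> tuple[str, str]:
    """Return the best-matching document ID and text, by word overlap."""
    q_words = set(query.lower().split())
    best_id = max(
        DOCS,
        key=lambda d: len(q_words & set(DOCS[d].lower().split())),
    )
    return best_id, DOCS[best_id]

def answer_with_source(query: str) -> str:
    doc_id, passage = retrieve(query)
    # An LLM would condition on `passage` to phrase an answer;
    # here we just return it with its citation attached.
    return f"{passage} [source: {doc_id}]"
```

Because the answer is grounded in a retrieved passage rather than generated purely from model weights, the source link comes for free, which is the piece plain ChatGPT-style generation lacks.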
TLDR: OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.<p>________<p>Related video by one of the contributors on how to help:<p>- <a href="https://youtube.com/watch?v=64Izfm24FKA">https://youtube.com/watch?v=64Izfm24FKA</a><p>Source Code:<p>- <a href="https://github.com/LAION-AI/Open-Assistant">https://github.com/LAION-AI/Open-Assistant</a><p>Roadmap:<p>- <a href="https://docs.google.com/presentation/d/1n7IrAOVOqwdYgiYrXc8Sj0He8krn5MVZO_iLkCjTtu0/edit?usp=sharing" rel="nofollow">https://docs.google.com/presentation/d/1n7IrAOVOqwdYgiYrXc8S...</a><p>How you can help / contribute:<p>- <a href="https://github.com/LAION-AI/Open-Assistant#how-can-you-help">https://github.com/LAION-AI/Open-Assistant#how-can-you-help</a>
that same laion that scraped the web for images, ignored their licenses and copyrights, and thought that'd do just fine? the one that chose to not implement systems that would detect licenses, and to not have license fields in their datasets? the one that knowingly points to copyrighted works in their datasets, yet also pretends like they're not doing anything at all? that same group?<p>really trustworthy.
It's funny because the moment this is available to run on your machine you realize how useless it is. It might be fun to test its conversational limits, but only Siri can actually set an alarm or a timer or run a shortcut, while this thing can only blabber
I was very excited about Stable Diffusion, and I still am. A great yet relatively harmless contribution.<p>LLMs however, not so much. The avenues of misuse are just too great.<p>I started this whole thing somewhat railing against the un-openness of OpenAI. But once I began using ChatGPT, I realized that having centralized control of a tool like this in the hands of reasonable people is not the worst possible outcome for civilization.<p>While I support FOSS in most realms, in some I do not. Reality has taught me to stop being rigidly religious about these things. Just because something is freely available does not magically make it "good."<p>In the interest of curiosity and discussion, can someone give me some actual real-world examples of what a FOSS ChatGPT will enable that OpenAI's tool will not? And, please be specific, not just "no censorship." Please give examples of that censorship.