This kind of "blocking-and-tackling" work is important.<p>The authors take a well-known architecture, the Transformer[a], configure it with a progressively larger number of parameters, train it to predict the next word conditioned on the previous text using a large dataset of 40GB of text scraped from the Web, and test each trained model on a range of zero-shot transfer-learning tasks.<p>Remarkably, the performance of the Transformer on the tested tasks improves <i>log-linearly</i> with the number of parameters, suggesting that even the largest model tested, with 1.5B parameters, still <i>underfits</i> the 40GB of text.<p>This is <i>compelling evidence</i> that we do NOT need new architectures, NOR new kinds of training objectives, NOR new theories, for better language modeling! We can get better language modeling simply by increasing model capacity (i.e., by adding more parameters to existing models), which becomes easier and simpler to do as hardware continues to improve over time.<p>Great work.<p>PS. In case it's not clear: I'm not saying we should suddenly stop searching for new, better ideas and architectures. That would be silly. Please don't attack a straw man :-)<p>[a] <a href="https://arxiv.org/abs/1706.03762" rel="nofollow">https://arxiv.org/abs/1706.03762</a>
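<p>To make the training objective concrete, here is a toy sketch of my own (not OpenAI's code; the tiny vocabulary and the bag-of-words "model" are stand-ins, purely for illustration) of next-word prediction as minimizing negative log-likelihood:

    import numpy as np

    vocab = ["the", "unicorn", "spoke", "english", "."]
    token_to_id = {t: i for i, t in enumerate(vocab)}

    def toy_model(context_ids, params):
        # stand-in for a Transformer: given the context, score every vocab item
        ctx = np.zeros(len(vocab))
        for i in context_ids:
            ctx[i] += 1.0
        logits = params @ ctx                    # params has shape (vocab, vocab)
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()                   # probability of each possible next token

    params = np.random.randn(len(vocab), len(vocab)) * 0.01
    ids = [token_to_id[t] for t in ["the", "unicorn", "spoke", "english", "."]]

    # the language-modeling objective: minimize the negative log-likelihood
    # of each next token given everything before it
    nll = 0.0
    for t in range(1, len(ids)):
        probs = toy_model(ids[:t], params)
        nll -= np.log(probs[ids[t]])
    print("total NLL (nats):", nll)

A real Transformer replaces toy_model with stacked attention layers and minimizes the same quantity over 40GB of text; the scaling result says 1.5B parameters still isn't enough capacity to drive that loss to its floor on such a corpus.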
While withholding the full data set seems in some ways to support the rationale of the OpenAI charter, it also means that only state actors and very well-funded entities will be able to use the work to create models of the size necessary to do the impressive stuff in the write-up.<p>Given those concerns, it would seem that restricting the capability to state actors alone would have the opposite of the intended effect. Why not let thousands of amateur researchers, undergrads, etc., use the model to detect instances where it was used to generate text?
Sure, not releasing the full trained model probably delays it, but sooner or later a bad actor will do their own scraping and train their own model and share it around and the genie will be out of the bottle. Then what?<p>I think we need to be conducting AI research (and building software generally) under the assumption that all of it will eventually be repurposed by bad actors. How would our practices be different if we consistently and cautiously did this?<p>Here's a thought experiment: how would the Manhattan project have been different if it were carried out in the open and its products were instantaneously and infinitely reproducible? What is the MAD equilibrium of AI research? I think the impact potential is similar even before AGI.
> ... some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, “In South America, such incidents seem to be quite common.”<p>Man, the auto-generated text is hilarious. And uncannily good. Though I have to wonder if it's a total random fluke or if there's one parameter among their 1.5 billion that predicts "likelihood of mythical bestiality in South America".
I was honestly surprised by the quality of the generated text. While I can't say I've been following the state of the art in recent months, this seems like a pretty important step forward. Furthermore, at the end of the post they note that the samples are somewhat representative of their results. Maybe they should consider releasing a text file with some more (not hand-chosen) samples? Whatever the case, fantastic work, my congratulations to the authors.
Just as many pesticides mimic the hormonal and chemical signals of pests to drive certain behaviors that lead to eradication, this work mimics the linguistic signals of humans. I think viewing it metaphorically as the most sophisticated humanicide discovered to date is probably appropriate.<p>Consider that conventional munitions make an effective pesticide but are not used due to their side effects. Instead, chemicals are used to destroy or mimic the perception and production of various signals so that populations of unwanted critters effectively self-destruct.<p>Imagine a war fought with a weapon like this that left entire cities perfectly intact!<p># end hyperbole
Is anyone else troubled by them not releasing the source model/dataset/parameters here? Yes, the technology can be used for malicious ends - but I would argue that "DeepFaking" language is FAR less of a problem than "DeepFaking" video/photo/audio... which already occurs. It seems like they went back on their charter to share AI developments broadly ("not concentrate power") under the excuse of "safety."<p>(These results look fire btw)<p>Note: copied my comment from the dupe thread
This tech can easily be used to flood humanity’s shared brain with auto-generated propaganda - a kind of schizophrenia of the internet. There is plenty of incentive, with Google's ranking algorithms favoring word count and relevant keywords - you could have NLP bots lifting junk sites to the top results.<p>To step ahead in that chess game, a detection tool for fakes would just be training grounds for a better GAN. Instead we may see a certifying authority that labels content as human-generated, or as certified fact maybe? Wikipedia and Reddit are not safe without fast automatic moderation either.<p>Is there a brainstorm or idea/prototype submission site where people can propose approaches to countering bad AI actors? A white/grey-hat AI bounty program of sorts?
It's becoming ever more certain that the transformer architecture is one of the largest contributions to AI (not merely machine learning, but AI), often beating LSTMs despite LSTMs being expressive enough to capture Turing equivalence (at least in theory). Its main ideas are three: shorter paths help gradient flow, the training setup, and the final key aspect, unhelpfully called self-attention. Self-attention is better thought of as a form of similarity-gated key-value soft memory; learning operations over it allows Transformers to learn non-trivial programs with contextual weight look-ups.<p>I also notice that the number of tries is reported, suggesting some level of curation. While this level of generation is undoubtedly impressive and a sign of non-trivial levels of understanding, the ability to project along arbitrary dimensions of similarity at a fine-grained level and to learn from text instruction is more useful than text generation. Although the unicorn story was a really fun read, better than many humans could manage, I doubt it could have gone on for much longer. It maintains a theme but not coherently or fluently (see especially the Kennedy nanotech and recycling examples; <i>comparing the disfluency there with the excellence of the Civil War report suggests at least some over-fitting</i>). These relatively minor caveats aside, this is unambiguously an outstanding result.<p>Winograd Schemas are the single metric to track if you're interested in how language understanding is truly improving. OpenAI reports 71% and <i>wrongly</i> reports the previous record as 63%. The current record is 65% (<a href="https://gluebenchmark.com/leaderboard" rel="nofollow">https://gluebenchmark.com/leaderboard</a>), though the numbers are not fully comparable. Will OpenAI be submitting? Note that you can get to 60% using about 1-2 orders of magnitude less data and compute.<p>It concerns me that the results here are so far dependent on such large data and computation. However, based on several papers I've read, I do not believe this to be inherent even in transformers. I plan to do some experiments on this when I free up some bandwidth.<p>If everyone is pulled in by the glamour of working for a well-funded, prestigious operation, then it should be no surprise that they do not consider paths which operate on several orders of magnitude less data and computational resources.<p>We should all consider bringing about a group of researchers who swear to an austere computational life of a single GPU, no more than 4-8x average RAM, and CPUs that do not cross 90 watts. <i>The Bicameral Order</i> would be a good name for such a group.
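<p>For anyone who finds the "similarity-gated key-value soft memory" framing abstract, here is a minimal numpy sketch of my own (not the paper's code; it omits the causal mask, multiple heads, layer norm, and output projection) of a single self-attention read:

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv               # project tokens to queries, keys, values
        scores = Q @ K.T / np.sqrt(K.shape[-1])        # similarity of every query to every key
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax: soft, differentiable gating
        return weights @ V                             # gated read from the value "memory"

    d = 8
    X = np.random.randn(5, d)                          # 5 tokens, d-dimensional embeddings
    Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)         # -> (5, 8)

Every position reads from every other position, weighted by learned similarity, which is exactly the contextual look-up behaviour described above.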
Started a Google colab with the interactive text generation script.
<a href="https://colab.research.google.com/drive/1da54684tFMjPbR5idbvoCyjOoEGwIVwV" rel="nofollow">https://colab.research.google.com/drive/1da54684tFMjPbR5idbv...</a>
These samples are <i>freaky</i> good. We're approaching some threshold very, very fast. I'm not sure what that threshold is, and whether or not crossing it is a good thing, but soon we'll be there.
This is very impressive. The decision to not release the model is questionable imho. There are labs, companies and state actors which have way more compute than OpenAI and can therefore do even better.<p>Perhaps we need some kind of competition for detecting machine-generated vs human-generated content?
The generated text sounds <i>too</i> good; is it possible that the model overfit the source material (especially since it conditions on a much longer run of previous tokens, whereas approaches like char-rnns/textgenrnn use a short fixed window length)? It's something I've encountered many times while working with text generation.
This is so crazy good, someone needs to run a Turing test by sending its output to some unsuspecting publishers.<p>I get the feeling that the debatepocalypse is not far away. Every forum can now be spammed with reasonable-sounding gibberish that humans will have to slog through.
This was only a matter of time.<p>For the DEFCON AI Village in August I talked about the implications of this sort of tech, and how that impacts how we release "exploit" code / think about "cognitive vulnerabilities": <a href="https://medium.com/@aviv/what-does-a-world-with-automated-social-engineering-look-like-79cd09b5a7b1" rel="nofollow">https://medium.com/@aviv/what-does-a-world-with-automated-so...</a>.<p>If you are doing work in this space, either in ML research or related security, you <i>need</i> to be thinking about implications (also see e.g. <a href="https://maliciousaireport.com" rel="nofollow">https://maliciousaireport.com</a>).
Is this comparable to Google's BERT (Bidirectional Encoder Representations from Transformers)? The benchmarks are different. Can I use any of these models for other tasks not mentioned in the papers, something beyond just "fine tuning"?
In 10 years, content written by actual humans will be a premium niche, like tailored suits - reserved for the elites.<p>The rest of us will be force-fed with machine-generated garbage.
I'd love to see what it would produce if you fed it the first sentence of your average Nigerian Prince scam. Fully-automated phishing - you could even automatically toss in a couple of details about the recipient and let the AI riff on that for a bit.
Anyone who’s done large-scale model training like this, can you shed light on the following questions:<p>What is the process like? Do you prototype locally? How do you gain confidence that the only limitation to good results is more compute power and NOT the model architecture or the applicability of deep learning to a particular task? At what point do you decide that shelling out many tens of thousands of dollars is OK? How often do you do a large-scale training run only to find unimpressive results, and hence that the money was wasted?
I think this would be extremely useful when we can do the inverse. Basically - can we detect if someone's writing is nonsensical or not? Can we detect if someone that is producing many well written essays is adhering to reality or not? Are they subtly re-defining terms, using flawed examples, etc?<p>The generated example of the biologists discovering a unicorn herd is too convincing on its own. It's only because it's so outlandish that we get the sense it's fictional.
It's beautiful that the code for the released (reduced) model is only 175 lines of Python, thanks to TensorFlow: <a href="https://github.com/openai/gpt-2/blob/master/src/model.py" rel="nofollow">https://github.com/openai/gpt-2/blob/master/src/model.py</a>
This will definitely be reverse-engineered and open-sourced.[0]<p>[0] <a href="https://en.wikipedia.org/wiki/Streisand_effect" rel="nofollow">https://en.wikipedia.org/wiki/Streisand_effect</a>
It's pretty interesting that their training set consists of "outbound links from Reddit which received at least 3 karma". There are definitely large subreddits flooded with highly upvoted fake news which you don't want to emulate (unless that's the goal).<p>It also reminds me of a short fictional story which explores what would happen if an AI learned how to maximize Reddit's sort-by-controversial score instead: <a href="https://slatestarcodex.com/2018/10/30/sort-by-controversial/" rel="nofollow">https://slatestarcodex.com/2018/10/30/sort-by-controversial/</a><p>Maybe that dystopian story is closer to reality than we thought?
The best part is "Scroll down for video" from <a href="https://blog.openai.com/better-language-models/#sample3" rel="nofollow">https://blog.openai.com/better-language-models/#sample3</a> :)
I wonder how this would do on the Hutter Prize. (I doubt it would beat the current record, but I'm curious what the result would be.)<p><a href="http://prize.hutter1.net" rel="nofollow">http://prize.hutter1.net</a>
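<p>For context on why a language model maps onto a compression prize at all: paired with an arithmetic coder, a model that assigns probability p to the actual next symbol spends roughly -log2(p) bits on it, so better prediction means a smaller compressed file. A rough back-of-the-envelope sketch (the probabilities below are made up, purely illustrative):

    import math

    # hypothetical probabilities a model assigned to the true next character
    # at five successive positions
    probs = [0.5, 0.9, 0.05, 0.7, 0.99]
    bits = sum(-math.log2(p) for p in probs)
    print(f"~{bits:.2f} bits to encode {len(probs)} characters "
          f"({bits / len(probs):.2f} bits/char)")

So the model's cross-entropy on enwik8-style text would give a first estimate of its Hutter Prize performance, before accounting for the size of the model itself (which the prize counts against you).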
I’m most impressed by its ability to answer questions about the text. Why can’t someone build something like this on top of Wikipedia? It would be amazing to ask Wikipedia any question you can think of.
This reminds me of Dürrenmatt's "Die Physiker" (<a href="https://en.wikipedia.org/wiki/The_Physicists" rel="nofollow">https://en.wikipedia.org/wiki/The_Physicists</a>).<p>While this indeed has very scary implications, one should be aware that if it's thinkable, eventually it will be thought (I'm paraphrasing here).
Wonderful results. I don't think I will experiment with the smaller available model, at least right now. I am still happy with BERT, especially for basically solving anaphora resolution (coreference of pronouns, etc.).
Seems like magic! I wish I could do something similar with our chat support questions and answers. It would be nice to have something like this built-in.