This kind of "blocking-and-tackling" work is important.<p>The authors take a well-known architecture, the Transformer[a], configure it with a progressively larger number of parameters, train it to predict the next word conditioned on the previous text using a large dataset of 40GB of text scraped from the Web, and test each trained model on a range of zero-shot transfer-learning tasks.<p>Remarkably, the performance of the Transformer on the tested tasks improves <i>log-linearly</i> with the number of parameters, suggesting that even the largest model tested, with 1.5B parameters, still <i>underfits</i> the 40GB of text.<p>This is <i>compelling evidence</i> that we do NOT need new architectures, NOR new kinds of training objectives, NOR new theories, for better language modeling! We can get better language modeling simply by increasing model capacity (i.e., by adding more parameters to existing models), which becomes easier and simpler to do as hardware continues to improve over time.<p>Great work.<p>PS. In case it's not clear: I'm not saying we should suddenly stop searching for new, better ideas and architectures. That would be silly. Please don't attack a straw man :-)<p>[a] <a href="https://arxiv.org/abs/1706.03762" rel="nofollow">https://arxiv.org/abs/1706.03762</a>
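<p>To make the training objective concrete, here is a toy sketch of my own (not OpenAI's code; the tiny vocabulary and the bag-of-words "model" are stand-ins, purely for illustration) of next-word prediction as minimizing negative log-likelihood:

    import numpy as np

    vocab = ["the", "unicorn", "spoke", "english", "."]
    token_to_id = {t: i for i, t in enumerate(vocab)}

    def toy_model(context_ids, params):
        # stand-in for a Transformer: given the context, score every vocab item
        ctx = np.zeros(len(vocab))
        for i in context_ids:
            ctx[i] += 1.0
        logits = params @ ctx                    # params has shape (vocab, vocab)
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()                   # probability of each possible next token

    params = np.random.randn(len(vocab), len(vocab)) * 0.01
    ids = [token_to_id[t] for t in ["the", "unicorn", "spoke", "english", "."]]

    # the language-modeling objective: minimize the negative log-likelihood
    # of each next token given everything before it
    nll = 0.0
    for t in range(1, len(ids)):
        probs = toy_model(ids[:t], params)
        nll -= np.log(probs[ids[t]])
    print("total NLL (nats):", nll)

A real Transformer replaces toy_model with stacked attention layers and minimizes the same quantity over 40GB of text; the scaling result says 1.5B parameters still isn't enough capacity to drive that loss to its floor on such a corpus.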
While withholding the full data set seems in some ways to support the rationale of the OpenAI charter, it also means that only state actors and very well-funded entities will be able to use the work to create models of the size necessary to do the impressive stuff in the write-up.<p>Given those concerns, it would seem that restricting the capability to state actors alone would have the opposite of the intended effect. Why not let thousands of amateur researchers, undergrads, etc., use the model to detect instances where it was used to generate text?
Sure, not releasing the full trained model probably delays it, but sooner or later a bad actor will do their own scraping and train their own model and share it around and the genie will be out of the bottle. Then what?<p>I think we need to be conducting AI research (and building software generally) under the assumption that all of it will eventually be repurposed by bad actors. How would our practices be different if we consistently and cautiously did this?<p>Here's a thought experiment: how would the Manhattan project have been different if it were carried out in the open and its products were instantaneously and infinitely reproducible? What is the MAD equilibrium of AI research? I think the impact potential is similar even before AGI.
> ... some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, “In South America, such incidents seem to be quite common.”<p>Man, the auto-generated text is hilarious. And uncannily good. Though I have to wonder if it's a total random fluke or if there's one parameter among their 1.5 billion that predicts "likelihood of mythical bestiality in South America".
I was honestly surprised by the quality of the generated text. While I can't say I've been following the state of the art in recent months, this seems like a pretty important step forward. Furthermore, at the end of the post they note that the samples are somewhat representative of their results. Maybe they should consider releasing a text file with some more (not hand-chosen) samples? Whatever the case, fantastic work, my congratulations to the authors.
Just as many pesticides mimic the hormonal and chemical signals of pests to drive certain behaviors that lead to eradication, this work mimics the linguistic signals of humans. I think viewing it metaphorically as the most sophisticated humanicide discovered to date is probably appropriate.<p>Consider that conventional munitions make an effective pesticide but are not used due to their side effects. Instead, chemicals are used to destroy or mimic the perception and production of various signals so that populations of unwanted critters effectively self-destruct.<p>Imagine a war fought with a weapon like this that left entire cities perfectly intact!<p># end hyperbole
Is anyone else troubled by them not releasing the source model/dataset/parameters here? Yes, the technology can be used for malicious ends - but I would argue that "DeepFaking" language is FAR less of a problem than "DeepFaking" video/photo/audio... which already occurs. It seems like they went back on their charter to share AI developments broadly ("not concentrate power") under the excuse of "safety."<p>(These results look fire btw)<p>Note: copied my comment from the dupe thread
This tech can easily be used to flood humanity’s shared brain with auto-generated propaganda - a kind of schizophrenia of the internet. There is plenty of incentive, with Google's ranking algorithms favoring word count and relevant keywords - you could have NLP bots lifting junk sites to the top results.<p>To step ahead in that chess game, a detection tool for fakes would just be training grounds for a better GAN. Instead we may see a certifying authority that labels content as human-generated, or as certified fact maybe? Wikipedia and Reddit are not safe without fast automatic moderation either.<p>Is there a brainstorm or idea/prototype submission site where people can propose approaches to countering bad AI actors? A white/grey-hat AI bounty program of sorts?
It's becoming ever more certain that the transformer architecture is one of the largest contributions to AI (not merely machine learning, but AI), often beating LSTMs despite LSTMs being expressive enough to capture Turing equivalence (at least in theory). Its main ideas are three: shorter paths help gradient flow, the training setup, and the final key aspect, unhelpfully called self-attention. Self-attention is better thought of as a form of similarity-gated key-value soft memory; learning operations over it allows Transformers to learn non-trivial programs with contextual weight look-ups.<p>I also notice that the number of tries is reported, suggesting some level of curation. While this level of generation is undoubtedly impressive and a sign of non-trivial levels of understanding, the ability to project along arbitrary dimensions of similarity at a fine-grained level and to learn from text instruction is more useful than text generation. Although the unicorn story was a really fun read, better than many humans could manage, I doubt it could have gone on for much longer. It maintains a theme but not coherently or fluently (see especially the Kennedy nanotech and recycling examples; <i>comparing the disfluency there with the excellence of the Civil War report suggests at least some over-fitting</i>). These relatively minor caveats aside, this is unambiguously an outstanding result.<p>Winograd Schemas are the single metric to track if you're interested in how language understanding is truly improving. OpenAI reports 71% and <i>wrongly</i> reports the previous record as 63%. The current record is 65% (<a href="https://gluebenchmark.com/leaderboard" rel="nofollow">https://gluebenchmark.com/leaderboard</a>), though the numbers are not fully comparable. Will OpenAI be submitting? Note that you can get to 60% using about 1-2 orders of magnitude less data and compute.<p>It concerns me that the results here are so far dependent on such large data and computation. However, based on several papers I've read, I do not believe this to be inherent even in transformers. I plan to do some experiments on this when I free up some bandwidth.<p>If everyone is pulled in by the glamour of working for a well-funded, prestigious operation, then it should be no surprise that they do not consider paths which operate on several orders of magnitude less data and computational resources.<p>We should all consider bringing about a group of researchers who swear to an austere computational life of a single GPU, no more than 4-8x average RAM, and CPUs that do not cross 90 watts. <i>The Bicameral Order</i> would be a good name for such a group.
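<p>For anyone who finds the "similarity-gated key-value soft memory" framing abstract, here is a minimal numpy sketch of my own (not the paper's code; it omits the causal mask, multiple heads, layer norm, and output projection) of a single self-attention read:

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv               # project tokens to queries, keys, values
        scores = Q @ K.T / np.sqrt(K.shape[-1])        # similarity of every query to every key
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax: soft, differentiable gating
        return weights @ V                             # gated read from the value "memory"

    d = 8
    X = np.random.randn(5, d)                          # 5 tokens, d-dimensional embeddings
    Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)         # -> (5, 8)

Every position reads from every other position, weighted by learned similarity, which is exactly the contextual look-up behaviour described above.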
Started a Google colab with the interactive text generation script.
<a href="https://colab.research.google.com/drive/1da54684tFMjPbR5idbvoCyjOoEGwIVwV" rel="nofollow">https://colab.research.google.com/drive/1da54684tFMjPbR5idbv...</a>
These samples are <i>freaky</i> good. We're approaching some threshold very, very fast. I'm not sure what that threshold is, and whether or not crossing it is a good thing, but soon we'll be there.
This is very impressive. The decision to not release the model is questionable imho. There are labs, companies and state actors which have way more compute than OpenAI and can therefore do even better.<p>Perhaps we need some kind of competition for detecting machine-generated vs human-generated content?
The generated text sounds <i>too</i> good; is it possible that the model overfit the source material (especially since it conditions on a much longer run of previous tokens, whereas approaches like char-rnns/textgenrnn use a short fixed window length)? It's something I've encountered many times while working with text generation.
This is so crazy good, someone needs to run a Turing test by sending its output to some unsuspecting publishers.<p>I get the feeling that the debatepocalypse is not far away. Every forum can now be spammed with reasonable-sounding gibberish that humans will have to slog through.
This was only a matter of time.<p>For the DEFCON AI Village in August I talked about the implications of this sort of tech, and how that impacts how we release "exploit" code / think about "cognitive vulnerabilities": <a href="https://medium.com/@aviv/what-does-a-world-with-automated-social-engineering-look-like-79cd09b5a7b1" rel="nofollow">https://medium.com/@aviv/what-does-a-world-with-automated-so...</a>.<p>If you are doing work in this space, either in ML research or related security, you <i>need</i> to be thinking about implications (also see e.g. <a href="https://maliciousaireport.com" rel="nofollow">https://maliciousaireport.com</a>).
Is this comparable to Google's BERT (Bidirectional Encoder Representations from Transformers)? The benchmarks are different. Can I use any of these models for other tasks not mentioned in the papers, something beyond just "fine tuning"?
In 10 years, content written by actual humans will be a premium niche, like tailored suits - reserved for the elites.<p>The rest of us will be force-fed with machine-generated garbage.
I'd love to see what it would produce if you fed it the first sentence of your average Nigerian Prince scam. Fully-automated phishing - you could even automatically toss in a couple of details about the recipient and let the AI riff on that for a bit.
Anyone who’s done large-scale model training like this, can you shed light on the following questions:<p>What is the process like? Do you prototype locally? How do you gain confidence that the only limitation to good results is more compute power and NOT the model architecture or the applicability of deep learning to a particular task? At what point do you decide that shelling out many tens of thousands of dollars is OK? How often do you do a large-scale training run only to find unimpressive results, and hence that the money was wasted?
I think this would be extremely useful when we can do the inverse. Basically - can we detect if someone's writing is nonsensical or not? Can we detect if someone that is producing many well written essays is adhering to reality or not? Are they subtly re-defining terms, using flawed examples, etc?<p>The generated example of the biologists discovering a unicorn herd is too convincing on its own. It's only because it's so outlandish that we get the sense it's fictional.
It's beautiful that the code for the released (reduced) model is only 175 lines of Python, thanks to TensorFlow: <a href="https://github.com/openai/gpt-2/blob/master/src/model.py" rel="nofollow">https://github.com/openai/gpt-2/blob/master/src/model.py</a>
This will definitely be reverse-engineered and open-sourced.[0]<p>[0] <a href="https://en.wikipedia.org/wiki/Streisand_effect" rel="nofollow">https://en.wikipedia.org/wiki/Streisand_effect</a>
It's pretty interesting that their training set consists of "outbound links from Reddit which received at least 3 karma". There are definitely large subreddits flooded with highly upvoted fake news which you don't want to emulate (unless that's the goal).<p>It also reminds me of a short fictional story which explores what would happen if an AI learned how to maximize Reddit's sort-by-controversial score instead: <a href="https://slatestarcodex.com/2018/10/30/sort-by-controversial/" rel="nofollow">https://slatestarcodex.com/2018/10/30/sort-by-controversial/</a><p>Maybe that dystopian story is closer to reality than we thought?
The best part is "Scroll down for video" from <a href="https://blog.openai.com/better-language-models/#sample3" rel="nofollow">https://blog.openai.com/better-language-models/#sample3</a> :)
I wonder how this would do on the Hutter Prize. (I doubt it would beat the current record, but I'm curious what the result would be.)<p><a href="http://prize.hutter1.net" rel="nofollow">http://prize.hutter1.net</a>
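<p>For context on why a language model maps onto a compression prize at all: paired with an arithmetic coder, a model that assigns probability p to the actual next symbol spends roughly -log2(p) bits on it, so better prediction means a smaller compressed file. A rough back-of-the-envelope sketch (the probabilities below are made up, purely illustrative):

    import math

    # hypothetical probabilities a model assigned to the true next character
    # at five successive positions
    probs = [0.5, 0.9, 0.05, 0.7, 0.99]
    bits = sum(-math.log2(p) for p in probs)
    print(f"~{bits:.2f} bits to encode {len(probs)} characters "
          f"({bits / len(probs):.2f} bits/char)")

So the model's cross-entropy on enwik8-style text would give a first estimate of its Hutter Prize performance, before accounting for the size of the model itself (which the prize counts against you).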
I’m most impressed by its ability to answer questions about the text. Why can’t someone build something like this on top of Wikipedia? It would be amazing to ask Wikipedia any question you can think of.
This reminds me of Dürrenmatt's "Die Physiker" (<a href="https://en.wikipedia.org/wiki/The_Physicists" rel="nofollow">https://en.wikipedia.org/wiki/The_Physicists</a>).<p>While this indeed has very scary implications, one should be aware that if it's thinkable, eventually it will be thought (I'm paraphrasing here).
Wonderful results. I don't think I will experiment with the smaller available model, at least right now. I am still happy with BERT, especially for basically solving anaphora resolution (coreference of pronouns, etc.).
Seems like magic! I wish I could do something similar with our chat support questions and answers. It would be nice to have something like this built-in.