
GPT-2: 6-Month Follow-Up

160 points · by xcodevn · over 5 years ago · 12 comments

gambler · over 5 years ago

"Cornell University is studying human susceptibility to digital disinformation generated by language models."

"The Middlebury Institute of International Studies Center on Terrorism, Extremism, and Counterterrorism (CTEC) is exploring how GPT-2 could be misused by terrorists and extremists online."

"The University of Oregon is developing a series of 'bias probes' to analyze bias within GPT-2."

But apparently no university studies the social and economic impact of using terabytes of public data to train algorithms that, for all practical purposes, end up being inaccessible to the average person.

If things keep going the way they are now, in 20 years millions of people will be "mechanical turked". Most information-processing tools will be mediated exclusively through companies like Google and Amazon. They will be less like normal tools (e.g. word processors) and more like systems you have to be a part of. Can you imagine the levels of inequality involved? The hyper-centralization of power? *This* is the foremost challenge presented by AI, not some hypothetical nonsense involving terrorists using a text generator.

And it's not like there aren't any solutions. Douglas Engelbart, for example, pointed out a great way of introducing technology into society without screwing most of society over:

http://dougengelbart.org/content/view/138

We kind of followed his vision for a while, with good results, but AI seems to be going in an entirely different direction.
revel · over 5 years ago

OpenAI's approach to managing the release of the larger model strikes me as totally flawed and upside down. The team's biggest concern seems to be that the fully trained GPT-2 model will be used to spread propaganda and misinformation. They also imply that the biggest hurdle to training a similar model is the money needed to pay for the training resources.

The problem with this approach is that the users most likely to use GPT-2 maliciously are state actors. China, for example, *already* spends millions on an immense propaganda factory. Money is not a serious obstacle for a state. Given that other research entities are, by the sound of things, already far along with development of similar models, it seems unlikely that China and the US don't already have functional models internally.

On the other hand, legitimate business and research is clearly hamstrung by withholding the full model. What we have is the maximum degree of inconvenience and the minimum degree of security. It feels almost perfectly analogous to the ban on liquids at airports. The motivation for that ban was that existing security measures couldn't detect liquids, but simply announcing that a ban would be enforced didn't change the fact that liquids were undetectable. Instead, millions of travelers were pointlessly inconvenienced at great cost.

Release the kraken already!
minimaxir · over 5 years ago

For finetuning GPT-2 on custom text, my gpt-2-simple package (https://github.com/minimaxir/gpt-2-simple) gets close to going OOM when finetuning the 345M model, even on a server GPU with 16 GB of VRAM. *Doubling* the model size with the 774M model might cause it not to work at all; I'll need to test.

Of course, the default output from the model might be sufficient, although it will take twice as long to generate text as the 345M model, which is slow even on a GPU.

How exactly the large GPT-2 models are deployed in production is a mystery; I really wish that part were open-sourced too.
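For a rough sense of why the 345M model already strains a 16 GB card, here is a back-of-envelope estimate of the training footprint alone. This is an illustrative sketch, not a measurement of gpt-2-simple: it assumes plain fp32 Adam (weights, gradients, and two moment buffers at 4 bytes per parameter each, 16 bytes total) and excludes activations, which grow with batch size and sequence length.

```python
# Back-of-envelope VRAM estimate for finetuning GPT-2 with Adam.
# Assumes fp32 weights, gradients, and two Adam moment buffers
# (4 bytes per parameter each); activations are not counted.

def finetune_vram_gb(n_params: float, bytes_per_param: int = 16) -> float:
    """Weights + grads + Adam m/v buffers, in GiB (activations excluded)."""
    return n_params * bytes_per_param / 2**30

for name, n in [("345M", 345e6), ("774M", 774e6), ("1558M", 1558e6)]:
    print(f"{name}: ~{finetune_vram_gb(n):.1f} GiB before activations")
```

Even under these optimistic assumptions, the 774M model needs roughly 11.5 GiB before a single activation is allocated, which leaves very little headroom on a 16 GB GPU, and the 1558M model exceeds it outright.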
The_rationalist · over 5 years ago

<rant> Are there any real use cases for GPT-2? Does it solve any problem? I've read almost all the state-of-the-art leaderboards for the NLP tasks on paperswithcode.com, and the truth is that except for text generation, OpenAI holds not a single state of the art; they are not even visible on the leaderboards. OpenAI is perhaps the AI research center with the biggest funding, and compared to other well-known ones (Microsoft, Facebook, Google, or even Zalando...) they are the ones with the fewest results.

From my observations, most SOTAs come from Chinese researchers by far, followed by DeepMind.

BTW, isn't it a sad truth that not one of the major AI actors has even a draft of an AGI architecture, something comparable to Cyc or OpenCog? https://wiki.opencog.org/w/CogPrime_Overview

Two other observations I would like to share: many important NLP tasks seem to have almost nobody publicly working on them; on paperswithcode.com or NLP-progress (on GitHub), some tasks have only one or two papers, and many others have not evolved since 2016. Most of the time it seems trivial to beat the old state of the art: just use BERT or XLNet on a task where nobody has applied it before and hop, free state of the art for you! Yet researchers don't seem to chase those low-hanging, high-return fruits. Also, researchers seem to work a lot in isolation: many new generic improvements, such as new optimizers (RAdam, for example) and new activation functions (Swish), let you beat most older states of the art on almost every task just by using them. Yet researchers will take years to adopt them because of an absurd inertia. Also, unlike an open-source program, BERT and XLNet have very low response and activity on GitHub despite major open issues... </rant>
gambler · over 5 years ago

Hopefully someone will make a working demo of it, like Adam King did for the 345M model. People should be able to experiment with this stuff without relying on the hype of press releases:

https://medium.com/@VictorBanev/interrogating-gpt-2-345m-aaff8dcc516d

Not sure why OpenAI doesn't do this themselves. It fully aligns with their stated mission.
zitterbewegung · over 5 years ago

I took all of Donald Trump's tweets and used GPT-2 to make a program that mimics his tweets.

I found that it can be very effective. I have the test at:

https://docs.google.com/forms/d/1p7tlobl5y5plBCu_enK4KawR7B8_4Yyb-wCUh6vr9A0/edit

I got the data from trumptwitterarchive.com.

I also explored creating a system that could distinguish fake tweets from real ones, and I believe I got 94% accuracy. It was a Bayes classifier, but I think I need to double-check my work.
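For readers curious what a Bayes classifier for this task might look like: below is a minimal multinomial Naive Bayes sketch in pure Python. It is written from scratch for illustration only — it is not the commenter's actual classifier, and the toy "real"/"generated" training examples are invented.

```python
import math
from collections import Counter

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.word_counts = {l: Counter() for l in self.labels}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        # Shared vocabulary, used for Laplace smoothing.
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        n_docs = sum(self.doc_counts.values())

        def log_prob(label):
            total = sum(self.word_counts[label].values())
            lp = math.log(self.doc_counts[label] / n_docs)  # class prior
            for w in tokens:
                lp += math.log((self.word_counts[label][w] + 1)
                               / (total + len(self.vocab)))
            return lp

        return max(self.labels, key=log_prob)


# Toy data: a couple of "real" tweets vs. model-generated text (invented).
clf = NaiveBayesTextClassifier().fit(
    ["covfefe great great wall", "fake news media sad",
     "the model generated this text", "sampled tokens from the model"],
    ["real", "real", "generated", "generated"],
)
print(clf.predict("great wall"))
print(clf.predict("the model"))
```

In practice you would tokenize more carefully, hold out a test set, and compare against a discriminator finetuned on the generator's own output, but the core idea — per-class word frequencies plus a prior — is just this.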
lucidrains · over 5 years ago

Hmm, no mention of Megatron in their timeline? https://nv-adlr.github.io/MegatronLM
rovyko · over 5 years ago

> As part of our staged release strategy, our current plan is to release the 1558M parameter model in a few months, but it's plausible that findings from a partner, or malicious usage of our 774M model, could change this.

This seems naive, but I think it's a misdirection. Of course the model will have malicious users; propaganda teams started testing its integration as soon as it was released. It's likely that OpenAI is counting on this for insights into HOW the model can be used maliciously. It's also possible that the model's output has inherent trackable markers, so OpenAI can later say that X% of social media posts were made using this model.

So what are the positive applications, aside from prettifying data like sports and weather reports?

Even with Skyrim's 800+ books, you frequently ran into the same book. Imagine libraries filled with plausible text that hides nuggets of lore seeded by developers. Along with more realistic text-to-speech, this could allow games to support a large diversity of NPCs that have truly radiant dialogue and sound more realistic than "I saw a mudcrab the other day".

With some modifications, I think models like this can outweigh even their nefarious applications:

Defense against text decomposition analysis. The model can be used to obfuscate writing patterns that would otherwise reveal a person's identity, either by randomizing their form or standardizing it. Take your post and run it through the formatter to get the same idea and intent, but in a style that can't be traced to your other writing. Or reform it into the style of Ernest Hemingway, like thousands of others.

Realtime plausible-deniability encryption. Messages in a monitored chat could look like mundane conversation but contain encrypted messages. This would require the model to accept seeds and work partially in reverse, diffing two sets of text to reveal the hidden message.

In its current form it doesn't look like it can do any of those things, but the potential is there.
baalimago · over 5 years ago

Even if GPT-2 were fully released, very few people would have the hardware to run it, because GPU RAM would run out (and some sort of load-unload scheme would make training times unfeasibly long). And those who do have the hardware have probably already made a version of their own, or have reasons not to. So I'm wondering whether this GPT-2 hype reflects a genuine concern of OpenAI's, or whether it's mostly a PR flex to say "Look at us, we made a good model!".

As an example, look at this from Nvidia, who made an 8B-parameter GPT-2, roughly 5 times as large as the full GPT-2: https://devblogs.nvidia.com/training-bert-with-gpus/
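A quick sanity check supports the hardware point: just the raw weight tensors, before activations or any framework overhead, are already sizable. The sketch below is illustrative arithmetic only; 8.3e9 is used as a round figure for an 8B-class model like the Nvidia one mentioned above.

```python
# Weights-only memory for loading a model for inference, fp32 vs fp16.
# Activations, KV caches, and framework overhead are all extra.

def weights_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory for the raw weight tensors alone, in GiB."""
    return n_params * bytes_per_param / 2**30

for name, n in [("GPT-2 774M", 774e6),
                ("GPT-2 1558M", 1558e6),
                ("8B-class model", 8.3e9)]:
    print(f"{name}: {weights_gib(n, 4):.1f} GiB fp32 / "
          f"{weights_gib(n, 2):.1f} GiB fp16")
```

The 1558M model is manageable for inference on a high-end 2019 consumer card, but finetuning it (which adds gradients and optimizer state several times the weight size) or running an 8B-class model at all puts it out of reach of most individual users.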
wyldfire · over 5 years ago

Are there any applications for the GPT-2 models beyond text synthesis? Inference, question answering, NER detection/disambiguation, anything like that?
brentsch · over 5 years ago

I'm curious about the "fine-tuning based detection" mentioned in the report ("Fine-tunes a language model to 'detect itself'... over a range of available settings"). Does anyone know good articles/papers (or have an off-the-top tl;dr) to get a high-level grasp of "self-detection" for generative models?
lxe · over 5 years ago

Has anyone wired up a "talktotransformer"-style demo for this one yet? I'd like to see how it works without going through the steps of setting it up.

EDIT: Looks like https://talktotransformer.com/ already uses the 774M model!