TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.


On the Dangers of Stochastic Parrots [pdf]

52 points, by tmfi, about 4 years ago

8 comments

wmf, about 4 years ago
If Google execs believe that AIs trained on the public Web are the future of Google, this paper basically argues that those AIs, and by extension Google's future, are unethical and probably can't be fixed at any reasonable cost.
unbiasedml, about 4 years ago
See also "The Slodderwetenschap (Sloppy Science) of Stochastic Parrots – A Plea for Science to NOT take the Route Advocated by Gebru and Bender" by Michael Lissack.

https://arxiv.org/ftp/arxiv/papers/2101/2101.10098.pdf

I found this a reasonable critique of the original, despite apparent TOS violations by Lissack leading to his Twitter account being locked.
Comment #26310490 not loaded
Comment #26309215 not loaded
monkeybutton, about 4 years ago
The paper mentions "... similar to the ones used in GPT-2's training data, i.e. documents linked to from Reddit [25], plus Wikipedia and a collection of books". Does anyone know what collection of books they are talking about?

I tried following the chain of references but ended up at a pay-walled source. Is it based on Project Gutenberg? Also, does Google train their models on the contents of all the books they scanned for Google Books, or are they not allowed to because of copyright issues?
Comment #26308869 not loaded
Comment #26308890 not loaded
peachfuzz, about 4 years ago
I don't get it. The paper reads like 10 pages of opinion and casting aspersions on language models. No math. No graphs.
Comment #26442454 not loaded
Comment #26309194 not loaded
superbcarrot, about 4 years ago
From the authors:

    Shmargaret Shmitchell
    shmargaret.shmitchell@gmail.com
    The Aether

Is this some meta joke or a reference to anything?
Comment #26307632 not loaded
Comment #26442384 not loaded
tsimionescu, about 4 years ago
Apart from the external dangers described (social, environmental), which I'm sure many will disagree with on multiple grounds, the article in general raises some very good points about the internal dangers these models pose to the field of NLP itself:

> The problem is, if one side of the communication does not have meaning, then the comprehension of the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model). Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.

> However, from the perspective of work on language technology, it is far from clear that all of the effort being put into using large LMs to 'beat' tasks designed to test natural language understanding, and all of the effort to create new such tasks, once the existing ones have been bulldozed by the LMs, brings us any closer to long-term goals of general language understanding systems. If a large LM, endowed with hundreds of billions of parameters and trained on a very large dataset, can manipulate linguistic form well enough to cheat its way through tests meant to require language understanding, have we learned anything of value about how to build machine language understanding or have we been led down the garden path?
Comment #26309580 not loaded
Comment #26309507 not loaded
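[Editor's note] The quoted "stochastic parrot" mechanism — stitching together observed linguistic forms according to probabilistic information about how they combine, with no reference to meaning — is, in miniature, exactly what a Markov-chain text generator does. The sketch below is illustrative only (a bigram model, vastly simpler than any large LM; the function names are invented for this example):

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Count how often each word follows each other word in the training text."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def parrot(counts, start, length=8, seed=0):
    """Sample a word sequence by repeatedly following observed transitions,
    weighted by frequency -- form without meaning."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # dead end: this word was never followed by anything
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)
```

By construction, every adjacent word pair in the output was seen in training, so the output looks locally fluent — which is the paper's point at a much larger scale: local fluency is cheap and implies no understanding.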
oh_sigh, about 4 years ago
This is the paper surrounding Timnit's "departure" from Google.

If you're on Timnit's side, "departure" means "firing", and the paper is the reason she was fired.

If you're on Google's side, "departure" means "mutually-agreeable resignation", prompted by Timnit's melodramatic and unprofessional response to normal feedback.

Personally, I don't see anything in this paper that implicates Google or would be reasonable for Google to try to suppress, so I'm falling into the camp of trusting Google's side of the story. But who knows?
Comment #26309821 not loaded
Comment #26306919 not loaded
bryanrasmussen, about 4 years ago
Reading it, I thought: if language models can be too big, that could be a problem for Google, given that at least one of their major competitive advantages is having the biggest language models there are.

Although I don't really know if that's so (about the competitive advantage), it certainly seems like something Google might think, judging from what I remember of earlier Google arguments about automated translation.