TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.


On the Dangers of Stochastic Parrots [pdf]

52 points, by tmfi, about 4 years ago

8 comments

wmf, about 4 years ago
If Google execs believe that AIs trained on the public Web are the future of Google, this paper basically argues that those AIs, and by extension Google's future, are unethical and probably can't be fixed at any reasonable cost.
unbiasedml, about 4 years ago
See also "The Slodderwetenschap (Sloppy Science) of Stochastic Parrots – A Plea for Science to NOT take the Route Advocated by Gebru and Bender" by Michael Lissack.

https://arxiv.org/ftp/arxiv/papers/2101/2101.10098.pdf

I found this a reasonable critique of the original, despite apparent TOS violations by Lissack leading to his Twitter account being locked.
Comment #26310490 not loaded
Comment #26309215 not loaded
monkeybutton, about 4 years ago
The paper mentions "... similar to the ones used in GPT-2's training data, i.e. documents linked to from Reddit [25], plus Wikipedia and a collection of books". Does anyone know what collection of books they are talking about?

I tried following the chain of references but ended up at a pay-walled source. Is it based on Project Gutenberg? Also, does Google train their models on the contents of all the books they scanned for Google Books, or are they not allowed to because of copyright issues?
Comment #26308869 not loaded
Comment #26308890 not loaded
peachfuzz, about 4 years ago
I don't get it. The paper reads like 10 pages of opinion and casting aspersions on language models. No math. No graphs.
Comment #26442454 not loaded
Comment #26309194 not loaded
superbcarrot, about 4 years ago
From the authors:

    Shmargaret Shmitchell
    shmargaret.shmitchell@gmail.com
    The Aether

Is this some meta joke or a reference to anything?
Comment #26307632 not loaded
Comment #26442384 not loaded
tsimionescu, about 4 years ago
Apart from the external dangers described (social, environmental), which I'm sure many will disagree with on multiple grounds, the article in general raises some very good points about the internal dangers these models pose to the field of NLP itself:

> The problem is, if one side of the communication does not have meaning, then the comprehension of the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model). Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.

> However, from the perspective of work on language technology, it is far from clear that all of the effort being put into using large LMs to 'beat' tasks designed to test natural language understanding, and all of the effort to create new such tasks, once the existing ones have been bulldozed by the LMs, brings us any closer to long-term goals of general language understanding systems. If a large LM, endowed with hundreds of billions of parameters and trained on a very large dataset, can manipulate linguistic form well enough to cheat its way through tests meant to require language understanding, have we learned anything of value about how to build machine language understanding or have we been led down the garden path?
Comment #26309580 not loaded
Comment #26309507 not loaded
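[Editor's note] The quoted "stochastic parrot" mechanism — stitching together observed linguistic forms according to probabilistic information about how they combine, with no reference to meaning — is, in miniature, exactly what a Markov-chain text generator does. The sketch below is illustrative only (a bigram model, vastly simpler than any large LM; the function names are invented for this example):

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Count how often each word follows each other word in the training text."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def parrot(counts, start, length=8, seed=0):
    """Sample a word sequence by repeatedly following observed transitions,
    weighted by frequency -- form without meaning."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # dead end: this word was never followed by anything
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)
```

By construction, every adjacent word pair in the output was seen in training, so the output looks locally fluent — which is the paper's point at a much larger scale: local fluency is cheap and implies no understanding.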
oh_sigh, about 4 years ago
This is the paper surrounding Timnit's "departure" from Google.

If you're on Timnit's side, "departure" means "firing", and the paper is the reason she was fired.

If you're on Google's side, "departure" means "mutually-agreeable resignation", prompted by Timnit's melodramatic and unprofessional response to normal feedback.

Personally, I don't see anything in this paper that implicates Google or would be reasonable for Google to try to suppress, so I'm falling into the camp of trusting Google's side of the story. But who knows?
Comment #26309821 not loaded
Comment #26306919 not loaded
bryanrasmussen, about 4 years ago
Reading it, I thought: if language models can be too big, that could be a problem for Google, given that at least one of their major competitive advantages is having the biggest language models there are.

Although I don't really know if that's so (about the competitive advantage), it certainly seems like something Google might think, judging from what I remember of earlier Google arguments about automated translation.