TechEcho


A Large Self-Annotated Corpus for Sarcasm

105 points by blopeur about 8 years ago

11 comments

nebabyte about 8 years ago
To give you refuge from the inevitable deluge of sarcastic comments in this comment section, here is a genuine/sincere comment: I like cats.
> sarcasm is labelled by the author
They literally just searched for "/s". Clever. Though I'm guessing the "independently verified" part entailed reading a lot of those comments.
Did they also read through the unlabelled comments to catch any unlabelled sarcasm? (Guessing not, since the pitch is "self-labelled sarcasm".) I wonder if that'll trip up any usage.
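nebabyte reads the method as follows: a comment counts as sarcastic when its author has appended the Reddit "/s" marker. The thread does not give SARC's actual extraction rules, so what follows is only a hypothetical Python sketch of a filter of that kind. It assumes the marker has to be a standalone token at the very end of the comment, which means a URL ending in "/s" does not match. The name `split_self_label` is illustrative, and the match is case-sensitive. Whether the real corpus treats "/S" the same way is unknown.

```python
from __future__ import annotations

import re

# Hypothetical rule (not the SARC authors' published filter): the "/s" tag must be
# a standalone token, preceded by whitespace or the start of the text, and followed
# only by optional trailing whitespace.
SARCASM_TAG = re.compile(r"(?:^|\s)/s\s*$")


def split_self_label(comment: str) -> tuple[str, bool]:
    """Return (comment text with any trailing /s tag removed, whether the tag was found)."""
    match = SARCASM_TAG.search(comment)
    if match is None:
        # No tag: the comment is "unlabelled", which is not the same as "sincere".
        return comment, False
    return comment[: match.start()].rstrip(), True


if __name__ == "__main__":
    comments = [
        "Yeah, that'll be really useful /s",
        "I like cats.",
        "see https://example.com/s",
    ]
    for text in comments:
        print(split_self_label(text))
```

A `False` result means only that no tag was found. It does not mean the comment is sincere, and that is exactly the unlabelled-sarcasm gap raised in the comment.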
3131s about 8 years ago
A professor of mine named John Haiman had many interesting thoughts on sarcasm. His book "Talk is Cheap", which unfortunately I can't find a PDF of online, is definitely recommended:
https://www.amazon.com/Talk-Cheap-Alienation-Evolution-Language/dp/0195115252
I haven't read it in a few years, and my copy is at my parent's house in another country, but his writing always avoided the obtuse, impenetrable style that a lot of linguists are unfortunately guilty of. It is also approachable for anyone without a linguistics background.
dlkf about 8 years ago
A Kaggle user had the same idea two years ago: https://www.kaggle.com/smerity/d/reddit/reddit-comments-may-2015/finding-sarcasm/code
I had some fun exploring the data, so I wrote a short blog post about it: https://davefernig.com/2015/10/19/the-lowest-form-of-wit-modelling-sarcasm-on-reddit/
sverige about 8 years ago
Funnily enough, although I have a naturally sarcastic personality and frequently (and unintentionally) confuse people with my tone, I also sometimes have trouble persuading people that I *was not* being sarcastic when I said something plainly. I think it happens because some statement I've made is so far outside the norms of what they find acceptable that they can only understand it as sarcasm.
And this sort of thing happens in both written and oral communication, unless I really focus on providing facial and other body-language cues to my intent, which I find somewhat annoying. I am, after all, of Scandinavian extraction, and excessive emotional expression is not only frowned upon culturally, it has also been systematically bred out of my genetic code for dozens of generations.
gavinpc about 8 years ago
> We collect a very large corpus, SARC-raw, with around 500-600 million total comments, of which 1.3 million are sarcastic.
So Reddit is 0.2% sarcastic. That sounds accurate.
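You can check gavinpc's 0.2% figure directly against the quoted numbers. This short Python calculation divides 1.3 million by each end of the 500 to 600 million range. The variable and function names are illustrative. The corpus counts only self-labelled comments, so the result is the labelled share of comments. It is not a measure of how much of Reddit is actually sarcastic.

```python
# Figures quoted in the comment: about 1.3 million sarcastic comments
# out of roughly 500-600 million in total (SARC-raw).
SARCASTIC = 1.3e6
TOTAL_LOW = 500e6
TOTAL_HIGH = 600e6


def sarcasm_rate_percent(sarcastic: float, total: float) -> float:
    """Share of comments labelled sarcastic, expressed as a percentage."""
    return 100.0 * sarcastic / total


# The larger total gives the smaller share, so TOTAL_HIGH yields the lower bound.
low = sarcasm_rate_percent(SARCASTIC, TOTAL_HIGH)
high = sarcasm_rate_percent(SARCASTIC, TOTAL_LOW)
print(f"labelled sarcasm: {low:.2f}% to {high:.2f}%")
```

This prints `labelled sarcasm: 0.22% to 0.26%`, so "0.2%" is a fair rounding of the lower end.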
theprop about 8 years ago
Has anyone used this to build a sarcasm bot? I desperately need this to handle all my Twitter & Facebook replies.
sparkzilla about 8 years ago
Yeah, that'll be really useful.
mrcactu5 about 8 years ago
Am I walking into something here? I am concerned about regional bias. Which dialect of English is being spoken?
basicplus2 about 8 years ago
Is this Corpus for Sarcasm real or is this report of a corpus of sarcasm sarcasm?
psyc about 8 years ago
Neat. HNers could train themselves on it.
pavlov about 8 years ago
If you combine this corpus with a compilation of Donald Trump's tweets, will it result in a matter-antimatter explosion of intentional sarcasm and unintentional irony?