To provide you refuge from the inevitable deluge of sarcastic comments in this comment section, here is a genuine/sincere comment: I like cats.<p>> sarcasm is labelled by the author<p>They literally just searched for "/s". Clever. Though I'm guessing the "independently verified" part entailed reading a lot of those comments.<p>Did they also read through the rest of the comments to catch any unlabelled sarcasm? (I'm guessing not, since the pitch is "self-labelled sarcasm".) I wonder whether that will trip up anyone using the dataset.
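For anyone curious what "searching for /s" might look like in practice, here is a minimal sketch. It is not the authors' actual pipeline: the regular expression, the function names `is_self_labelled` and `split_by_label`, and the choice to only accept a trailing "/s" are all assumptions made for illustration.

```python
import re

# Assumed marker rule (hypothetical, not taken from the paper): "/s" must be
# a standalone token at the very end of the comment, in either case.
SARCASM_MARKER = re.compile(r"(?:^|\s)/s\s*$", re.IGNORECASE)


def is_self_labelled(comment: str) -> bool:
    """Return True if the comment ends with a standalone "/s" marker."""
    return SARCASM_MARKER.search(comment) is not None


def split_by_label(comments):
    """Split comments into (labelled, unlabelled) lists by the "/s" marker."""
    labelled = [c for c in comments if is_self_labelled(c)]
    unlabelled = [c for c in comments if not is_self_labelled(c)]
    return labelled, unlabelled


if __name__ == "__main__":
    sample = [
        "Great idea, what could possibly go wrong /s",
        "I like cats.",
        "See /r/python for details",
    ]
    labelled, unlabelled = split_by_label(sample)
    print(len(labelled), "labelled;", len(unlabelled), "unlabelled")
```

Note that any sarcastic comment without the marker lands in the unlabelled list, which is exactly the worry raised above: a rule like this can only find sarcasm that the author chose to tag.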
A professor of mine named John Haiman had many interesting thoughts on sarcasm. His book "Talk is Cheap", which I unfortunately can't find a PDF of online, is definitely recommended:<p><a href="https://www.amazon.com/Talk-Cheap-Alienation-Evolution-Language/dp/0195115252" rel="nofollow">https://www.amazon.com/Talk-Cheap-Alienation-Evolution-Langu...</a><p>I haven't read it in a few years, and my copy is at my parents' house in another country, but his writing always avoided the obtuse, impenetrable style that a lot of linguists are unfortunately guilty of. It is also approachable for anyone without a linguistics background.
A Kaggle user had the same idea two years ago: <a href="https://www.kaggle.com/smerity/d/reddit/reddit-comments-may-2015/finding-sarcasm/code" rel="nofollow">https://www.kaggle.com/smerity/d/reddit/reddit-comments-may-...</a><p>I had some fun exploring the data so I wrote a short blog post about it: <a href="https://davefernig.com/2015/10/19/the-lowest-form-of-wit-modelling-sarcasm-on-reddit/" rel="nofollow">https://davefernig.com/2015/10/19/the-lowest-form-of-wit-mod...</a>
Funnily, though I have a naturally sarcastic personality and frequently (and unintentionally) confuse people with my tone, I also sometimes have trouble persuading people that I <i>was not</i> being sarcastic when I said something plainly. I think it happens when a statement I've made is so far outside the norms of what they find acceptable that they can only understand it as sarcasm.<p>This happens in both written and spoken communication, unless I really focus on providing facial and other body-language cues as to my intent, which I find somewhat annoying. I am, after all, of Scandinavian extraction, and excessive emotional expression is not only frowned upon culturally, it has also been systematically bred out of my genetic code for dozens of generations.
> We collect a very large corpus, SARC-raw, with around 500-600 million total comments, of which 1.3 million are sarcastic.<p>So Reddit is 0.2% sarcastic. That sounds accurate.
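A quick check of that figure, using only the numbers from the quote (1.3 million sarcastic comments out of 500 to 600 million total). The helper name `sarcastic_share_percent` is made up for this sketch.

```python
def sarcastic_share_percent(sarcastic: float, total: float) -> float:
    """Return the sarcastic comments as a percentage of all comments."""
    return sarcastic / total * 100


if __name__ == "__main__":
    sarcastic = 1.3e6  # sarcastic comments, from the quoted abstract
    for total in (500e6, 600e6):  # the quoted range of total comments
        share = sarcastic_share_percent(sarcastic, total)
        print(f"{total / 1e6:.0f}M total: {share:.2f}% sarcastic")
```

That works out to roughly 0.22% to 0.26% depending on which end of the range you take, and it counts only the self-labelled sarcasm.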
If you combine this corpus with a compilation of Donald Trump's tweets, will it result in a matter-antimatter explosion of intentional sarcasm and unintentional irony?