Who Wrote the Anti-Trump New York Times Op-Ed? Using Tidytext

114 points by ehudla over 6 years ago

13 comments

minimaxir over 6 years ago

The trick behind NLP that's surprisingly under-discussed is that you must make apples-to-apples comparisons between datasets. Using simple tweets as a proxy for style against a professionally edited NYT op-ed is silly, but this analysis (along with the previous analyses) is transparent about that fact.

chx over 6 years ago

Yeah so a linguist already looked at this: https://www.thedailybeast.com/this-linguist-helped-catch-the-unabomber-heres-his-take-on-the-new-york-times-resistance-op-ed

> “What’s been done so far is not scientific and simply silly,” he said.

People. Don't get too drunk on your own Kool-Aid. Do I need to write an op-ed that there are adults in the room? :P

whatshisface over 6 years ago

It's worth noting that Sec. Mike Pompeo used to be the director of the CIA and would probably know someone who could tell him about NLP and how to avoid it. The difference between identifying Shakespeare and identifying present-day politicians is that the politicians alive today have a chance of knowing about technology used today!

https://en.wikipedia.org/wiki/Mike_Pompeo

mc32 over 6 years ago

Is it not also somewhat likely that someone who wanted to remain unknown would mask their own lexicon, so as to make it harder to detect who the real author(s) were?

If someone were to do something like this, would it not be in their interest to assume someone else's lexicon or, at least, co-opt the lexicon of others in general to throw off the lexicographic 'scent'?

In other words, we can't simply presume that this is the author(s)' natural language.

vipulved over 6 years ago

One would assume that the author, who decided to publish a time-insensitive piece anonymously, was smart enough to employ language to intentionally misdirect analytical methods.

sn41 over 6 years ago

I have a suggestion, which others may have thought of: can you randomize the style using programs?

1. You write an essay.
2. [Word-level randomization] An algorithm looks at it and, as a first cut, replaces uncommon words with common equivalents.
3. [Sentence-level randomization] It changes each sentence to a randomly selected grammatically correct equivalent.
4. [Paragraph-level randomization] It adds spurious repetitions and replaces some paragraphs with automatically summarized versions.

I.e., destroy the rhetorical effect, but get the message across, so as to defeat NLP-based identification. If the content of the message is important, and the final version is okay to the author, this might be safer for anonymous disclosures.
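
The word-level step of this proposal is the easiest one to picture in code. What follows is only a minimal sketch in Python, under loud assumptions: the synonym table `COMMON_EQUIVALENTS` and the function `normalize_words` are hypothetical names invented for illustration, and a real tool would need a far larger lexicon plus word-sense handling. The sentence-level and paragraph-level steps are not attempted here.

```python
import re

# Hypothetical lookup table (an assumption for illustration only):
# uncommon word -> common equivalent.
COMMON_EQUIVALENTS = {
    "utilize": "use",
    "commence": "begin",
    "endeavor": "try",
}


def normalize_words(text, table=COMMON_EQUIVALENTS):
    """Replace every word found in `table` with its common equivalent."""

    def swap(match):
        word = match.group(0)
        replacement = table.get(word.lower())
        if replacement is None:
            return word  # word is not in the table, leave it alone
        # Keep a leading capital so sentence starts still look right.
        if word[0].isupper():
            return replacement.capitalize()
        return replacement

    # Only runs of ASCII letters are treated as words; punctuation,
    # digits, and spacing pass through unchanged.
    return re.sub(r"[A-Za-z]+", swap, text)


if __name__ == "__main__":
    print(normalize_words("We endeavor to utilize it."))
```

A fixed table like this removes only the vocabulary signal. Function-word frequencies, punctuation habits, and sentence length are untouched, which is presumably why the proposal goes on to the sentence and paragraph levels.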

mcguire over 6 years ago

I wonder if there are any ethical issues in outing someone using these techniques.

williamstein over 6 years ago

This situation reminds me a little of the Unabomber, who was finally caught entirely because somebody (his brother) simply recognized his writing style.

village-idiot over 6 years ago

I think if I planned to write a piece like this, I'd give an outline to a close confidant, such as my spouse, and have them write it. Assuming said confidant doesn't have a ton of public writing samples, this would frustrate any textual analysis.

wodenokoto over 6 years ago

I did an end-of-semester project on stylometry during my master's, and nowhere in our literature review did we see anyone comparing TF-IDF scores for authorship attribution.

Other than some guy on Twitter saying TF-IDF, on what grounds was this classifier chosen?

Others mention that "This is about showing off the library". In that case, the authors should choose to showcase their tools on a problem they are meant to solve.

"Check out my hammer, here's how to use it: First, you get some screws..."
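
For readers who have not seen the kind of comparison being questioned here, this is roughly what scoring documents by TF-IDF and comparing them looks like. It is a minimal pure-Python sketch using the plain textbook definitions (term share of the document times the natural log of inverse document frequency), not the linked article's actual tidytext code. The helper names `tf_idf` and `cosine` and the toy token lists are assumptions made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {label: {term: weight}} for a dict of label -> token list.

    tf  = term count / total tokens in the document
    idf = ln(number of documents / documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document

    weights = {}
    for label, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[label] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy corpora, invented for this example.
    docs = {
        "author_a": ["x", "y"],
        "author_b": ["x", "z"],
        "unknown": ["x", "y"],
    }
    w = tf_idf(docs)
    print(cosine(w["unknown"], w["author_a"]))
    print(cosine(w["unknown"], w["author_b"]))
```

Note that a term appearing in every document gets an idf of zero and so drops out entirely. Under this definition, function words shared by all the documents being compared contribute nothing, and the comparison leans on rarer vocabulary instead.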

ada1981 over 6 years ago

I really wish some news outlet was circulating the headline “Text analysis shows Trump to be most likely writer of anonymous memo!”

aphextron over 6 years ago

The op-ed was a plant, personally approved by Trump. This has literally been his standard MO for decades: plant fake stories and weaponize the press for his own use. And it works. He effectively ended the entire Manafort/Cohen news cycle overnight.

Think about the plausibility of this story for even a moment. There's a list of less than 10 people that could have possibly written the op-ed, and what it amounts to legally is a written admission of high treason, a felony punishable by death. Do you seriously think any reasonable person would ever do that and assume they would get away with it?

The entire administration has been one massive psy-ops campaign against the American public. Don't listen to a single word they say. Just watch their actions, and react with your own judgement accordingly.

dboreham over 6 years ago

Unless "gutless anonymous" is a fool, they got someone else (e.g. NYT staffers) to write the actual text. Better to look at individuals who had recent physical contact with NYT senior level employees since they would certainly avoid telephone calls and email. Fairly likely the manifesto was written on a Remington or Underwood.
