The trick behind NLP that's surprisingly under-discussed is that you <i>must</i> use apples-to-apples comparisons between datasets. Using simple tweets as a proxy for style against a professionally edited NYT op-ed is silly, but this analysis (along with the previous analyses) is transparent about that fact.
Yeah, so a linguist already looked at this<p><a href="https://www.thedailybeast.com/this-linguist-helped-catch-the-unabomber-heres-his-take-on-the-new-york-times-resistance-op-ed" rel="nofollow">https://www.thedailybeast.com/this-linguist-helped-catch-the...</a><p>> “What’s been done so far is not scientific and simply silly,” he said.<p>People. Don't get too drunk on your own Kool-Aid. Do I need to write an op-ed saying that there are adults in the room :P ?
It's worth noting that Sec. Mike Pompeo used to be the director of the CIA and would probably know someone who could tell him about NLP and how to avoid it. The difference between identifying Shakespeare and identifying present-day politicians is that the politicians alive today have a chance of knowing about technology used today!<p><a href="https://en.wikipedia.org/wiki/Mike_Pompeo" rel="nofollow">https://en.wikipedia.org/wiki/Mike_Pompeo</a>
Isn't it about equally likely that someone who wanted to remain unknown would mask their own lexicon to make it harder to detect who the real author(s) were?<p>If someone were to do something like this, wouldn't it be in their interest to assume someone else's lexicon or, at least, co-opt the lexicon of others in general to throw off the lexicographic 'scent'?<p>In other words, we can't simply presume that this is the author(s)' natural language.
One would assume that the author, who decided to publish a time-insensitive piece anonymously, was smart enough to employ language to intentionally misdirect analytical methods.
I have a suggestion, which others may have thought of: can you randomize the style using programs?<p>1. You write an essay.<p>2. [Word-level randomization] An algorithm looks at it and, as a first cut, replaces uncommon words with common equivalents.<p>3. [Sentence-level randomization] It changes each sentence to a randomly selected, grammatically correct equivalent.<p>4. [Paragraph-level randomization] It adds spurious repetitions and replaces some paragraphs with automatically summarized versions.<p>I.e., destroy the rhetorical effect but get the message across, so as to defeat NLP-based identification. If the content of the message is what matters, and the final version is okay to the author, this might be safer for anonymous disclosures.
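A rough sketch of what step 2 and the "spurious repetitions" half of step 4 could look like. Everything here is an assumption for illustration: `COMMON_EQUIVALENTS` is a tiny hand-written table (a real tool would need a proper thesaurus), and `flatten_vocabulary` and `add_spurious_repetition` are hypothetical names, not functions from any existing library. Step 3 (sentence paraphrasing) is much harder and is not attempted.

```python
import random
import re

# Hypothetical substitution table: distinctive word -> plainer equivalents.
COMMON_EQUIVALENTS = {
    "utilize": ["use"],
    "commence": ["start", "begin"],
    "erratic": ["unpredictable", "unstable"],
}


def flatten_vocabulary(text, rng):
    """Swap every word found in COMMON_EQUIVALENTS for a randomly chosen plain one."""

    def swap(match):
        word = match.group(0)
        options = COMMON_EQUIVALENTS.get(word.lower())
        if options is None:
            return word  # not in the table: leave untouched
        choice = rng.choice(options)
        # Preserve a leading capital so sentence starts still look right.
        return choice.capitalize() if word[0].isupper() else choice

    return re.sub(r"[A-Za-z]+", swap, text)


def add_spurious_repetition(sentences, rng, rate=0.3):
    """Return a new list where each sentence is repeated once with probability `rate`."""
    out = []
    for sentence in sentences:
        out.append(sentence)
        if rng.random() < rate:
            out.append(sentence)
    return out


if __name__ == "__main__":
    rng = random.Random()  # pass random.Random(seed) for reproducible output
    essay = ["We commence today.", "Utilize caution.", "The process is erratic."]
    flattened = [flatten_vocabulary(s, rng) for s in essay]
    print(" ".join(add_spurious_repetition(flattened, rng)))
```

Word swaps alone leave function-word frequencies and punctuation habits untouched, so this only covers the easiest part of the idea.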
This situation reminds me a little of the Unabomber, who was finally caught entirely because somebody (his brother) simply recognized his writing style.
I think if I planned to write a piece like this, I'd give an outline to a close confidant, such as my spouse, and have them write it. Assuming said confidant doesn't have a ton of public writing samples, this would frustrate any textual analysis.
I did an end-of-semester project on stylometry for my master's, and nowhere in our literature review did we see anyone comparing TF-IDF scores for authorship attribution.<p>Other than some guy on Twitter saying TF-IDF, on what grounds was this classifier chosen?<p>Others mention that "This is about showing off the library". In that case, the authors should choose to showcase their tools on a problem they are meant to solve.<p>"Check out my hammer, here's how to use it: First, you get some screws..."
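For anyone unsure what "comparing TF-IDF scores" means in practice, here is a minimal pure-Python sketch. It is not the code from the analysis being discussed: the weighting (term frequency times a smoothed inverse document frequency), the cosine comparison, and the names `tokenize`, `tfidf_vectors`, and `cosine` are my own assumptions, and the texts are made-up toy strings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (tf * smoothed idf)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Toy strings invented for this example, not real writing samples.
    unknown = "the quiet resistance works inside the administration"
    candidates = {
        "candidate_a": "tariffs are great and trade wars are easy to win",
        "candidate_b": "the resistance inside the administration is quiet",
    }
    vectors = tfidf_vectors([unknown] + list(candidates.values()))
    for name, vec in zip(candidates, vectors[1:]):
        print(name, round(cosine(vectors[0], vec), 3))
```

Note that the score mostly rewards shared vocabulary, which tracks topic at least as much as style. That is a big part of why it looks like the wrong tool for authorship attribution.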
The op-ed was a plant, personally approved by Trump. This has literally been his standard MO for decades: plant fake stories and weaponize the press for his own use. And it works. He effectively ended the entire Manafort/Cohen news cycle overnight.<p>Think about the plausibility of this story for even a moment. There's a list of fewer than 10 people who could possibly have written the op-ed, and what it amounts to legally is a <i>written admission of high treason</i>, a felony punishable by death. Do you seriously think <i>any</i> reasonable person would ever do that and assume they would get away with it?<p>The entire administration has been one massive psy-ops campaign against the American public. Don't listen to a single word they say. Just watch their actions, and react with your own judgement accordingly.
Unless "gutless anonymous" is a fool, they got someone else (e.g. NYT staffers) to write the actual text. Better to look at individuals who had recent physical contact with NYT senior level employees since they would certainly avoid telephone calls and email. Fairly likely the manifesto was written on a Remington or Underwood.