Awesome! I was actually thinking of doing something similar, but with some key differences:<p>- Try to find transcripts and speeches each "suspect" has given.<p>- Probably use a classification algorithm to determine who the author could be (most likely KNN).<p>- I hadn't settled on feature building, but was thinking of either a simple one-hot encoding, entity embeddings, or tf-idf.<p>I then ran into a moral dilemma: do I do this if it potentially becomes ammo for messing with someone else's career (when that is not my intention)?
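For what it's worth, the tf-idf plus KNN idea could be sketched roughly like this. This is only a minimal illustration in plain Python, not what the article's author did: the `SAMPLES` texts and the `author_a` / `author_b` labels are made up, the function names are my own, and cosine similarity with a majority vote is just one reasonable choice for the KNN step.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (text, label). Real use would need many
# transcripts or speeches per "suspect"; these sentences are invented.
SAMPLES = [
    ("the lodestar of our policy is steady leadership", "author_a"),
    ("steady leadership remains the lodestar of the nation", "author_a"),
    ("we will build great deals and win big", "author_b"),
    ("great deals win big and we win again", "author_b"),
]

def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def fit_idf(docs_tokens):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_vector(tokens, idf):
    """L2-normalised sparse tf-idf vector; terms outside the vocabulary are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def cosine(a, b):
    """Dot product of two already-normalised sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def knn_predict(query, labeled_texts, k=3):
    """Label the query by majority vote among its k most similar samples."""
    tokens = [tokenize(text) for text, _ in labeled_texts]
    idf = fit_idf(tokens)
    vectors = [tfidf_vector(t, idf) for t in tokens]
    q = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q, v), label)
              for v, (_, label) in zip(vectors, labeled_texts)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    print(knn_predict("a lodestar for steady leadership", SAMPLES))
```

With a corpus this tiny the result mostly reflects shared vocabulary, so treat any output as a toy demonstration rather than evidence about authorship.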
Interesting. I put more weight on the author's caveats about this method than on the method itself, which is fine, since I think it was just presented as an example of what could be done.<p>Also, if the person this analysis ranks as most likely is actually the one who wrote the op-ed, it was either:
- the POTUS, which seems unlikely (although who knows)
- the secretary for the POTUS, who types his tweets as well
- the one person in the President's circle who he cannot fire
I wonder if there could be a way to prevent this kind of identification, e.g. by paying a bunch of random people to rewrite parts of the text in their “own” style (with a review to ensure the meaning isn’t lost).
This is definitely interesting. However, I have assumed from the beginning that, protests to the contrary notwithstanding, this op-ed was thoroughly edited by <i>someone</i> to prevent such an analytical outing.