How Forensic Linguistics Identified J.K. Rowling

135 points by ahmadss, almost 12 years ago

16 comments

junto, almost 12 years ago
Correction: Rowling was 'outed' by her lawyer's wife's friend.

http://www.independent.co.uk/arts-entertainment/books/news/jk-rowling-angry-and-disappointed-after-law-firm-leaked-robert-galbraith-pseudonym-8718087.html

If I were the law firm, I'd fire the lawyer.
GigabyteCoin, almost 12 years ago
"I called both of them *yesterday* and learned not only how the Rowling investigation worked, but about the fascinating world of forensic linguistics."

*Cringe.*

From my experience (gleaned from dutifully reading every Bitcoin-related article I can get my hands on), I am very wary of reading about any topic which the author admits to just having learnt about *yesterday*.

The majority of the time, unfortunately, English majors aren't the best at understanding technology.
elchief, almost 12 years ago
PCA is a pretty neat technique. It's quite old too, invented by Pearson in the early 1900s.

Basically, you find a "vector" that travels along the part of the data with the highest variance. Then you find an orthogonal vector that travels along the part with the next highest variance.

You then have a set of vectors that explain all of the variance, that aren't correlated (because they're orthogonal), and are ranked by how much they explain.

This can be useful in regression to get rid of correlated variables, or you can get rid of some of the low-variance components if there are more columns than rows, which breaks OLS regression.

Consider a new town that you want to get to know as quickly as possible. What is the best method? You start with the longest street, then take a left and travel the next longest street, and so on. You can get a pretty good idea about the town without seeing it all.
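A minimal sketch of the procedure described in this comment, assuming a small in-memory dataset and using only the standard library. It finds each direction by power iteration on the covariance matrix and then removes that direction's variance (deflation) before looking for the next one. The function names and the toy `points` data are illustrative assumptions, not anything from the article; a real analysis would more likely use a linear algebra library.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix of rows (a list of equal-length lists)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    d = len(matrix)
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

def pca(rows, k):
    """Return the top-k (variance, direction) pairs, largest variance first."""
    m = covariance_matrix(rows)
    d = len(m)
    components = []
    for _ in range(k):
        lam, v = top_eigenpair(m)
        components.append((lam, v))
        # Deflation: subtract the variance already explained by v, so the
        # next pass finds the best direction orthogonal to it.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components

# Toy data (an assumption for illustration): points close to the line y = 2x.
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05, 0.1, 0.0, -0.1]
points = [[float(x), 2.0 * x + e] for x, e in zip(range(10), noise)]
components = pca(points, 2)
for variance, direction in components:
    print(round(variance, 4), [round(c, 4) for c in direction])
```

In the town analogy, the first component is the longest street and the second is the cross street; the two variances add up to the total variance of the data.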
3minus1, almost 12 years ago
The analysis of word length is interesting. English has a lot of long, multi-syllabic Latin-based words, and also a lot of short Germanic-based words. I wonder to what extent a higher percentage of long words indicates a preference for the Latin, and vice versa.
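A rough sketch of what a word-length measurement could look like, using only the standard library. The seven-letter cutoff for a "long" word and the two sample sentences are assumptions chosen for illustration, and word length is only a loose proxy for Latinate versus Germanic origin; telling the two apart properly would need an etymological dictionary.

```python
import re
from collections import Counter

def word_length_profile(text):
    """Map each word length to the fraction of words in text with that length."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return {}
    counts = Counter(len(w) for w in words)
    return {length: n / len(words) for length, n in sorted(counts.items())}

def long_word_share(text, min_length=7):
    """Fraction of words with at least min_length letters (the cutoff is an assumption)."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    return sum(len(w) >= min_length for w in words) / len(words)

# Two hypothetical sentences saying roughly the same thing in different registers.
plain = "We need to get help and start now"
latinate = "We require assistance to commence immediately"

print(word_length_profile(plain))
print(long_word_share(plain), long_word_share(latinate))
```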
praptak, almost 12 years ago
Automatic transformation of text to evade these methods seems feasible (Google Translate back and forth might be a crude first attempt). Obviously there might exist more refined methods of identification. In the case of a book it is probably hard not to ruin it this way, but reviews, posts and such do not require such high standards.
gtani, almost 12 years ago
https://news.ycombinator.com/item?id=3613734

This is a tough thing to google for. Terms I used a few weeks ago:

- stylometry
- authorship attribution/verification
- grammatical analysis, plagiarism detection
hnha, almost 12 years ago
Way too many terms like "proof", "fact", "confirmation", "definitely" later on. Doesn't something like this *always* involve a lot of assumptions and *always* carry a bias from the samples? Anyone could happen to write like someone else. There is nothing that definitively makes writing different between people the way a fingerprint does (which, as I understand it, is biologically highly random).

Analysing sites like HN to see indicators(!) of sockpuppets, or more generally the likelihood of correlation between accounts' writing styles, would rock!
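One simple way such an indicator could be computed, sketched under strong assumptions: represent each account's text by the relative frequency of a few common function words, then compare two accounts by cosine similarity. The word list, the function names, and the two sample texts below are all illustrative assumptions. In line with the caveat in this comment, a high score is at most a hint, never proof.

```python
import math
import re
from collections import Counter

# A short, arbitrary list of function words (an assumption; real studies use far more).
FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in", "that", "it",
                  "is", "was", "i", "for", "but", "not", "with"]

def style_vector(text):
    """Relative frequency of each function word in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical snippets standing in for two accounts' comment histories.
account_a = "It is not that the code was bad, but it was hard to read."
account_b = "The cat sat in the hat with a bat."

print(cosine_similarity(style_vector(account_a), style_vector(account_b)))
```

With samples this short the score is dominated by noise, which is the sampling bias the comment warns about; any serious comparison would need long texts per account and a baseline of how similar unrelated accounts typically look.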
cliveowen, almost 12 years ago
Is that really a website that uses a normally sized font and doesn't drown me with ads?

Nah, I must be dreaming.
fortepianissimo, almost 12 years ago
All of the statistical analyses sound fairly easy to beat.

Say you want to pretend to be another author: first build a language model of the target author, then use the model to single out sentences of high perplexity from your writing. Then, have the model "rewrite" your sentences by replacing your words with synonyms of higher n-gram probabilities according to the model. Similar things can be done to fool the character n-gram analyses, or analyses above words (e.g., parses).
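The first half of that recipe (build a model of the target author, then flag your own high-perplexity sentences) can be sketched with a toy bigram model. This is an illustration under assumptions, not the method from the article: the class name, the add-one smoothing, the whitespace tokenizer, and the tiny corpus are all chosen for brevity, and the synonym-replacement step is left out.

```python
import math
from collections import Counter

def tokenize(sentence):
    """Lowercase whitespace tokens with sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]

class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        vocab = set()
        for sentence in sentences:
            tokens = tokenize(sentence)
            vocab.update(tokens)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen words

    def prob(self, prev, word):
        return ((self.bigram_counts[(prev, word)] + 1)
                / (self.context_counts[prev] + self.vocab_size))

    def perplexity(self, sentence):
        tokens = tokenize(sentence)
        log_prob = sum(math.log(self.prob(p, w))
                       for p, w in zip(tokens, tokens[1:]))
        return math.exp(-log_prob / (len(tokens) - 1))

def most_surprising(model, sentences):
    """Sort sentences from most to least surprising under the model."""
    return sorted(sentences, key=model.perplexity, reverse=True)

# Hypothetical "target author" corpus and two candidate sentences.
target_corpus = ["the cat sat on the mat", "the dog sat on the rug"]
model = BigramModel(target_corpus)
candidates = ["the cat sat on the mat", "quantum flux destabilizes everything"]
ranked = most_surprising(model, candidates)
print(ranked[0])
```

The sentences at the top of `ranked` are the ones least like the target author, so they are the ones an imitator would rewrite first.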
waterlesscloud, almost 12 years ago
Pretty cool. Interesting too, since Rowling is probably the most imitated author in the world at the moment. I guess not by published authors, though.
georgemcbay, almost 12 years ago
Have there been any instances where "Forensic Linguistics" actually predicted an outcome that wasn't previously suspected and it turned out to be true? All of the examples I've heard of are it "confirming" things already suspected by other means.

Either way it is still an interesting tool and a cool use of technology, but I'd be a lot more impressed if the software were fed the text of a large number of random books and it detected an instance (with very high likelihood) of some famous author writing under a pen name, and then had that confirmed.
Nycto, almost 12 years ago
Something similar could probably be done with code (if it hasn't been done already). I suppose auto-formatting and checkstyles might mute some things, but I imagine you could still get a read from things like variable names, class names, function length, etc.
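A small sketch of what extracting such signals might look like for Python source, using the standard `ast` module (Python 3.8 or later for `end_lineno`). The particular features, the function name `code_style_features`, and the sample snippet are assumptions picked for illustration; they survive auto-formatting because they depend on names and structure rather than whitespace.

```python
import ast

def code_style_features(source):
    """Extract a few formatting-independent style features from Python source."""
    tree = ast.parse(source)
    functions = [node for node in ast.walk(tree)
                 if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]
    # Variable references and assignments (function parameters are not included).
    names = [node.id for node in ast.walk(tree) if isinstance(node, ast.Name)]
    lengths = [f.end_lineno - f.lineno + 1 for f in functions]
    return {
        "function_count": len(functions),
        "mean_function_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "mean_identifier_length": (sum(len(n) for n in names) / len(names)
                                   if names else 0.0),
        "snake_case_share": (sum("_" in n for n in names) / len(names)
                             if names else 0.0),
    }

# Hypothetical snippet standing in for one author's code.
sample = '''
def add_numbers(first_value, second_value):
    total_sum = first_value + second_value
    return total_sum
'''

features = code_style_features(sample)
print(features)
```

Feature dictionaries like this could then be compared across authors in the same way as word-frequency vectors for prose.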
MarkMc, almost 12 years ago
I'm curious about the ethics of this. Why is it OK to 'out' someone as the author of a book, but it's not OK to 'out' someone as gay?
brownbat, almost 12 years ago
s/b "How Forensic Linguistics Confirmed a Leak about Rowling"
mnglkhn2, almost 12 years ago
At the same time we can think of the whole thing as a smart marketing plot.
alxbrun, almost 12 years ago
I don't buy this 'outed' story for one second.

This is either marketing or fear of public reception of her non-Potter book (imagine the pressure she must have). Either way, this is crap.