You want to search for 'authorship attribution' as your keyword.
<p>There are two main approaches to authorship attribution. One is through stylistic markers, where you look for a set of predefined features, for example the average paragraph length or the number of times 'whenever' is used. This approach is highly language-dependent.
<p>The other is character n-gram analysis. You choose an N, harvest the character N-grams, and build each author's profile from the frequencies of their top 2000 n-grams. You then build the same top-2000 profile for the document in question and compare it with each author profile; the author whose profile is at the shortest distance is your match.
<p>Robert Layton has a tutorial and some code on n-gram attribution on GitHub:
<p>* <a href="https://github.com/robertlayton/authorship_tutorials" rel="nofollow">https://github.com/robertlayton/authorship_tutorials</a>
<p>* <a href="https://github.com/robertlayton/author-detection" rel="nofollow">https://github.com/robertlayton/author-detection</a>
<p>And here's a list of papers I reviewed while doing a similar project.
<p>[1] Shlomo Argamon, Moshe Koppel, Jonathan Fine, and Anat Rachel Shimoni. Gender, genre, and writing style in formal written texts. Text, 23(3):321–346, 2003.
<p>[2] John F Burrows. ‘An ocean where each kind...’: Statistical analysis and some major determinants of literary style. Computers and the Humanities, 23(4-5):309–321, 1989.
<p>[3] Georgia Frantzeskou, Efstathios Stamatatos, Stefanos Gritzalis, and Sokratis Katsikas. Source code author identification based on n-gram author profiles. In Artificial Intelligence Applications and Innovations, pages 508–515. Springer, 2006.
<p>[4] Sheena Gardner and Hilary Nesi. A classification of genre families in university student writing. Applied Linguistics, 34(1):25–52, 2013.
<p>[6] John Houvardas and Efstathios Stamatatos. N-gram feature selection for authorship identification. In Artificial Intelligence: Methodology, Systems, and Applications, pages 77–86. Springer, 2006.
<p>[7] Patrick Juola. Authorship attribution. Foundations and Trends in Information Retrieval, 1(3):233–334, 2006.
<p>[8] Vlado Kešelj, Fuchun Peng, Nick Cercone, and Calvin Thomas. N-gram-based author profiles for authorship attribution. In Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING, volume 3, pages 255–264, 2003.
<p>[9] Maarten Lambers and Cor J Veenman. Forensic authorship attribution using compression distances to prototypes. In Computational Forensics, pages 13–24. Springer, 2009.
<p>[11] Fiona J Tweedie and R Harald Baayen. How variable may a constant be? Measures of lexical richness in perspective. Computers and the Humanities, 32(5):323–352, 1998.
<p>[12] Cor J Veenman and Zhenshi Li. Authorship verification with compression features.
<p>[13] Rong Zheng, Jiexun Li, Hsinchun Chen, and Zan Huang. A framework for authorship identification of online messages: Writing-style features and classification techniques. Journal of the American Society for Information Science and Technology, 57(3):378–393, 2006.
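<p>For anyone who wants to try the character n-gram idea before reading the papers, here is a minimal sketch in Python. It is not Layton's code; the function names (ngram_profile, profile_distance, attribute) are made up for illustration. The comment above does not say which distance to use, so the relative-difference dissimilarity below, of the kind used in n-gram profile work such as [8], is an assumption, as is the default N of 3.

```python
from collections import Counter


def ngram_profile(text, n=3, top=2000):
    """Map each of the `top` most common character n-grams to its relative frequency."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        # Text shorter than n characters: no n-grams to profile.
        return {}
    return {gram: count / total for gram, count in counts.most_common(top)}


def profile_distance(p, q):
    """Relative-difference dissimilarity over the union of two profiles.

    Assumed distance: sum of (2 * (a - b) / (a + b)) ** 2, where a and b are
    the frequencies of an n-gram in each profile (0 if absent).
    """
    dist = 0.0
    for gram in set(p) | set(q):
        a = p.get(gram, 0.0)
        b = q.get(gram, 0.0)
        # a + b > 0 because the n-gram occurs in at least one profile.
        dist += (2 * (a - b) / (a + b)) ** 2
    return dist


def attribute(document, author_texts, n=3, top=2000):
    """Return (closest author, {author: distance}) for a document.

    `author_texts` maps an author name to that author's known writing.
    """
    doc_profile = ngram_profile(document, n, top)
    distances = {
        author: profile_distance(ngram_profile(text, n, top), doc_profile)
        for author, text in author_texts.items()
    }
    return min(distances, key=distances.get), distances
```

<p>You would call attribute(unknown_text, {"author_a": known_text_a, "author_b": known_text_b}) and take the first element of the result as the guess. With short texts the profiles are noisy, so treat the output as a ranking rather than a verdict.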