Can someone with more of a network-theory background say why this would be interesting?<p>From an NLP angle, neither what they're doing (text classification) nor how they're doing it (constructing a co-occurrence matrix) sounds particularly novel, and the network-theoretic properties they derive from the unweighted, undirected form of the co-occurrence matrix don't seem to give any valuable insights.<p>For comparison, see the 2009 workshop on text graphs <a href="http://www.textgraphs.org/ws10/index.html" rel="nofollow">http://www.textgraphs.org/ws10/index.html</a> or papers such as Gaume et al. (2007), "Semantic associations and confluences in paradigmatic networks": <a href="http://w3.erss.univ-tlse2.fr/textes/pagespersos/gaume/resources/Gaume_Duvignau_Vanhove_final.pdf" rel="nofollow">http://w3.erss.univ-tlse2.fr/textes/pagespersos/gaume/resour...</a><p>Did I mention that the physics people totally ignore all the (interesting and non-trivial) existing literature on the topic? It's a bit as if a CS/NLP person were to write a paper on an information-theoretic approach to physics while totally ignoring the actual physics.
I just gave the PDF a quick read, and it looked useful enough to put into my permanent NLP reading collection. There seems to be growing momentum both in NLP research and in the number of good papers that are freely available. Last month, someone posted a link to "ICWSM – A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews" on HN, another interesting and potentially useful paper.
Interesting that they used the research equivalent of a Minimum Viable Product. From the paper:<p><i>Of course, more complicated semantic network models are certainly possible. For instance, one could construct a weighted network. However, we sought the simplest possible model which could distinguish between fictional and non-fictional written storytelling.</i>
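For contrast, the weighted alternative they mention but set aside would only be a small change to a sketch like the one above: each co-occurrence increments an edge weight instead of collapsing into a single unweighted edge. This reuses the imports and the hypothetical tokenizer from the earlier snippet and is, again, only an illustration:<p>
  def weighted_cooccurrence_graph(tokens, window=2):
      # Weighted variant the authors mention but do not use: each
      # co-occurrence increments the edge weight rather than being collapsed.
      G = nx.Graph()
      for i in range(len(tokens) - window + 1):
          for u, v in combinations(tokens[i:i + window], 2):
              if u == v:
                  continue
              if G.has_edge(u, v):
                  G[u][v]["weight"] += 1
              else:
                  G.add_edge(u, v, weight=1)
      return G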