<i>We'll start to see the re-emergence of tools from old-school NLP, but now augmented with the powerful statistical tools and data-oriented automation of new-school NLP. IBM's Watson already does this to some extent.</i><p>This is not a new trend. As early as 1997, Steven Abney [1] augmented attribute-value grammars with discriminative modelling (maximum entropy models, in that case) to form 'stochastic attribute-value grammars'. There is a lot of work on efficiently extracting the best parse from packed forests, etc. Most systems that rely on unification grammars (e.g. HPSG grammars) already use stochastic models for disambiguation (a toy sketch of this kind of log-linear parse ranking is given below, after the references).<p>In the early-to-mid 2000s, when modelling association strengths from structured or unstructured text became popular, old-school parsers adopted such techniques to learn selectional preferences that cannot be learnt from the usually small hand-annotated treebanks. E.g. in languages that normally use SVO (subject-verb-object) order in main clauses but also permit OVS (object-verb-subject) order, parsers trained on small hand-annotated treebanks would often be set on the wrong path when the direct object is fronted, analyzing the direct object as the subject. Techniques from association-strength modelling were used to learn selectional preferences such as 'bread is usually the object of eat' from automatically annotated text [2] (see the second sketch below).<p>In recent years, learning word vector representations with neural networks has become popular. Again, not surprisingly, people have been integrating these vectors as features in the disambiguation components of old-school NLP parsers, in some cases with great success.<p>tl;dr: the flow of ideas and tools from new-school NLP to old-school NLP has been going on ever since the statistical NLP revolution started.<p>[1] <a href="http://ucrel.lancs.ac.uk/acl/J/J97/J97-4005.pdf" rel="nofollow">http://ucrel.lancs.ac.uk/acl/J/J97/J97-4005.pdf</a><p>[2] <a href="http://www.let.rug.nl/vannoord/papers/iwptbook.pdf" rel="nofollow">http://www.let.rug.nl/vannoord/papers/iwptbook.pdf</a>
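To make the first point concrete, here is a minimal, purely illustrative sketch of maximum-entropy (log-linear) parse disambiguation in the spirit of stochastic attribute-value grammars: each candidate parse licensed by the grammar is reduced to a feature vector, and a learned weight vector turns those features into a score and a conditional probability over the candidate set. The feature names and weights are invented for illustration only; a real system would have many thousands of features with weights estimated from a treebank.

```python
# Toy log-linear (maximum entropy) ranking of candidate parses.
# Feature names and weights are made up for illustration.
import math
from collections import Counter

# Hypothetical learned weights for a handful of parse features.
WEIGHTS = {
    "rule:S->NP,VP": 1.2,
    "rule:S->VP,NP": -0.4,     # marked (object-fronted) clause order
    "dep:eat->obj:bread": 2.0,
    "dep:eat->subj:bread": -1.5,
}

def score(features: Counter) -> float:
    """Linear score w . f(parse) of one candidate parse."""
    return sum(WEIGHTS.get(name, 0.0) * count for name, count in features.items())

def disambiguate(candidates: list) -> tuple:
    """Return the index of the best parse and the conditional
    probability of each candidate (softmax over the candidate set)."""
    scores = [score(f) for f in candidates]
    z = sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / z for s in scores]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return best, probs

# Two candidate analyses of an object-fronted clause ("bread eats the child"
# vs. "bread, the child eats").
svo = Counter({"rule:S->NP,VP": 1, "dep:eat->subj:bread": 1})
ovs = Counter({"rule:S->VP,NP": 1, "dep:eat->obj:bread": 1})
best, probs = disambiguate([svo, ovs])
print(best, probs)   # the OVS analysis wins on the lexical preference
```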
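And a similarly hedged sketch of the selectional-preference idea: count (verb, relation, noun) triples emitted by a parser over a large automatically annotated corpus and convert them into association strengths, here pointwise mutual information, which is just one common choice and not necessarily the measure used in [2]. The counts are made up; the point is only that 'bread' comes out strongly associated with the object slot of 'eat', which is exactly the signal a parser needs to prefer the OVS reading when 'bread' is fronted.

```python
# Toy selectional preferences as PMI over (verb, relation, noun) counts
# that would, in practice, come from millions of automatically parsed
# sentences rather than a small hand-annotated treebank.
import math
from collections import Counter

# Made-up counts of dependency triples from auto-parsed text.
triples = Counter({
    ("eat", "obj", "bread"): 412,
    ("eat", "subj", "bread"): 3,
    ("eat", "subj", "child"): 350,
    ("bake", "obj", "bread"): 290,
})

total = sum(triples.values())
slot_marg = Counter()   # marginal count of (verb, relation) slots
noun_marg = Counter()   # marginal count of noun lemmas
for (verb, rel, noun), c in triples.items():
    slot_marg[(verb, rel)] += c
    noun_marg[noun] += c

def pmi(verb: str, rel: str, noun: str) -> float:
    """Pointwise mutual information between a (verb, relation) slot and a noun."""
    joint = triples[(verb, rel, noun)] / total
    if joint == 0.0:
        return float("-inf")
    p_slot = slot_marg[(verb, rel)] / total
    p_noun = noun_marg[noun] / total
    return math.log2(joint / (p_slot * p_noun))

# 'bread' is strongly preferred as the object of 'eat', and dispreferred
# as its subject, so a parser can favour the OVS reading when it is fronted.
print(pmi("eat", "obj", "bread"), pmi("eat", "subj", "bread"))
```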