In the information retrieval space I am aware of software such as Lucene, Solr, Elasticsearch, Sphinx, and Manticore. As far as I know, these are all built on inverted indices plus language-specific components such as custom stemmers. This is rather annoying, because improving search quality means handcrafting customizations for each language.<p>Is there a way to apply ML to this problem? I am envisioning a system with three inputs:<p>* a smallish dataset to be queried<p>* a much bigger corpus for teaching the system about the language, so that we don't need to handcraft language-specific components such as a stemmer<p>* queries with labeled best results from the dataset, to learn a ranking from<p>Is any of that possible with neural networks, and if so, where would one start to learn about what works? Ideally we want something that can be trained on ever bigger data without having to handcraft feature vectors.
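<p>To make concrete what I mean by an inverted index with a handcrafted stemmer, here is a toy sketch in Python. It is not how Lucene or any of the other engines implement it; the suffix list and the names `naive_stem`, `build_index`, and `search` are made up for illustration, and the stemmer is a deliberately crude English-only rule set.

```python
from collections import defaultdict

# Hypothetical, hand-written English-only suffix rules. A real stemmer is far
# more involved, and every language needs its own set of rules.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase, split on whitespace, and stem each token."""
    return [naive_stem(t) for t in text.lower().split()]


def build_index(docs):
    """Map each stemmed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every stemmed query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = ["indexing large documents", "the index was searched", "cats sleep"]
index = build_index(docs)
print(search(index, "indexed"))
print(search(index, "document index"))
```

<p>The rules above happen to map "indexing", "indexed", and "index" to the same term, but they already fail on irregular forms: "running" becomes "runn" while "ran" stays "ran", so the two never match. Fixing that means adding more rules by hand, per language, which is exactly the part I would like a model to learn from data instead.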