For the lazy, the DOI is 10.1038/s41586-019-1335-8 if you want to add it to your bibliographies. Obviously, don't use the DOI for any illegal purposes, such as getting around the paywall.
An article about the paper by the first author <a href="https://towardsdatascience.com/using-unsupervised-machine-learning-to-uncover-hidden-scientific-knowledge-6a3689e1c78d" rel="nofollow">https://towardsdatascience.com/using-unsupervised-machine-le...</a>
Summary: Given the abstracts of materials science papers, they were able to predict that certain materials would have desirable/interesting properties before those materials were actually examined for those properties. This was confirmed by "holding out" recent years of data and then checking whether predictions made from, say, 2009 data would have held up today. They have also made predictions which have yet to be confirmed / refuted.<p>Interesting points on future work:<p>- This used only abstracts. Using full papers could yield significant improvements.<p>- It uses word2vec and not BERT / ELMo, so there's likely to be another jump in performance there.
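The core prediction step — ranking candidate materials by how close their word2vec embeddings sit to a property word like "thermoelectric" — can be sketched with toy vectors. (The paper trains embeddings on millions of abstracts; the names and 3-d vectors below are made up purely for illustration.)

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

# Toy stand-ins for word2vec vectors learned from abstracts.
embeddings = {
    "thermoelectric": [0.9, 0.1, 0.2],
    "CuGaTe2":        [0.8, 0.2, 0.1],  # hypothetical candidate material
    "Bi2Te3":         [0.7, 0.3, 0.2],
    "NaCl":           [0.1, 0.9, 0.3],
}

# Rank candidate materials by similarity to the property word.
query = embeddings["thermoelectric"]
candidates = [w for w in embeddings if w != "thermoelectric"]
ranked = sorted(candidates, key=lambda w: cosine(embeddings[w], query),
                reverse=True)
print(ranked)  # most "thermoelectric-like" material first
```

The holdout validation in the paper amounts to training only on abstracts up to some cutoff year and checking whether the top-ranked materials were later reported to actually have the property.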
You can get an idea of the content from their GitHub <a href="https://github.com/materialsintelligence/mat2vec/blob/master/README.md" rel="nofollow">https://github.com/materialsintelligence/mat2vec/blob/master...</a><p>The author emails are at the end of README.md if you still want to ask for a preprint.
If you do not have access to the Nature paper, this paper reports on the same study. <a href="https://chemrxiv.org/articles/Named_Entity_Recognition_and_Normalization_Applied_to_Large-Scale_Information_Extraction_from_the_Materials_Science_Literature/8226068/1" rel="nofollow">https://chemrxiv.org/articles/Named_Entity_Recognition_and_N...</a>
A 5-year-old discovery, nothing spectacular (as of 2019). On the other hand, it's a good example of publishing: code, corpora, and materials are available for everyone to reproduce it.
We published something similar in spirit recently (although it ended up as a conference paper and not in Nature)... Notably, we did our study with far less data - instead of millions of abstracts we had the text of a few thousand patents and a few hundred conference papers, with a specific focus on texts about energetic materials (explosives and propellants).<p>We showed how chemical-application and chemical-property relations are captured by word2vec and GloVe. For instance, we found that rocket fuels were the chemicals appearing closest to "rocket", while materials used in air bags appeared closest to "air bag". We were able to filter to chemical names using ChemDataExtractor, and further to likely energetic chemicals by obtaining SMILES strings from PubChem and using a classifier to label them as likely energetics or not.<p>You can find our work here: <a href="https://arxiv.org/pdf/1903.00415.pdf" rel="nofollow">https://arxiv.org/pdf/1903.00415.pdf</a>
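The "chemicals appearing closest to 'rocket'" lookup described above is just a nearest-neighbor query over the embedding vocabulary. A minimal sketch, with made-up 2-d vectors (in the actual study these come from word2vec/GloVe trained on patent text):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def nearest(word, embeddings, k=2):
    """Return the k vocabulary entries closest to `word` by cosine similarity."""
    q = embeddings[word]
    others = [w for w in embeddings if w != word]
    return sorted(others, key=lambda w: cosine(embeddings[w], q),
                  reverse=True)[:k]

# Toy vectors; hydrazine is a real rocket propellant and sodium azide a real
# air-bag gas generant, but the numbers are invented for illustration.
embeddings = {
    "rocket":       [0.9, 0.1],
    "hydrazine":    [0.8, 0.2],
    "sodium azide": [0.2, 0.9],
    "air bag":      [0.1, 0.9],
}

print(nearest("rocket", embeddings))   # hydrazine ranks first
print(nearest("air bag", embeddings))  # sodium azide ranks first
```

Filtering the vocabulary down to chemical names (e.g. with ChemDataExtractor, as in the paper) is what turns this generic neighbor query into an application-to-chemical lookup.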
Is the novel part the application to materials science? I can't get to the Nature paper on mobile, but the analysis in the other resources linked here looks pretty thorough.<p>Is there anything new methodology-wise in the Nature version?
Hi All, glad to see our paper caught your attention. Here is a link to read the paper: <a href="https://rdcu.be/bItqk" rel="nofollow">https://rdcu.be/bItqk</a>
Between this and the UMAP paper on cancer published in Nature, I'm convinced that my next publication will be in the same place that Isaac Newton published in