Peter Murray-Rust (author of this blog post) is a really great man. He's been a tireless advocate for dismantling privilege and setting knowledge free for several decades. I'm proud to say he's becoming a sort of mentor to me. Last week I spent a couple of days with his research group and saw this software in action, and it's really impressive.<p>They can take an ancient paper with very low-quality diagrams of complex chemical structures, parse the images into an open markup language, and reconstruct both the chemical formula and a correct image. Chemical symbols are just one of many plugins for their core software, which interprets unstructured, information-rich data such as raster diagrams. They also have plugins for phylogenetic trees, plots, species names, gene names and reagents. You can easily develop plugins for whatever you want, and they're recruiting open source contributors (see <a href="https://solvers.io/projects/QADhJNcCkcKXfiCQ6" rel="nofollow">https://solvers.io/projects/QADhJNcCkcKXfiCQ6</a>, <a href="https://solvers.io/projects/4K3cvLEoHQqhhzBan" rel="nofollow">https://solvers.io/projects/4K3cvLEoHQqhhzBan</a>).<p>As a side effect of how their software works, it can detect tiny, telltale imperfections in images that reveal scientific fraud. I was shown a demo in which a mass spectrometry trace (like this <a href="http://en.wikipedia.org/wiki/File:ObwiedniaPeptydu.gif" rel="nofollow">http://en.wikipedia.org/wiki/File:ObwiedniaPeptydu.gif</a>) was analysed. As well as extracting the data from the plot, it revealed a peak that had been covered up with a square: the author had deliberately hidden an inconvenient peak in their data. That is scientific fraud. It's terrifying that they find this in <i>most</i> chemistry papers they analyse.<p>Peter's group can analyse thousands, or even hundreds of thousands, of papers an hour, automatically detecting errors and fraud while simultaneously freeing the data, which are <i>facts</i> and therefore not copyrightable.
This is one of the best things to happen to science in many years, except that publishers are deliberately preventing it. Their work also made me realise it would be possible to continue Aaron Swartz's work on a much bigger scale (<a href="http://blahah.net/2014/02/11/knowledge-sets-us-free/" rel="nofollow">http://blahah.net/2014/02/11/knowledge-sets-us-free/</a>).<p>Academic publishers who are suppressing this are literally the enemies of humanity.