I was working on a similar idea, but for laws. They also change quite a lot and are not very well presented online.<p>I halted development after I realized how complicated word diffs can get. I would be interested in the techniques you used. I think it is quite good as it is, but I noticed some common problems, such as:<p>1. Reusing letters from words that have nothing in common:<p>> fury over a [-hik-]{+n increas+}e in bus fares [1]<p>2. Inserting a few paragraphs into one word (first paragraph) [2]<p>3. Loads of minor changes, plus more of problem 1 [3]<p>[1] <a href="http://newsdiffs.org/diff/263401/263432/www.nytimes.com/2013/06/20/world/americas/brazil-protests.html" rel="nofollow">http://newsdiffs.org/diff/263401/263432/www.nytimes.com/2013...</a><p>[2] <a href="http://newsdiffs.org/diff/265812/265841/www.washingtonpost.com/world/asia_pacific/taliban-want-sign-on-their-qatar-office-resurrected-threaten-to-scuttle-talks/2013/06/22/ca979502-db0a-11e2-b418-9dfa095e125d_story.html" rel="nofollow">http://newsdiffs.org/diff/265812/265841/www.washingtonpost.c...</a><p>[3] <a href="http://newsdiffs.org/diff/265776/265810/www.nytimes.com/2013/06/23/world/asia/flooding-kills-hundreds-in-northern-india.html" rel="nofollow">http://newsdiffs.org/diff/265776/265810/www.nytimes.com/2013...</a>
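To illustrate problem 1: one way to avoid reusing letters across unrelated words is to diff whole tokens instead of characters. Below is a rough sketch using Python's standard `difflib`. I don't know how the site actually computes its diffs, so this is only an assumption about one possible approach; the `word_diff` helper and the wdiff-style `[-...-]{+...+}` markers are my own illustration, not the site's code.

```python
import difflib
import re


def word_diff(old: str, new: str) -> str:
    """Diff two strings token by token so a word is never split mid-way."""
    # Split into runs of whitespace and runs of non-whitespace, keeping both,
    # so the original spacing can be reproduced in the output.
    a = re.findall(r"\s+|\S+", old)
    b = re.findall(r"\s+|\S+", new)

    # autojunk=False turns off the heuristic that treats very frequent
    # tokens (such as single spaces) as junk on long inputs.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)

    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if i2 > i1:  # tokens removed from the old text
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if j2 > j1:  # tokens added in the new text
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)


if __name__ == "__main__":
    print(word_diff("fury over a hike in bus fares",
                    "fury over an increase in bus fares"))
    # fury over [-a-]{+an+} [-hike-]{+increase+} in bus fares
```

This does not help with problems 2 and 3. Those would probably need something like diffing paragraphs first and only then diffing words inside the paragraphs that were matched up.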
This is amazing. This sort of technology may not be sexy enough for TechCrunch, but it's going to be infinitely more valuable to posterity than photo filters. Historiography will continue to evolve at lightspeed for the next several decades, and I'm excited to see how this sort of accumulated data interacts with coming advances in machine learning.
I love this idea, but I think it would be a lot better if this information were available on the news websites themselves, through a browser plugin or something.