We've been testing an app that detects news the second it appears online.<p>Since our new version went up, one major US news event has happened: the hostage situation at the Discovery Channel.<p>We broke this story at 1:07 PM EST on September 1st, 2010. We need to determine when other sources first published it.<p>Any ideas? How do I go about this?
Here's an idea for next time. It assumes you don't need the data in real time, and instead just need to be able to say something like "when story X broke, we broke it first 80% of the time."<p>This is probably a stupid idea, but I've only given it a few minutes of thought, so that's all you get tonight :)<p>Take automatic snapshots of the news pages you want to compare against, like Drudge.com, cnn.com or whatever. Maybe once a minute.<p>Now hook it up to Mechanical Turk and pose the question (in Turker language, a HIT, or human intelligence task): "Does story X appear on this page at 10:00pm?" They answer true or false. Then keep moving the capture time forward and keep asking the question until the answer is true; that snapshot's time is roughly when that source first ran the story.
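To make the snapshot idea a bit more concrete, here is a rough Python sketch of the "move the capture forward and keep asking" loop. It is only an illustration under assumptions: the snapshots are taken to be (capture time, page text) pairs you have already stored, and the names `first_appearance`, `keyword_check` and `broke_first_rate` are made up for this example. The keyword match is just a crude stand-in for the Turk question; a real setup would put a human answer (or something smarter) behind the same true/false function.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Assumed storage format: one (capture time, page text) pair per snapshot.
Snapshot = Tuple[datetime, str]


def first_appearance(
    snapshots: Iterable[Snapshot],
    story_appears: Callable[[str], bool],
) -> Optional[datetime]:
    """Walk snapshots in time order and return the first capture time
    at which the story is judged to be on the page, or None if never."""
    for captured_at, page in sorted(snapshots, key=lambda s: s[0]):
        if story_appears(page):
            return captured_at
    return None


def keyword_check(keywords: List[str]) -> Callable[[str], bool]:
    """Hypothetical stand-in for the human question: the story 'appears'
    if every keyword occurs somewhere in the page text (case-insensitive)."""
    lowered = [k.lower() for k in keywords]

    def check(page: str) -> bool:
        text = page.lower()
        return all(k in text for k in lowered)

    return check


def broke_first_rate(
    ours: Dict[str, datetime],
    theirs: Dict[str, Optional[datetime]],
) -> float:
    """Fraction of stories where our time is strictly earlier than the
    other source's first sighting. Stories the other source never ran
    are skipped (an assumption; you could also count them as wins)."""
    compared = [s for s in ours if theirs.get(s) is not None]
    if not compared:
        return 0.0
    wins = sum(1 for s in compared if ours[s] < theirs[s])
    return wins / len(compared)
```

Keep in mind the answer is only as precise as the snapshot interval: with one capture per minute, a source that shows the story at 1:09 could have published it any time after the 1:08 capture.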