The blog post I wrote wasn't primarily about the legality of scraping (and I didn't expect it to be read by more than a few people). But since that seems to be the topic of the thread, here's my response.<p>The courts found that it isn't possible to copyright facts, and that's all we were scraping: things like addresses, business names, and phone numbers. We weren't even scraping things like business category, because something as simple as putting a restaurant in the category "Fine Dining" might be considered a judgment call and therefore value-add by the original site.<p>And think of what would have happened if the court had found otherwise (i.e., had found that lists of facts could be copyrighted). If you opened a store, and I was the first one to put your address and phone number online, no one else could ever include your address or phone number on their site. Even if you created a website for your own business after I published your address, you wouldn't be able to include it on your site, because you'd violate <i>my</i> copyright.<p>I can't see how the Supreme Court could have ruled any other way.
There will always be garbage in. Your algorithms have to overcome this for the most part. Some things have to be manually dealt with and some things could be manually dealt with, but it's impossible to manually verify tens of millions of local listings.
It looks like it still needs a lot of work. As a quick test I looked for Sports Bars in London (via their categories) and it returned an Antique Shop in Westerham. I then tried editing the record to remove irrelevant categories and got a server error.