Garbage In, Garbage Out: Why Scraping Doesn't Work for Local Search

19 points by mcxx almost 16 years ago

4 comments

jack_deneut almost 16 years ago
The blog post I wrote wasn't primarily about the legality of scraping (and I also didn't expect it to be read by more than a few people). But as that seems to be the topic of the thread, here's my response.

The courts found that it isn't possible to copyright facts, and that's all we were scraping - things like addresses, business name, and phone number. We weren't even scraping things like business category, because something as simple as putting a restaurant in the category "Fine Dining" might be considered a judgment call and therefore value-add by the original site.

And think of what would have happened if the court had found otherwise (i.e. had found that lists of facts could be copyrighted). If you opened a store, and I was the first one to put your address and phone number on-line, no one else could ever include your address or phone number on their site. Even if you created a website for your own business after I published your address, you wouldn't be able to include it on your site, because you'd violate *my* copyright.

I can't see how the Supreme Court could have ruled any other way.
mshafrir almost 16 years ago
    "We've tried scraping ourselves in the past (yes, it's perfectly legal),"

Is scraping indeed "perfectly legal"?
jshen almost 16 years ago
There will always be garbage in. Your algorithms have to overcome this for the most part. Some things have to be dealt with manually and some things could be dealt with manually, but it's impossible to manually verify tens of millions of local listings.
mbarr almost 16 years ago
It looks like it still needs a lot of work. As a quick test I looked for Sports Bars in London (via their categories) and it returned an Antique Shop in Westerham. I then tried editing the record to remove irrelevant categories and got a server error.