TechEcho

6 comments

rmanocha about 15 years ago

I think that if you're generally respectful of the target websites, scraping them is ok. For example, I scrape various government websites for my website. I use a random delay between requests and am generally very careful about not requesting the same page multiple times (this is hard 'cause a lot of the pagination happening on these pages is via JS calls).

I am ok if someone decides to scrape my websites in a similar fashion, although if I do see that starting to happen, I'd rather just go ahead and build an API.
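The two habits mentioned above (a random delay between requests, and never requesting the same page twice) could be sketched roughly as follows. This is only an illustration, not the commenter's actual code: the `PoliteFetcher` class, its parameter names, and the 1 to 5 second delay range are all assumptions made for the example.

```python
import random
import time
import urllib.request
from typing import Callable, Dict


def http_fetch(url: str) -> str:
    """Plain standard-library download; swap in any client you prefer."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


class PoliteFetcher:
    """Hypothetical helper: fetch each URL at most once, with random pauses."""

    def __init__(
        self,
        fetch: Callable[[str], str] = http_fetch,
        min_delay: float = 1.0,   # assumed range, tune per target site
        max_delay: float = 5.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_delay = min_delay
        self._max_delay = max_delay
        self._sleep = sleep
        self._cache: Dict[str, str] = {}
        self._requests_made = 0

    def get(self, url: str) -> str:
        # Serve repeats from the cache so the site is never hit twice.
        if url in self._cache:
            return self._cache[url]
        # Pause a random interval before every request except the first.
        if self._requests_made > 0:
            self._sleep(random.uniform(self._min_delay, self._max_delay))
        body = self._fetch(url)
        self._requests_made += 1
        self._cache[url] = body
        return body
```

Passing `fetch` and `sleep` in as arguments keeps the sketch easy to try without touching a real site. Pages paginated via JS calls would still need their underlying request URLs identified by hand before a cache like this helps.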

DanielBMarkham about 15 years ago

On the consumer side, I'm happy with the following rules.

I'm happy writing a program to let individual users scrape from their computers. After all, they have a right to visit the site and retrieve their data in whatever format suits them.

I'm not so keen on setting up a server to scrape data, or having a server scrape a huge pile of data for a list of users. After all, whoever is running the service is keeping stuff for all of the users. My taking it all is just stealing.

On the provider side, I think my feelings are about the same. I think you have to be careful that you leverage scraping: let scrapers come in and get enough stuff that it makes people want to visit, but not so much that they have everything. If executed effectively, you can use scraping to great benefit.

_delirium about 15 years ago

I guess it depends on what they're doing with it. I'm not particularly against scraping per se, but I would look askance at some of the more sleazy uses, like just republishing (slightly modified versions of) blog posts on some AdSense-laden blog as if it were their own post. The key issues to me are: 1) transformativity, i.e. it produces something genuinely new and different from the content it scraped; and 2) proper credit to the source of the original content.

nostrademons about 15 years ago

I'm happy to page scrape other sites and not happy to have mine scraped. ;-)

More seriously, if there're bots that you don't want scraping, just robots.txt them away. If they ignore that, then they're being rather rude and you can figure out some way to auto-block them.
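As a rough illustration of the robots.txt approach, here is how a well-behaved scraper might interpret a site's rules using Python's standard `urllib.robotparser`. The bot names (`BadBot`, `GoodBot`), the example rules, and the `allowed_by_robots` helper are made up for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut one bot out entirely and keep
# everyone else out of /private/.
EXAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("BadBot", "GoodBot"):
        for path in ("/index.html", "/private/report.html"):
            url = "https://example.com" + path
            print(agent, path, allowed_by_robots(EXAMPLE_ROBOTS_TXT, agent, url))
```

robots.txt is purely advisory, which is why the comment falls back on blocking for bots that ignore it.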

benologist about 15 years ago

Depends what it's being scraped for: MFA (made-for-AdSense) spam blogs stealing content, or some valid use that could further your own interests.

evancureton about 15 years ago

hey can put you on my page

Ask HN: What's your philosophy on page scraping?