I'm working on a side project that involves scraping pricing data from a site and making it much more accessible and usable, ultimately giving a better answer to the kind of question a user would normally have about the data in its original form (sorry, I'd rather not give specifics at this point). I am considering evolving it into a business where users would pay for access. What are the legal ramifications of attempting such a business? Are there any real-world examples of successful businesses with screen scraping at the core of their service or offering, and what hurdles or obstacles did they face? Finally, would this severely hamper my chances of securing funding?
I'm not a lawyer, but I did look at this issue about a year ago for a feature I was considering for our own startup. You can most certainly be sued, defending will be pricey, and you may lose the lawsuit - although case law in America isn't completely settled. The key case <i>against</i> is probably eBay vs. Bidder's Edge.<p>Having a single point of failure like this certainly won't help your fundraising efforts. If you're going to take the legal risk, why not bootstrap? That way you can put the money in your pocket as you go along and if the site does get shut down, at least you've extracted the profits along the way.
Databases can be protected under copyright law, though how far that protection goes depends on the jurisdiction.<p>Also, most databases that you can scrape will contain sentinels to tip off the database owner that you've scraped their content. The sentinels are typically bogus records that are hard to spot; including them in your output does a good job of proving that you ripped the data.<p>If you're just going to regurgitate the data, you will need a better excuse than 'making it more accessible and usable'. You could instead offer your services to the current owner of the data. If they have abandoned the data that's a different story, but it sounds to me like they have not.<p>Another option is to license the data. Simply contact them and ask whether they have licensing options; this is the usual way to go about it.<p>Scraping is also a heavy drain on the resources of the company whose data you intend to harvest. This means that if they sue you successfully for breach of copyright, they have a fairly clear path to claiming damages.<p>Good luck!<p>PS: if you have a corporate lawyer, that would be a good place to ask for advice.
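To make the sentinel idea above concrete, here is a rough sketch of how a data owner might check someone else's published data for planted records. Everything in it is an assumption for illustration: the record layout, the sample rows, and the helper names `normalize` and `find_sentinels` are made up, and a real owner would probably use something more robust than exact matching after light cleanup.

```python
# Hypothetical sketch: checking published data for planted sentinel records.
# All names and records here are invented for illustration.

def normalize(record):
    """Lowercase and strip each field so trivial reformatting does not hide a match."""
    return tuple(str(field).strip().lower() for field in record)


def find_sentinels(published_rows, sentinel_rows):
    """Return the published rows that match one of the planted sentinel records."""
    planted = {normalize(row) for row in sentinel_rows}
    return [row for row in published_rows if normalize(row) in planted]


# Bogus entries the owner seeded into their own database (assumed examples).
SENTINELS = [
    ("Zyxwv Widget 9000", "17.43"),
    ("Acme Flange (left-handed)", "3.07"),
]

# Rows found on a third-party site (assumed examples).
suspect = [
    ("Ordinary Widget", "9.99"),
    ("  zyxwv widget 9000 ", "17.43"),
]

matches = find_sentinels(suspect, SENTINELS)
```

Any non-empty `matches` list means a record that never existed outside the owner's database has turned up elsewhere, which is the kind of evidence the comment above is describing.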
Yodlee seem to do okay out of it. Their SDK offers a way to extract statement data from online banking sites which, I believe, is a glorified screen scraper.<p><a href="http://www.yodlee.com/" rel="nofollow">http://www.yodlee.com/</a>
I think it's a legal can of worms any way you look at it, but especially if you plan to charge for it. I think you would have to get permission from all of the sites you plan to scrape before you sell the data to someone. It might be ok if you gave it away for free and linked back to the original source, but even then it's sketchy.
I worked for a company that was essentially based on it, and I know of another that makes its living from "scraping" and related automation. Both have been alive and well for years and have grown quite a bit. The trick is to work with the companies/sites you are scraping from the very beginning, as some kind of business relationship if possible and not as a parasitic one. Basically, if you are trying to help them sell their products/services/data and they benefit, it works out, even though you will likely have to fix the "scrapers" <i>all the time</i>.
Sorry to hijack your thread, but I've had some questions from cofounders about screen scraping.<p>Am I allowed to scrape information from their websites and use it to populate my system? Isn't that effectively what Google does?<p>Is there a limit to what is considered acceptable/not acceptable? E.g., is it OK to scrape for email addresses that they publish on their site, but not for their part numbers?<p>Thanks!
This is essentially what Merkel, RapLeaf and Palantir do.<p>I used to make money by doing this and selling the results to financial research firms, but I didn't make any of the details of what I was doing public.
Why don't you check out the terms of use of a scraping service like Mozenda <a href="http://www.mozenda.com/policies" rel="nofollow">http://www.mozenda.com/policies</a><p>or fetch.com