Interesting. So it seems like you aren't respecting robots.txt. I picked Old Navy, since it's on your supported stores page [0], and went to their robots.txt [1]:

```
User-agent: *
Disallow: /buy/
Disallow: /checkout/
```

So, do you have permission to violate robots.txt? I'm sure there is some automated interaction with the checkout/purchasing pages. Or am I missing something about how TwoTap works? Scraping is one thing, but accessing pages the site's operators explicitly prohibit seems like a big no-no.

[0]: https://twotap.com/supported-stores/

[1]: http://oldnavy.gap.com/robots.txt
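For reference, Python's standard library can answer the "is this path allowed?" question directly. A quick check against the rules quoted above (the specific product path is just illustrative):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("http://oldnavy.gap.com/robots.txt")
rp.read()

# Anything under /checkout/ is disallowed for every user-agent ("*").
print(rp.can_fetch("*", "http://oldnavy.gap.com/checkout/anything"))   # False
print(rp.can_fetch("*", "http://oldnavy.gap.com/products/some-item"))  # True
```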
Looks like I'm at one of the retailers you crawl. Recently our site was getting hit by a web crawler that was following links incorrectly. I blacklisted several IP addresses from accessing the site, and now I wonder if it was this!

Does your crawler obey robots.txt rules?
I'm confused about the legality of scraping. Is it completely open, or are there some restrictions on scraping any site without explicit permission?
I don't understand why you're pro-scraping. (I did write a blog post on this, and I believe I posted it to HN before: http://theexceptioncatcher.com/blog/2012/07/how-to-get-rid-of-screen-scrapers-from-your-website/)

But wouldn't it be more beneficial to get websites to open up an API to you, reach out and ask them to do so, or even offer consulting services to build one?

I know there are a few cart/store offerings out there, and it seems to me that they would have an API:

Magento: http://www.magentocommerce.com/api/soap/checkout/checkout.html

OpenCart proprietary API: http://opencart-api.com/

PrestaShop API: http://doc.prestashop.com/display/PS14/Using+the+REST+webservice
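For the Magento case, here's a rough sketch of what driving the linked checkout API could look like from Python, assuming a Magento 1.x store with the SOAP v1 endpoint enabled; the store URL, credentials, store ID, and product ID are placeholders, and argument serialization details vary by SOAP client:

```python
from zeep import Client  # pip install zeep

# Hypothetical Magento 1.x store; the v1 SOAP endpoint exposes login/call/endSession.
client = Client("https://example-store.com/api/soap/?wsdl")
session = client.service.login("api_user", "api_key")

# Create a quote (cart) and add a product to it, per the linked checkout API docs.
cart_id = client.service.call(session, "cart.create", ["default"])
client.service.call(session, "cart_product.add", [
    cart_id,
    [{"product_id": "123", "qty": 1}],
])

client.service.endSession(session)
```

The point being: where an API like this exists, the integration is explicit and sanctioned, rather than driving the HTML checkout flow.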
The hard part is not scraping, it's returns. For many kinds of online products, the return rate is over 40%. The shopper must be completely aware of how to contact the merchant of record and how to return the product.

Also, if you are scraping a large retailer, you are effectively required to be PCI DSS Level 1 compliant, which takes a bit of extra effort.
I've worked with two shopping search engines, and interestingly, scraping sites was one of the things they did to build up their inventory as well. The big difference being, they simply organized the products into a searchable format, then sent traffic to the ecommerce site and let it handle the checkout. What you're doing is arguably more complex.

(They also prioritized the feeds sent to them directly by retailers above the scraped item feeds - thus prioritizing paid listings, similar to the Google SERPs - so a different business model entirely.)

That being said, a very cool concept - and agreed that, given the relatively small number of ecommerce platforms out there, scraping and then serving them up seems pretty scalable. Interested to see how it goes.
I built a CJ scraper for a deals website that is now defunct. What a pain it was to maintain. All the different retailers dump their data into CJ in different ways. I might just put it on GitHub if anyone's interested. Python + chromedriver + BeautifulSoup + mechanize.
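In that spirit, here's a stripped-down sketch of the chromedriver + BeautifulSoup half of that stack; the retailer URL and CSS selectors are made up for illustration:

```python
from selenium import webdriver   # pip install selenium
from bs4 import BeautifulSoup    # pip install beautifulsoup4

driver = webdriver.Chrome()      # needs chromedriver on the PATH
try:
    driver.get("https://www.example-retailer.com/deals")  # placeholder URL
    soup = BeautifulSoup(driver.page_source, "html.parser")

    # Selector names are hypothetical; every retailer's markup differs,
    # which is exactly why these scrapers are painful to maintain.
    for item in soup.select("div.product"):
        title = item.select_one(".product-title")
        price = item.select_one(".product-price")
        if title and price:
            print(title.get_text(strip=True), price.get_text(strip=True))
finally:
    driver.quit()
```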
I tried the demo with a Lego castle priced at 99€ and got a grand total of more than $10k...

FYI, Lego showed me the French version of their website, since that's where I live. You seem to only offer shipping in the US, though that's not clear from reading your website. Still very interesting.

Product URL: http://shop.lego.com/fr-FR/Le-ch%C3%A2teau-fort-70404?fromListing=listing

Screenshot: http://imgur.com/mlr8Q2e
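One plausible cause (purely a guess on my part): a price parser that strips the French decimal comma and reads "99,99 €" as 9999. A small sketch of a more locale-tolerant approach, treating the last separator followed by one or two digits as the decimal mark:

```python
from decimal import Decimal

def parse_price(raw):
    """Parse a localized price string such as "99,99 €" or "$1,299.00"."""
    cleaned = "".join(ch for ch in raw if ch.isdigit() or ch in ",.")
    for sep in (",", "."):
        idx = cleaned.rfind(sep)
        # A separator followed by one or two digits is taken as the decimal mark.
        if idx != -1 and len(cleaned) - idx - 1 in (1, 2):
            whole = cleaned[:idx].replace(",", "").replace(".", "")
            return Decimal(whole + "." + cleaned[idx + 1:])
    return Decimal(cleaned.replace(",", "").replace(".", "") or "0")

print(parse_price("99,99 €"))    # 99.99, not 9999
print(parse_price("$1,299.00"))  # 1299.00
```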
Can anyone go into a bit more detail about how the affiliate commissions work here? From what I have read, I would feed my affiliate link through TwoTap and you would then handle the cookie and the conversion and everything?

If I were using URLs gathered from a Commission Junction datafeed, is this basically a plug-and-play solution, or do I need to process those URLs?

Do you have a backend stats dashboard, or would I still rely on CJ for that data?
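On the "do I need to process those URLs" part: if the datafeed links carry the destination in a `url` query parameter, as CJ-style custom links often do (that's an assumption about your particular feed), pulling the product URL out is straightforward:

```python
from urllib.parse import urlparse, parse_qs

def destination_from_tracking_link(tracking_url):
    """Return the retailer URL embedded in a tracking link, or None.

    Assumes the destination travels in a 'url' query parameter; feeds that
    encode it differently need their own handling.
    """
    query = parse_qs(urlparse(tracking_url).query)  # values come back URL-decoded
    return query.get("url", [None])[0]

# Hypothetical tracking link, for illustration only.
link = ("http://www.example-network.com/click-1234-5678"
        "?url=http%3A%2F%2Fwww.retailer.com%2Fproduct%2F42")
print(destination_from_tracking_link(link))  # http://www.retailer.com/product/42
```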
So you guys are scraping all the product information for a retailer and keeping it up to date? Or is it all live, i.e. you fetch it when that particular URL is requested? Where do you get the list of retailers to scrape?
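For what it's worth, a minimal sketch of the middle ground between those two models (fetch live, but don't re-fetch the same page constantly); nothing here is TwoTap-specific, and the TTL is arbitrary:

```python
import time
import requests  # pip install requests

_cache = {}          # url -> (fetched_at, body)
TTL_SECONDS = 300    # reuse a fetched page for five minutes

def fetch_product_page(url):
    """Fetch a product page live, reusing a recent copy when one exists."""
    now = time.time()
    cached = _cache.get(url)
    if cached and now - cached[0] < TTL_SECONDS:
        return cached[1]
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    _cache[url] = (now, response.text)
    return response.text
```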
I don't get it. Is this just a middleman between all the retail websites and the publishers? Sort of like what Google is doing with product search, and also giving commissions on the items sold?