This reminds me of a recent experience I had with the Bing bot.

This most recent YC round, my co-founder and I used SkyDrive to edit our application. SkyDrive integrates pretty nicely with Word, even on a Mac, to allow for collaborative editing. It's like the best parts of SharePoint, minus all the crap, inside a modern UI. I'm a diehard Apple user, but I also subscribe to the "right tool for the job" principle, and in this case it worked pretty well.

Anyway, inside the document were links to some private areas of our website that contained demo materials for YC. As requested, they were not password protected, but they also weren't linked from anywhere else. While submitting, I made sure our nginx logs would capture visits to those URLs in a separate log, so we'd know when they were being looked at (side note: seeing visitors coming from inside justin.tv plus the Rincon Hill towers is kind of exhilarating).

What surprised me was that almost immediately after we began working on the document, the Bing bot was going apeshit exploring the domain and the 'private' URLs. I had to quickly add a robots.txt to deny everything at the root. I thought it was pretty interesting. At first I felt almost violated, but then again it seems logical that they'd index every URL in every document stored in their datacenter. Why not?
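For anyone who wants to do the same: the deny-all robots.txt is two lines, and sending hits on a particular path to their own nginx log is a one-line location block. (The /yc-demo/ path and log filename below are made up for illustration, not the actual ones.)

    # robots.txt at the site root: asks well-behaved crawlers to stay out entirely
    User-agent: *
    Disallow: /

    # nginx: write visits to the demo pages into their own access log
    location /yc-demo/ {
        access_log /var/log/nginx/yc-demo.log;
    }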
Why is this even news? Facebook has been crawling links for ages every time you post on the site. The crawler is how the link you paste gets a title, description, and sometimes a thumbnail.
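The mechanics are mundane: the crawler fetches the pasted URL and reads the page title and Open Graph meta tags. A rough Python sketch of the general idea (not Facebook's actual code):

    import urllib.request
    from html.parser import HTMLParser

    class PreviewParser(HTMLParser):
        # Collects <title>, og:description/description, and og:image from a page.
        def __init__(self):
            super().__init__()
            self.data = {"title": "", "description": "", "thumbnail": ""}
            self._in_title = False

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if tag == "title":
                self._in_title = True
            elif tag == "meta":
                key = a.get("property") or a.get("name")
                if key in ("og:description", "description"):
                    self.data["description"] = self.data["description"] or a.get("content", "")
                elif key == "og:image":
                    self.data["thumbnail"] = a.get("content", "")

        def handle_endtag(self, tag):
            if tag == "title":
                self._in_title = False

        def handle_data(self, text):
            if self._in_title:
                self.data["title"] += text

    def fetch_preview(url):
        # Fetch the pasted link and pull out the preview fields.
        with urllib.request.urlopen(url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        parser = PreviewParser()
        parser.feed(html)
        return parser.data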
I'm surprised this post made it to the homepage... They've been doing this forever; no need to look at your logs to figure it out. How else would they find and display an image from the page you're linking to?