科技回声

8 comments

whalesalad, over 12 years ago

This reminds me of a recent experience I had with the Bing bot.

This most recent YC round, my co-founder and I used SkyDrive to edit our application. SkyDrive integrates pretty nicely with Word, even on a Mac, to allow for collaborative editing. It's like the best parts of SharePoint, minus all the crap, and inside of a modern UI. I'm a diehard Apple user, but I also subscribe to the "right tool for the job" principle ... in this case it worked pretty well.

Anyway, inside the document were links to some private areas of our website that contained demo materials for YC. As requested, they were not password protected, but also not linked from anywhere else. While submitting, I ensured that our nginx logs would capture visits to these URLs in a separate log, so we'd know when they were being looked at (side note: seeing visitors coming from inside justin.tv + the rincon hill towers is kind of exhilarating).

What surprised me was that almost immediately after we began working on the document, the Bing bot was going apeshit exploring the domain and the 'private' URLs. I had to quickly add a robots.txt to deny all on the root. I thought it was pretty interesting. At first I felt almost violated. But then it seems logical that they'd be indexing every URL in every document stored in their datacenter, why not?
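
For anyone wanting to try the "deny all on the root" step, here is a minimal sketch in Python. It assumes the robots.txt was the usual two-line deny-all (the comment does not show the actual file), and the `bingbot` token and `example.com` URL are placeholders for illustration. It uses the standard library's `urllib.robotparser` to check what a compliant crawler would conclude.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of a deny-all robots.txt: every crawler, every path.
DENY_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a crawler that honours robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "bingbot" and the URL below are hypothetical values for illustration.
    print(is_allowed(DENY_ALL, "bingbot", "https://example.com/demo/"))
```

Keep in mind that robots.txt is only a request that well-behaved crawlers choose to honour. It does not make an unlinked URL private.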

cddotdotslash, over 12 years ago

Why is this even news? Facebook has been crawling links for ages every time you post on the site. The crawler is how the link you paste gets a title, description, and sometimes a thumbnail.

maxjaderberg, over 12 years ago

By looking at the headers you now have a great way of writing some analytics tools to see how much your website is shared on Facebook...
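
A rough sketch of that idea in Python, assuming the server writes the Apache/nginx "combined" log format and that the crawler's User-Agent contains `facebookexternalhit`. The function name and the regular expression are illustrative, not taken from the linked post. Counting crawler fetches is probably only an approximation of shares, since one fetch need not correspond to exactly one share.

```python
import re
from collections import Counter

# Assumption: Facebook's link-preview crawler sends a User-Agent
# containing this marker.
FACEBOOK_UA_MARKER = "facebookexternalhit"

# Combined log format tail:
#   "METHOD PATH PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def count_facebook_fetches(lines):
    """Map each request path to the number of Facebook crawler hits."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and FACEBOOK_UA_MARKER in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


if __name__ == "__main__":
    # Path borrowed from the tail command quoted in this thread; adjust it
    # for your own server.
    with open("/var/log/apache2/access.log") as log:
        for path, count in count_facebook_fetches(log).most_common():
            print(path, count)
```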

edouard1234567, over 12 years ago

I'm surprised this post made it to the homepage... They've been doing that forever; no need to look at your logs to figure this out. How else would they find and display an image from the page you're linking to?

justinph, over 12 years ago

12 lines of code instead of:

    tail -f /var/log/apache2/access.log

eli, over 12 years ago

I would imagine they're checking the URL for malware as well.

spyder, over 12 years ago

Also, it would be smart to run a malware check on these URLs if they aren't already doing it.

slajax, over 12 years ago

I wish I had enough karma to downvote this.

Hello Facebook Crawler
