Hello Facebook Crawler

72 points by mwmnj, over 12 years ago

8 comments

whalesalad, over 12 years ago
This reminds me of a recent experience I had with the Bing bot.

This most recent YC round, my co-founder and I used SkyDrive to edit our application. SkyDrive integrates pretty nicely with Word, even on a Mac, to allow for collaborative editing. It's like the best parts of SharePoint, minus all the crap, and inside of a modern UI. I'm a diehard Apple user, but I also subscribe to the "right tool for the job" principle ... in this case it worked pretty well.

Anyway, inside the document were links to some private areas of our website that contained demo materials for YC. As requested, they were not password protected, but also not linked from anywhere else. While submitting I ensured that our nginx logs would capture visits to these URLs in a separate log, so we'd know when they were being looked at (side note: seeing visitors coming from inside justin.tv + the rincon hill towers is kind of exhilarating).

What surprised me was that almost immediately after we began working on the document, the Bing bot was going apeshit exploring the domain and the 'private' URLs. I had to quickly add a robots.txt to deny all on the root. I thought it was pretty interesting. At first I felt almost violated. But then it seems logical that they'd be indexing every URL in every document stored in their datacenter, why not?
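For anyone wanting to check a deny-all robots.txt like the one described above before relying on it, here is a minimal sketch using Python's standard `urllib.robotparser`. The domain, path, and the helper name `is_allowed` are placeholders chosen for illustration, not the commenter's actual setup. Keep in mind that robots.txt is advisory: well-behaved crawlers honor it, but it does not protect an unlinked page from being fetched.

```python
from urllib.robotparser import RobotFileParser

# A deny-all policy for every crawler, served from the site root as /robots.txt.
DENY_ALL_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of fetching them over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and /private/demo are made-up placeholders.
    print(is_allowed(DENY_ALL_ROBOTS_TXT, "bingbot", "https://example.com/private/demo"))
```

Parsing from a string keeps the check offline, so the policy can be verified before it is deployed.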
cddotdotslash, over 12 years ago
Why is this even news? Facebook has been crawling links for ages every time you post on the site. The crawler is how the link you paste gets a title, description, and sometimes a thumbnail.
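To illustrate how a link preview can get its title, description, and thumbnail, here is a rough sketch that reads Open Graph `<meta>` tags (`og:title`, `og:description`, `og:image`) from a page's HTML with Python's standard `html.parser`. This is an assumption about the general technique, not a description of Facebook's actual crawler; the class name, function name, and sample HTML are invented for the example.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def extract_preview(html: str) -> dict:
    """Return the fields a link preview typically shows (None if absent)."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.properties.get("og:title"),
        "description": parser.properties.get("og:description"),
        "image": parser.properties.get("og:image"),
    }


if __name__ == "__main__":
    # Made-up sample page for demonstration.
    sample = """
    <html><head>
      <meta property="og:title" content="Hello Facebook Crawler" />
      <meta property="og:description" content="Watching the crawler arrive." />
      <meta property="og:image" content="https://example.com/thumb.png" />
    </head><body></body></html>
    """
    print(extract_preview(sample))
```

A real crawler would presumably also fall back to the `<title>` element and other hints when these tags are missing; that is left out to keep the sketch short.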
maxjaderberg, over 12 years ago
By looking at the headers you now have a great way of writing some analytics tools to see how much your website is shared on Facebook...
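A minimal sketch of that analytics idea, assuming the server writes the common "combined" access log format and that the crawler identifies itself with a User-Agent containing `facebookexternalhit`. The function name, the regular expression, and the sample log lines (including the documentation-range IP addresses) are all made up for illustration. Crawler fetches are only a rough proxy for shares, since one share can trigger several fetches or none if the preview is cached.

```python
import re
from collections import Counter

# Matches the request, status, size, referer, and user-agent fields of a
# combined-format access log line. Assumed format; adjust for your server.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Substring assumed to identify Facebook's link crawler in the User-Agent.
FACEBOOK_AGENT_MARKER = "facebookexternalhit"


def count_facebook_fetches(lines):
    """Count crawler fetches per request path across access log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and FACEBOOK_AGENT_MARKER in match.group("agent").lower():
            counts[match.group("path")] += 1
    return counts


if __name__ == "__main__":
    fb_agent = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
    sample_lines = [
        f'192.0.2.10 - - [01/Jan/2013:12:00:00 +0000] "GET /post/1 HTTP/1.1" 200 512 "-" "{fb_agent}"',
        '192.0.2.20 - - [01/Jan/2013:12:00:05 +0000] "GET /post/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
        f'192.0.2.11 - - [01/Jan/2013:12:01:00 +0000] "GET /about HTTP/1.1" 200 128 "-" "{fb_agent}"',
    ]
    for path, hits in count_facebook_fetches(sample_lines).most_common():
        print(path, hits)
```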
edouard1234567, over 12 years ago
I'm surprised this post made it to the homepage... They've been doing that forever, no need to look at your logs to figure this out. How else would they find and display an image from the page you're providing a link to?
justinph, over 12 years ago
12 lines of code instead of:

    tail -f /var/log/apache2/access.log
eli, over 12 years ago
I would imagine they're checking the URL for malware as well.
spyder, over 12 years ago
Also, it would be smart to run a malware check on these URLs if they aren't already doing it.
slajax, over 12 years ago
I wish I had enough karma to downvote this.