I think I know what the problem is; we're detecting HN as a dead page. It's unclear whether this happened on the HN side or on Google's side, but I'm pinging the right people to ask whether we can get this fixed pretty quickly.<p>Added: Looks like HN has been blocking Googlebot, so our automated systems started to think that HN was dead. I dropped an email to PG to ask what he'd like us to do.
Does anyone else find it disturbing that Google employees are bending over for pg/HN? Seriously, if any other webmaster blocked Google's bots, they wouldn't change their algorithms to accommodate, or see how they could use fewer of our resources.<p>It's pg's fault, not Google's, and I don't see why they should care. Maybe from their standpoint it would be more beneficial to Google users who are used to typing in 'hacker news' to visit this site, but since when did that matter to Google?<p>Also, don't get me wrong, I love both Google and Hacker News. I just find what's going on in this thread interesting.
Hi<p>I work at Google helping webmasters.<p>It seems something has been blocking Googlebot from crawling HN, and so our algorithms think the site is dead. A very common cause is a firewall.<p>I realize that pg has been cracking down on crawlers recently. Maybe there was an unexpected configuration change? If Googlebot is crawling too fast, you can slow it down in Webmaster Tools.<p>I'm happy to answer any questions. This is a common issue.<p>Pierre
PG: welcome to the woes of being an Alexa top 1000 site with over 1 million pages of dynamic content.<p>HN has roughly 1.3 million pages indexed by Google.<p>1.3M pages at 43k per page is roughly 53 gigs to cache static versions of all pages on the site. Quadruple that for a worst-case scenario and it'll still easily fit on a single drive.<p>When your site gets this popular you tend to have to re-architect your application to solve perf issues. You could serve Googlebot UAs week-old cached pages, for example.<p>I'd encourage you to start thinking of yourself as a utility providing a valuable and necessary resource to the Net and take the time and energy to solve this properly.
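To make the numbers and the caching idea above concrete, here is a rough Python sketch. It is only an illustration, not how HN is actually built: `cache_size_gib`, `CachedPage`, `is_googlebot_ua`, `choose_response`, and the one-week constant are all hypothetical names chosen for this example, and "43k" is assumed to mean 43 KiB per page.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

KIB = 1024
GIB = 1024 ** 3

# "1 week old cached pages", expressed in seconds (assumption for this sketch).
MAX_CRAWLER_AGE = 7 * 24 * 3600


def cache_size_gib(pages: int, kib_per_page: float) -> float:
    """Disk space needed to keep one static copy of every page, in GiB."""
    return pages * kib_per_page * KIB / GIB


@dataclass
class CachedPage:
    body: str
    stored_at: float  # Unix timestamp of when this copy was rendered


def is_googlebot_ua(user_agent: str) -> bool:
    # Substring check on the User-Agent header only. UA strings can be
    # spoofed, so this is a load-shedding hint, not an identity check.
    return "googlebot" in user_agent.lower()


def choose_response(
    user_agent: str,
    cached: Optional[CachedPage],
    now: float,
    render: Callable[[], str],
) -> Tuple[str, str]:
    """Return (body, source): a stale cached copy for crawlers, fresh otherwise."""
    if (
        cached is not None
        and is_googlebot_ua(user_agent)
        and now - cached.stored_at <= MAX_CRAWLER_AGE
    ):
        return cached.body, "cache"
    # Regular visitors, or crawlers with no usable copy, get a fresh render.
    return render(), "fresh"


if __name__ == "__main__":
    base = cache_size_gib(1_300_000, 43)
    print(f"one copy of every page: {base:.1f} GiB")   # 53.3 GiB
    print(f"worst case (4x):        {4 * base:.1f} GiB")  # 213.2 GiB
```

The point of splitting out `choose_response` is that crawler traffic never triggers a dynamic render as long as a copy less than a week old exists, while logged-in users still see live pages.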
Has Google been making significant changes to the search ranking algorithms in the past couple months? I've noticed a significant decline in the quality of results, to the point that (for the first time in years), I've bounced over to DuckDuckGo or Bing to try my luck there. I love Google as a company, so (if this isn't just in my head) I'd love to see things get better again.<p>EDIT: Looks like the change to HN's ranking is related to a change that pg made, so my comment is now less relevant to the parent post. I still stand by it, though. :-)
Out of curiosity, how did the OP even find out that this was going on? He appears to be a long-time user of the site, and therefore would have no reason to Google "Hacker News".<p>The most common other reason would be that some people use Google as their URL bar - instead of typing "hackerne.ws" or "news.ycombinator.com" into the URL bar, they type "Hacker News" into Google and click on the first result. However, I would've thought that the types of people using HN would have the tech savvy to use a keyword bookmark, or at least the URL.
A few months ago, I was talking about the Stanford ai/ml-classes to one of my friends. He asked me if there were any more classes and I said "Yes, take a look at the front page of Hacker News, there are some links. I visit that site frequently, it's very helpful."<p>There was one little issue though.<p>The poor guy didn't know what Hacker News was, so he found that other site (hackernews.com). He then scanned the Twitter stream on that site several times to find those links, and kept visiting the site for several days to find those other helpful links.<p>When I saw him again a few days later, he told me: "What a silly site HackerNews is! And I couldn't find the links to those classes over there."<p>He also told me that he was disappointed in me for visiting such a silly site.<p>Now, can you guess the look on his face when I told him that he had been visiting the wrong site for the last few days?
Matt Cutts is my hero. I've never seen anyone from Google (or any company) be as proactive in explaining, interacting, and helping this community. Thanks.
Raised here already:<p><a href="http://news.ycombinator.com/item?id=3277365" rel="nofollow">http://news.ycombinator.com/item?id=3277365</a><p>FWIW I've raised this issue.
Don't link to Google search results!<p>They're personalized: everyone sees different results, even if you don't have a Google account.<p>For me <a href="http://news.ycombinator.com" rel="nofollow">http://news.ycombinator.com</a> is the top page. But when I use Tor, <a href="http://www.hackernews.com" rel="nofollow">http://www.hackernews.com</a> and <a href="http://thehackernews.com/" rel="nofollow">http://thehackernews.com/</a> are on top.<p>I don't think it's possible to get a real "invariant" result page. It all depends on which computer you use (cookies, language setting, IP address).
Bottom line is that if you don't trust Google and don't use their tools (GWT), you can end up in sticky situations like this one.<p>My experience with the Crawl Rate feature in GWT is that they do honour it pretty strictly, but for large sites Googlebot can cause a lot of extra load even if pages are static.<p>A good CDN and a stateless cache server will help, but for sites as large as HN every request adds up!
I think it makes a lot of sense for Matt Cutts to intervene in this case since:<p>1) Matt browses HN.
2) HN is a high-volume site, and whatever suggestions are discussed and implemented here can be noted and learned from by everyone else.
I definitely didn't expect to see my own blog as the last result on page one. Or is that because I've shared it via Google Plus and it's being injected into my personal results?
This is an example of why Goog's search algorithms (and others') should be open: <a href="http://news.ycombinator.com/item?id=3268371" rel="nofollow">http://news.ycombinator.com/item?id=3268371</a><p>A subtle attack on a site would be to make bots stop indexing it, or to use SEO practices to lower its ranking enough that it becomes unsearchable and therefore, in effect, non-existent.<p>Or just crack into Google...