With years of practice, SEO engineers have become quite good at gaming PageRank, so by itself PageRank is no longer a good signal of quality. This is by no means a revelation. But what really amused me when I tried computing PageRank on a sample webgraph was that it had become such an awesome indicator of porn! For some reason I had not seen that coming.<p>Some of the easy-to-exploit loopholes of PageRank give you a glimpse of the nature of the internet a decade ago. Compared to other link-analysis based indices that were prevalent in those early search-engine days, PageRank was quite robust against spam. But it had this one loophole - the "free rank" that all pages on the web received.<p>One way to view PageRank is to think of pages as participants in an economy. Each page pays a flat 15% tax to the powers-that-be and has to pay the remainder forward via its outlinks. The powers-that-be collect all the tax and then re-share the collection equally among all the pages. And that "equal" sharing turns out to be a loophole now.<p>Unlike a decade ago, creating a page (and thereby getting free PageRank tokens) costs practically nothing. One can create thousands upon thousands of pages dynamically at zero marginal cost, if not millions. So one can now suck in as large a share of the tax pool as one wants just by creating a large farm of pages; the share is limited only by the size of the farm. Well, this works only if PageRank is computed by the book; in reality, of course, it isn't.<p>Given how well PageRank must have worked during the early days of the internet, you can get an appreciation of how valuable a page on the internet was in those times.
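To make the tax analogy concrete, here is a minimal Python sketch of "by the book" PageRank on a toy web. Everything in it is an assumption for illustration: the page names, the `web_with_farm` helper, the choice to have the spam target link back into its farm, and the rule that a page with no outlinks hands its whole balance to the tax pool. It is not how any real search engine computes rank.

```python
def pagerank(links, tax=0.15, iters=100):
    """By-the-book PageRank via power iteration.

    links: dict mapping each page to the list of pages it links to.
    tax:   fraction of a page's rank paid into the shared pool each round.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # The pool collects the flat tax from every page, plus the full
        # balance of pages with no outlinks (assumed dangling-page rule).
        pool = sum(tax * rank[p] if links[p] else rank[p] for p in pages)
        # The pool is re-shared equally: this is the "free rank".
        new = {p: pool / n for p in pages}
        # Whatever is left after tax is paid forward along the outlinks.
        for p in pages:
            if links[p]:
                share = (1 - tax) * rank[p] / len(links[p])
                for q in links[p]:
                    new[q] += share
        rank = new
    return rank


def web_with_farm(farm_size):
    """Hypothetical toy web: three honest pages in a cycle plus a spam target."""
    links = {"a": ["b"], "b": ["c"], "c": ["a"]}
    farm = [f"farm{i}" for i in range(farm_size)]
    links["spam"] = farm  # the target links back, so rank stays inside the farm
    for page in farm:
        links[page] = ["spam"]  # every farm page exists only to feed the target
    return links


for size in (0, 10, 100, 1000):
    ranks = pagerank(web_with_farm(size))
    print(size, round(ranks["spam"], 4))
```

In this toy setup the farm and its target together end up holding about the same fraction of total rank as their fraction of all pages, so growing the farm grows the haul, and most of it gets funneled to the one target page.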
A practical outcome of the bound (that the PageRank change for a new link is bounded by the PageRank of the source of the link) was the WordPress link-selling affair from a few years ago.<p>Every WP blog had a link at the bottom: "Proudly powered by WordPress" - which meant there were lots of in-links to WordPress's main site. Links from WordPress were therefore very influential, and the WP admins sold links to SEO shops for a tidy sum. There was some outcry when this was discovered, as I recall.
One part of PageRank that I've always been confused about is how it factors Google itself into the model. When computing PageRank, would Google add a node for its own domain (i.e., google.com and all its search result pages) that points to every page on the web? They must realize that they themselves are a major driver of traffic on the web and that this would affect the assumptions of the random surfer model (i.e., a random surfer would go back to Google to do new searches occasionally). Doesn't this become a chicken-and-egg problem?