There is an easy way to increase your detection of redirects and parked pages: make two requests, one to the real URL and one to a URL which is intentionally broken (example.com/i-am-a-link and example.com/fklsdfasdifo, for example), then run a difference heuristic on the resulting content. This won't catch all of them, particularly if you use a really naive heuristic that can't deal with e.g. ads changing, but it's a heck of a lot quicker than comparing manually.
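A minimal sketch of that two-request check, assuming Python and the third-party requests library (the helper name, threshold, and soft-404 framing are just illustrative):

    # Sketch: compare a page against a deliberately bogus path on the same
    # host. If the two responses look nearly identical, the "real" page is
    # probably a parked domain, catch-all redirect, or soft 404.
    import difflib
    import uuid
    from urllib.parse import urlsplit, urlunsplit

    import requests  # assumes the third-party requests library is installed


    def looks_like_soft_404(url, similarity_threshold=0.9):
        parts = urlsplit(url)
        # Intentionally broken path on the same scheme and host.
        bogus = urlunsplit((parts.scheme, parts.netloc,
                            "/" + uuid.uuid4().hex, "", ""))

        real = requests.get(url, timeout=10).text
        junk = requests.get(bogus, timeout=10).text

        # Naive difference heuristic: fraction of matching characters.
        ratio = difflib.SequenceMatcher(None, real, junk).ratio()
        return ratio >= similarity_threshold

A smarter version would strip navigation, ads, and timestamps before comparing, but even the naive ratio flags a lot of parked domains and redirect-to-homepage setups.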
"Links appear to die at a steady rate (they don't have a half life), and you can expect to lose about a quarter of them every seven years."<p>Is this self-contradictory, or is just poor wording of his findings?
I think it's about time some government or billionaire threw a few million at an internet archive project. The Internet Archive is nice, but more regular snapshots with wider coverage would be something I'm certain future historians would love to get their hands on (and they will hate us if we don't do it).
Sadly, I think technological advances have only accelerated this phenomenon. We've gone from an era of static pages, whose overall layout took considerable effort to change, to CMSes that we can twiddle and upgrade with nary a concern for backward link compatibility.

Personally, I think it should be a principle of every professional web developer that you just don't break links, period.
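As one illustration (hypothetical paths, not any particular CMS), keeping old URLs alive after a restructuring can be as simple as a table of permanent redirects:

    # Toy sketch: serve 301 redirects so retired URLs keep resolving after
    # a site restructuring. Paths and port are made up for illustration.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    LEGACY_REDIRECTS = {
        "/blog/2010/05/old-post.html": "/posts/old-post",
        "/article.php?id=42": "/posts/42",
    }


    class RedirectingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            target = LEGACY_REDIRECTS.get(self.path)
            if target:
                self.send_response(301)  # permanent: browsers and crawlers update
                self.send_header("Location", target)
                self.end_headers()
            else:
                self.send_response(404)
                self.end_headers()


    if __name__ == "__main__":
        HTTPServer(("", 8080), RedirectingHandler).serve_forever()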
Users may prune their own bookmarks when they discover the links are broken – especially with some of the pre-Pinboard systems (like in-browser bookmarking) from which the earliest data in this analysis comes. So I suspect this underestimates link-rot.