Testing 3 million hyperlinks, lessons learned

206 points by sathyabhat, almost 13 years ago

10 comments

akent, almost 13 years ago
(I've had this rant before, but I'll repeat it.)

He points to Stack Overflow's 404 as a good example and claims "We do our best to explain it was removed, why it was removed and where you could possibly find it."

Yet there is still no permanent archive of deleted Stack Overflow content; you have to rely on third party archives like archive.org and even then, you have to be lucky.

SO moderators have a habit of retrospectively deleting old content that is off-topic under current rules, even if it was perfectly *on-topic* at some time in the past. I feel this is bad internet citizenship -- it's removing internet history for no good reason.

Fair enough, delete newly created off-topic questions under the current moderation rules. But when these types of questions were asked originally they *were* on topic at the time. Deleting them retrospectively (completely - no redirect either) is still poor form.

(Top read otherwise though!)

citricsquid, almost 13 years ago
> It would be trivial to do some rudimentary parsing on the url string to determine where you really wanted to go

Specific to this point, a new project I'm building supports "pretty" URLs and I've found my (now) *favourite* solution is to build an aliases system.

It works like so: when a user creates an item, an "alias" is registered and set to "current", and all future queries to that alias are logged. If the user later causes a change to the URL (name change, etc.), the new alias is registered but the old one is retained and 301s to the new alias. All aliases are accessible by the user and they can invalidate them manually (if they want to re-use an alias, for example). *However*, if an alias has had a large number of hits from a single source *since* that alias was retired (say 50 referrals from website.com to mysite.com/previous-alias), the system assumes that the user posted the link on another website, so invalidating that alias would cause a dead link (and lose my site traffic), and it doesn't allow it.

I guess it's convoluted and adds extra overhead, but I feel like if you have pretty URLs (which are in my opinion something that a website should aim for) you need to be in a position where they're not going to cause the site to break the rest of the internet. The easy solution is to have *pseudo* pretty URLs (e.g. website.com/123-pretty-url, where 123 = ID and pretty-url is just an ignored string) or just not allow URLs to ever be changed, but I don't like either.

I wonder if any other websites have a good approach to this.
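
The alias scheme described above could be sketched roughly as follows. This is only an illustration under assumptions, not citricsquid's actual code: the names `Alias`, `AliasRegistry`, `register`, `resolve`, `invalidate` and `protect_threshold` are hypothetical, storage is a plain in-memory dict, and "a large number of hits from a single source" is modelled as a per-referrer counter compared against a threshold.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Alias:
    item_id: int
    current: bool = True
    # Referrer host -> hits recorded after this alias was retired.
    hits_since_retired: Counter = field(default_factory=Counter)


class AliasRegistry:
    """Hypothetical in-memory registry; a real site would use a database."""

    def __init__(self, protect_threshold: int = 50):
        self.aliases = {}  # slug -> Alias
        self.protect_threshold = protect_threshold

    def register(self, slug: str, item_id: int) -> None:
        # Retire the item's existing aliases but keep them for redirects.
        for alias in self.aliases.values():
            if alias.item_id == item_id:
                alias.current = False
        self.aliases[slug] = Alias(item_id)

    def resolve(self, slug: str, referrer: Optional[str] = None) -> Tuple[int, Optional[str]]:
        """Return (status, slug): 200 if current, 301 if retired, 404 if unknown."""
        alias = self.aliases.get(slug)
        if alias is None:
            return 404, None
        if alias.current:
            return 200, slug
        if referrer:
            alias.hits_since_retired[referrer] += 1
        current_slug = next(
            s for s, a in self.aliases.items()
            if a.item_id == alias.item_id and a.current
        )
        return 301, current_slug

    def invalidate(self, slug: str) -> bool:
        """Drop a retired alias unless one referrer still sends heavy traffic."""
        alias = self.aliases.get(slug)
        if alias is None or alias.current:
            return False
        if any(n >= self.protect_threshold for n in alias.hits_since_retired.values()):
            return False  # Probably linked from another site; keep the redirect.
        del self.aliases[slug]
        return True
```

With this sketch, a rename is just a second `register` call for the same item, and `invalidate` returns `False` for any retired alias whose referrer count has reached the threshold.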

kiba, almost 13 years ago
Julian Assange on Self Destructing Paper (http://web.archive.org/web/20071020051936/http://iq.org/):

*The internet is self destructing paper. A place where anything written is soon destroyed by rapacious competition and the only preservation is to forever copy writing from sheet to sheet faster than they can burn.*

*If it's worth writing, it's worth keeping. If it can be kept, it might be worth writing. Would your store your brain in a startup company's vat? If you store your writing on a 3rd party site like blogger, livejournal or even on your own site, but in the complex format used by blog/wiki software de jour you will lose it forever as soon as hypersonic wings of internet labor flows direct people's energies elsewhere. For most information published on the internet, perhaps that is not a moment to soon, but how can the muse of originality soar when immolating transience brushes every feather?*

simias, almost 13 years ago
> Some sites like giving you no information in the URL

For me one of the worst offenders in this category is YouTube. I can't understand why they don't put a slug with the video name in the canonical URL (especially since they have youtu.be for shortening URLs). It's really a pain to find an old video again in, say, an IRC log with only the opaque video ID.

Vimeo does the same thing. Dailymotion, however, does put a meaningful slug in its URLs.
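
As a rough illustration of the idea, the opaque ID can stay authoritative while a readable slug is appended purely for humans. The `/video/<id>/<slug>` layout and the function names below are assumptions made for this sketch, not any site's real URL scheme.

```python
import re
import unicodedata
from typing import Optional


def slugify(title: str) -> str:
    """Lowercase ASCII slug: runs of non-alphanumerics become single hyphens."""
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def canonical_path(video_id: str, title: str) -> str:
    """Build a hypothetical canonical path: opaque ID first, slug second."""
    slug = slugify(title)
    return f"/video/{video_id}/{slug}" if slug else f"/video/{video_id}"


def video_id_from_path(path: str) -> Optional[str]:
    """Only the ID is used for lookup; the slug is ignored, so old links keep working."""
    match = re.match(r"^/video/([^/]+)(?:/[^/]*)?$", path)
    return match.group(1) if match else None
```

Because the lookup ignores the slug, a link pasted into an IRC log still resolves after the title changes, while the log itself shows what the video was about.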

Aissen, almost 13 years ago
Obligatory W3C link: Cool URIs don't change: http://www.w3.org/Provider/Style/URI.html

(note: this page's URL hasn't changed since at least 1999)

Gring, almost 13 years ago
For mismanaged sites where the site owner changed URLs and could have added proper redirects but instead chose to just show 404s for all of them (the article mentions the examples of GitHub and Java, but there are countless more), there should be a Wikipedia-style community-driven reference project with better redirects. Is anybody working in this direction?

ConstantineXVI, almost 13 years ago
Semi-OT:

On the subject of GitHub's robots.txt[0], would anyone have a guess at why this particular repo[1] is singled out?

[0] https://github.com/robots.txt

[1] https://github.com/ekansa/Open-Context-Data

mattmanser, almost 13 years ago
A variable called stuff? Seriously?

Shame, as the rest of the article is quite good, but that really flags to me that this is a little bit cowboy code.

Also interesting to read that some sites are taking a 'white-list' approach to robots.txt; as he says, this is resulting in people starting to ignore it.
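
For readers unfamiliar with the 'white-list' style, here is a small sketch of what such a file looks like and how a polite crawler would evaluate it, using Python's standard `urllib.robotparser`. The robots.txt content, the example.com URL and the `my-link-checker` user agent are made up for illustration; this is not GitHub's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical white-list robots.txt: one named crawler is allowed,
# every other user agent is denied everything.
WHITELIST_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(WHITELIST_ROBOTS_TXT)
    for agent in ("Googlebot", "my-link-checker"):
        print(agent, may_fetch(rules, agent, "https://example.com/some/page"))
```

Under rules like these, any crawler that is not explicitly named is locked out of the whole site, which is the situation that tempts link checkers to ignore the file.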

sparknlaunch, almost 13 years ago
What are the common causes of broken links?

Seems unavoidable on large sites.

fnulp, almost 13 years ago
"just ignore robots.txt?"

How about "fuck you"? I guess it's high time to make honeypots, tarpits and bans common practice.