Why I link to Wayback Machine instead of original web content

578 points by puggo, over 4 years ago

75 comments

bartread, over 4 years ago
I'm not sure I'm a fan of this because it just turns WayBackMachine into another content silo. It's called the world wide web for a reason, and this isn't helping.

I can see it for corporate sites where they change content, remove pages, and break links without a moment's consideration.

But for my personal site, for example, I'd much rather you link to me directly than to content in WayBackMachine. Apart from anything else, linking to WayBackMachine only drives traffic to WayBackMachine, not my site. Similarly, when I link to other content, I want to show its creators the same courtesy by linking directly to their content rather than WayBackMachine.

What I can see, and I don't know if it exists yet (a quick search suggests perhaps not), is some build task that will check all links and replace those that are broken with links to WayBackMachine, or (perhaps better) generate a report of broken links and allow me to update them manually, just in case a site or two happen to be down when my build runs.

I think it would probably need to treat redirects like broken links, given the prevalence of corporate sites where content is simply removed and redirected to the homepage, or geo-locked and redirected to the homepage in other locales (I'm looking at you and your international warranty, and access to tutorials, Fender. Grr.).

I also probably wouldn't run it on every build because it would take a while, but once a week or once a month would probably do it.
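The build task described in this comment could be sketched roughly as follows. This is a minimal sketch, not an existing tool: the names `is_broken`, `wayback_url` and `report` are hypothetical, the HTTP status and final URL are passed in rather than fetched so the logic runs offline, and the snapshot URL layout is assumed from the archive links posted elsewhere in this thread.

```python
from urllib.parse import urlsplit


def wayback_url(url: str, timestamp: str) -> str:
    """Build a Wayback Machine snapshot URL (timestamp as YYYYMMDDhhmmss)."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


def is_broken(original_url: str, status: int, final_url: str) -> bool:
    """Treat HTTP errors, and redirects that land somewhere else, as broken."""
    if status >= 400:
        return True
    orig, final = urlsplit(original_url), urlsplit(final_url)
    # Ignore scheme and trailing slash; a different host or path
    # (for example a redirect to the homepage) counts as broken.
    return (orig.netloc, orig.path.rstrip("/")) != (
        final.netloc,
        final.path.rstrip("/"),
    )


def report(links, timestamp):
    """Return (original, suggested archive link) pairs for manual review.

    `links` is an iterable of (original_url, status, final_url) tuples.
    """
    return [
        (url, wayback_url(url, timestamp))
        for url, status, final in links
        if is_broken(url, status, final)
    ]
```

Producing a report instead of rewriting links in place matches the "update them manually" option, so a site that is only temporarily down does not get replaced automatically.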
markjgraham, over 4 years ago
We suggest/encourage people link to original URLs but ALSO (as opposed to instead of) provide Wayback Machine URLs, so that if/when the original URLs go bad (link rot) the archive URL is available, or to give people a way to compare the content associated with a given URL over time (content drift).

BTW, we archive all outlinks from all Wikipedia articles from all Wikipedia sites, in near-real-time, so that we are able to fix them if/when they break. We have rescued more than 10 million so far from more than 30 Wikipedia sites. We are now working to have Wayback Machine URLs added IN ADDITION to Live Web links when any new outlinks are added, so that those references are "born archived" and inherently persistent.

Note, I manage the Wayback Machine team at the Internet Archive. We appreciate all your support, advice, suggestions and requests.
bherb, over 4 years ago
Here, I fixed your link: https://web.archive.org/web/20200908090515/https://hawaiigentech.com/post/commentary/why-i-link-to-waybackmachine-instead/
outsomnia, over 4 years ago
This is a bad idea...

In the worst case one might write a cool article and get two hits, one noticing it exists, and the other from the archive service. After that it might go viral, but the author may have given up by then.

The author is losing out on inbound links, so Google thinks their site is irrelevant and gives it a bad PageRank.

All you need to do is get archive.org to take a copy at the time; you can always adjust your link to point to that if the original is dead.
CaptArmchair, over 4 years ago
So, this is the problem of persistence: URLs always referencing the original content, regardless of where it is hosted, in an authoritative way.

It's an okay idea to link to WB, because (a) it's de facto assumed to be authoritative by the wider global community and (b) as an archive it provides a promise that its URLs will keep pointing to the archived content come what may.

Though, such promises are just that: promises. Over a long period of time, no one can truly guarantee the persistence of a relationship between a URI and the resource it refers to. That's not something technology itself solves.

The "original" URI still does carry the most authority, as that's the domain on which the content was first published. Moreover, the author can explicitly point to the original URI as the "canonical" URI in the HTML head of the document.

Moreover, when you link to the WB machine, what do you link to? A specific archived version? Or the overview page with many different archived versions? Which of those versions is currently endorsed by the original publisher, and which are deprecated? How do you know this?

Part of ensuring persistence is the responsibility of the original publisher. That's where solutions such as URL resolving come into play. In the academic world, DOI or handle.net are trying to solve this problem. Protocols such as ORE or Memento further try to cater to this issue. It's a rabbit hole, really, when you start to think about this.
ffpip, over 4 years ago
You can create a bookmark in Firefox to save a link quickly.

Bookmark Location - https://web.archive.org/save/%s

Keyword - save

So searching 'save https://news.ycombinator.com/item?id=24406193' archives this post.

You can use any Keyword instead of 'save'.

You can also search with https://web.archive.org/*/%s
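The keyword-bookmark expansion in this comment amounts to a string substitution, which can be sketched as below. The two templates are the ones quoted in the comment; the function name `expand_keyword` is hypothetical, and the sketch does not try to reproduce whatever escaping the browser itself applies.

```python
# Templates quoted in the comment: %s is replaced by what follows the keyword.
SAVE_TEMPLATE = "https://web.archive.org/save/%s"
HISTORY_TEMPLATE = "https://web.archive.org/*/%s"


def expand_keyword(template: str, query: str) -> str:
    """Substitute the typed text for the %s placeholder in a bookmark URL."""
    return template.replace("%s", query)


if __name__ == "__main__":
    post = "https://news.ycombinator.com/item?id=24406193"
    print(expand_keyword(SAVE_TEMPLATE, post))
    print(expand_keyword(HISTORY_TEMPLATE, post))
```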
kibibu, over 4 years ago
Can we update this link to point to the archive version?
imhoguy, over 4 years ago
This is building yet another silo and point of failure. We can't pass the entire Internet's traffic through WayBackMachine, as its resources are limited.

Most preservation solutions are like that, and in the end the funding or business priorities (Google Groups) become a serious problem.

I think we need something like the web itself: a preservation space that is distributed and dead easy to participate in and contribute to.

Look, there are torrents that have been available for 17 years [0]. Sure, some uninteresting ones are long gone, but there is always a little chance somebody still has the file and someday comes online with it.

I know about IPFS/Dat/SBB, but still that stuff, like Bitcoin, is too complex for a layman contributor with a plain altruistic motivation. It should be like SETI@Home: fire and forget. Eventually integrated with a browser to cache content you star/bookmark and share it when the original is offline.

[0] https://torrentfreak.com/worlds-oldest-torrent-still-alive-after-15-years-180929/
mountainb, over 4 years ago
Link rot has convinced me that the web is not good for its ostensible purpose. I used to roll my eyes reading how academic researchers and librarians would discourage using webpages as resources. Many years later, it's obvious that the web is pretty bad for anything that isn't ephemeral.
codetrotter, over 4 years ago
By that reasoning, shouldn’t you be using WayBack Machine links when posting your own content to HN, instead of posting direct links?
cornedor, over 4 years ago
But how certain is the future of WayBackMachine? When disaster strikes, all your links are dead. On the other hand, the original links can still be read from the URL, so the original reference is not completely gone.
romwell, over 4 years ago
Good idea, but why not both (i.e. link to a webpage, *and* to the Archive)?

Linking to Archive only makes Archive a single point of failure.
NateEag, over 4 years ago
I understand where the author is coming from, but I think the best approach is to write your content with direct links to the canonical versions of articles.

Have a link checking process you run regularly against your site, using some of the standard tools I've mentioned elsewhere in this thread:

https://www.npmjs.com/package/broken-link-checker-local

https://linkchecker.github.io/linkchecker/

When you run the link check (which should be regularly, perhaps at least weekly), also run a process that harvests the non-local links from your site and 1) adds any new links' content to your own local, unpublished archive of external content, and 2) submits those new links to archive.org.

This keeps canonical URLs canonical, makes sure content you've linked to is backed up on archive.org so a reasonably trustworthy source is available should the canonical one die out, and gives you your own backup in case archive.org and the original both vanish.

I don't currently do this with my own sites, but now I'm questioning why not. I already have the regular link checks, and the second half seems pretty straightforward to add (for static sites, anyway).
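The "harvest the non-local links" step in this comment might look something like the sketch below. It uses only Python's standard library; the names `LinkCollector`, `external_links` and `new_links` are hypothetical, and the fetching, local archiving and submission to archive.org are deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute href targets of <a> tags in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def external_links(html, base_url):
    """Return de-duplicated http(s) links that point off the site's host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlsplit(base_url).netloc
    result = []
    for link in collector.links:
        parts = urlsplit(link)
        is_web = parts.scheme in ("http", "https")
        if is_web and parts.netloc != host and link not in result:
            result.append(link)
    return result


def new_links(found, already_archived):
    """Keep only links that have not been archived on a previous run."""
    return [link for link in found if link not in already_archived]
```

Only the output of `new_links` would need to be copied locally and submitted, which keeps a weekly run cheap once the backlog is done.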
susam, over 4 years ago
I think the fundamental problem here is that URLs locate resources. We find the desired content by finding its location given by an address. Now what server or content lives at that address may change from time to time or may even disappear. This leads to broken links.

The problem with linking to Wayback Machine is that we are still writing archive.org URLs that link to Wayback Machine servers. What guarantee is there that those archive.org links will not break in future?

It would have been nice if the web were designed to be content-addressable. That is, the identifier or string we use to access a piece of content addresses the content directly, not a location where the content lives. There is good effort going on in this area in the InterPlanetary File System (IPFS) project but I don't think the mainstream content providers on the Internet are going to move to IPFS anytime soon.
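To make the contrast with location-based URLs concrete, here is a toy sketch of content addressing. It assumes nothing about how IPFS actually encodes its identifiers: the `sha256-` prefix, `content_address` and `ContentStore` are all made up for illustration. The point is only that the address is derived from the bytes, so any host can serve them and the reader can verify what comes back.

```python
import hashlib


def content_address(content: bytes) -> str:
    """Derive an address from the content itself (illustrative format)."""
    return "sha256-" + hashlib.sha256(content).hexdigest()


class ContentStore:
    """In-memory store keyed by content address instead of by location."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        address = content_address(content)
        self._blobs[address] = content
        return address

    def get(self, address: str) -> bytes:
        content = self._blobs[address]
        # The reader can check the bytes against the address they asked for.
        if content_address(content) != address:
            raise ValueError("content does not match its address")
        return content
```

If the content changes, its address changes too, so an old link can go missing but can never silently point at something different.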
yreg, over 4 years ago
I'm all for Archive.org. However, using it in this way (setting up a mirror of some content and purposefully diverting traffic to said mirror) is copyright infringement (freebooting), as it competes with the original source.
j1elo, over 4 years ago
This is a bad idea for the reasons that other commenters have already stated. If WayBackMachine falls, all links would fall. Actually the "Web" would stop being one, if all links are within the same service.

For docs and other texts, I just link to the original site and add an (Archive) suffix, e.g. the "Sources" section in https://doc-kurento.readthedocs.io/en/latest/knowledge/nat.html#nat-types-and-nat-traversal

That is a simple and effective solution. Yes, it is a bit more cumbersome, but it does not bother me.
asdfman123, over 4 years ago
> So in Feb 14 2019 your users would have seen the content you intended. However in Sep 07 2020, your users are being asked to support independent Journalism instead.

Can you believe it? Yesterday, I tried to walk out of the grocery store with a head of lettuce for free, and they instead were more interested in making me pay money to support the grocery and agricultural business!
koboll, over 4 years ago
This seems like a problem that would be better solved by something like:

1. Browsers build in a system whereby if a link appears dead, they first check against the Wayback Machine to see if a backup exists.

2. If it does, they go there instead.

3. In return for this service, and to offset costs associated with increased traffic, they jointly agree to financially support the Internet Archive in perpetuity.
aldo712, over 4 years ago
Here's a WayBackMachine Link to this article. :)

https://web.archive.org/web/20200908090515/https://hawaiigentech.com/post/commentary/why-i-link-to-waybackmachine-instead/
dltj, over 4 years ago
Take a look at _Robustify Your Links_.[1] It is an API and a snippet of JavaScript that saves your target HREF in one of the web archiving services and adds a decorator to the link display that offers the user the option to view the web archive.

[1] https://robustlinks.mementoweb.org/about/
wolco, over 4 years ago
No one touched on this, but the experience of viewing through the waybackmachine is awful.

Media often will not be saved, so pages look broken. The iframe and the iframe breakers on original sites can kill any navigation.

The waybackmachine is okay for researching but a poor replacement as a permalink.
shortformblog, over 4 years ago
This man’s entire argument is completely terrible for two reasons:

1) The example he uses is The Epoch Times, a questionable source even on the best of days.

2) What he refers to as “spam” is a paywall. He is literally taking away from business opportunities for this outlet that produced a piece of content he wants to draw attention to, but he does not want to otherwise support.

He’s a taker. And while the Wayback Machine is very useful for sharing archived information, that’s not what this guy is doing. He’s trying to undermine the business model of the outlets he’s reading.

The Epoch Times is one thing (it’s an outlet that is essentially propaganda), but when he does this to a local newspaper or an actual independent media outlet, what happens?
celsoazevedo, over 4 years ago
Is there any WordPress plugin that adds a link to the WayBack Machine next to the original link? I would use something like that.
wila, over 4 years ago
The idea of being able to access the URL once it is gone is good. However this also means that any updates made to the original page are no longer seen.

Not all updates are about "begging for money" as in the example in the article.
nikisweeting, over 4 years ago
Or link to your own archive of the content with ArchiveBox!

That way we're not all completely reliant on a central system. (ArchiveBox submits your links to Archive.org in addition to saving them locally).

https://github.com/pirate/ArchiveBox

There are also many other tools that can do this:

https://github.com/pirate/ArchiveBox/wiki/Web-Archiving-Community#other-archivebox-alternatives
icemelt8, over 4 years ago
Just FYI, archive.org is banned in a few countries, including the UAE, where I cannot open any links from there.
krapp, over 4 years ago
Apropos of nothing but I added the ability to archive links in Anarki a few months back[0]. If dang or someone wants to take it for HN they're welcome to. Excuse the crappy quality of my code and PR format, though.

It might be useful as a backup if the original site starts getting hugged to death.

[0] https://github.com/arclanguage/anarki/pull/179
fornowiamhere, over 4 years ago
> *Now it’s spam from a site suffering financial need.*

Well, yeah!

Of course, linking to WBM is not the main reason why a site might be in this situation, but it adds up.
hownottowrite, over 4 years ago
Awesome. Hey, mods... Can you change the link on this post to http://web.archive.org/web/20200908090515/https://hawaiigentech.com/post/commentary/why-i-link-to-waybackmachine-instead/
stratigos, over 4 years ago
I link to WayBackMachine because I've built a great many greenfield applications for startups as a freelancer, which only existed for about 6-8 months before running out of money. If I linked to their original domains, my portfolio would be a list of 404s.
rmoriz, over 4 years ago
I once discovered an information leak at the German public broadcasting organization ARD, which leaked real mobile numbers on their CI/CD page where they showed the business card designs (lol).

All records of this page on Archive.org were deleted after a couple of days, a Twitter account posting the details with a screenshot and link was reported, and my account was temporarily suspended.

I assume it must be very easy to remove inconvenient content from archive.org.

(in German) https://blog.rolandmoriz.de/2019/04/25/sind-die-leute-von-der-ard-so-doof/
runxel, over 4 years ago
While I certainly wouldn't do this with every page, and also not every time, I've become so anxious about link rot lately that I reflexively save any good content I come across to the Waybackmachine.

The use of the bookmarklet makes this really convenient.
AnonHP, over 4 years ago
WayBackMachine is slow (slower than many bloated websites). So it’s not a good enough experience for the person clicking on that link.

Secondly, I personally don’t like the fact that WayBackMachine doesn’t provide an easy way to get content removed and to stop indexing and caching content (the only way I know is to email them, with delayed responses or responses that don’t help). It’s far easier to get content de-indexed in the major search engines. I know that the team running it have some reasons to archive anything and everything (as) permanently (as possible), but it doesn’t serve everybody’s needs.
euske, over 4 years ago
This is both a good and a scary idea. For the good part, I'm frustrated enough that some unscrupulous websites (even some news outlets) secretly alter their contents without mentioning the change. I want a mechanism that holds the publisher responsible. At the same time, this is scary because we're basically using one private organization as a single arbiter. (I know it's a nonprofit, but they're probably not as public as a government entity.) Maybe it's good for the time being, but we should be aware that this is a solution that's far from perfect.
cpcallen, over 4 years ago
This seems like a risky strategy, what with the pending lawsuit against archive.org over their National Emergency Library: I am fully expecting that web.archive.org will go away permanently within a few years.
rkagerer, over 4 years ago
I link to the original, but archive it in both WayBackMachine and Archive.is.
uniqueid, over 4 years ago
Yeah, that's another problem with the design of the web, and kind of a significant one! Somewhat pointless to link to external documents when half of them won't be around next year.
luord, over 4 years ago
While I generally disagree, because I'd rather my site was the one getting the hits (and I would rather give the same courtesy to other authors), this does give me the idea of checking for (or creating if none exists) an archive link of whatever I reference, and including that archive link in the metadata of every link I include.

Users will find the archive link if they really want to, and it will make it easier for me to replace broken links in the future.
woko, over 4 years ago
As others mentioned, it is a good habit to request the page to be archived. You don't have to link to the archive, but you would have the option to if the page were to disappear in the future.

I wish I had done this 15 years ago for a small project/website. Nowadays, my website is there, with all of its content, but most of the awesome references which I had linked to are unavailable. I wrote "most", but it is close to all of them.
8bitsrule, over 4 years ago
Gotta completely agree ... for anything you need to be stable and available.

I've been building lists of -reference- URLs for over a decade ... and the ones aimed at Archive.org (are slower to load, but) are much more reliable.

Saved Wayback URLs contain the original site URL. It's really easy to check it to see if the site has deteriorated (usually it has). If it's gotten better ... it's easy to update your saved WB link.
jakeogh, over 4 years ago
If it's not distributed, it is going to disappear.

The waybackmachine is backed by WARC files. It's perhaps the only thing on archive.org that can't be downloaded... well, except the original mpg files for the 9/11 news footage.

https://news.ycombinator.com/item?id=20623177
samatman, over 4 years ago
This is such a fundamental problem that I'd like to be able to solve it at the HTML level.

An anchor type which allows several URLs, to be tried in order, would go a long way. Then we could add automatic archiving and backup links to a CMS.

It isn't real content-centric networking, which is a pity, but it's achievable with what we have.
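The "several URLs, tried in order" behaviour proposed here is easy to state in code. The sketch below is purely illustrative: no such anchor type exists in HTML, `first_available` is a hypothetical name, and the availability check is passed in as a function so the fallback logic can be exercised without any network access.

```python
from typing import Callable, Iterable, Optional


def first_available(
    urls: Iterable[str], is_available: Callable[[str], bool]
) -> Optional[str]:
    """Return the first URL the checker accepts, or None if all fail."""
    for url in urls:
        if is_available(url):
            return url
    return None


if __name__ == "__main__":
    candidates = [
        "https://example.com/some-article",  # original first
        "https://web.archive.org/web/20200908090515/https://example.com/some-article",
    ]
    # Pretend only the archive copy still responds.
    alive = {candidates[1]}
    print(first_available(candidates, lambda u: u in alive))
```

Listing the original first keeps traffic going to the author while the page is up, which addresses the objection raised by several other commenters.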
ffpip, over 4 years ago
The Wayback Machine helps me on a daily basis. So many old links are dead.

The other day, I noticed that even old links from the front page of Google and YouTube are dead now. Internet Archive still has them. These were links on the front page of YT. Was very disappointed that even Google has dead links.
ashishb, over 4 years ago
I wrote a link checker[1] to detect outbound links and mark dead links, so that I can replace them manually with archive.org links.

[1] https://github.com/ashishb/outbound-link-checker
spqr233, over 4 years ago
I made a Chrome extension called Capsule that works perfectly for this use case. With just a click, you can create a publicly shareable link that preserves the webpage exactly as you see it in your browser.

https://capsule.click
nullandvoid, over 4 years ago
I experienced this just the other day.

I was browsing an old HN post from 2018, with lots of what seemed like useful links to their blog.

Upon visiting it, I found the site had been rebranded and the blog entries had disappeared.

Waybackmachine saved me in this case, but a link to it originally would have saved me a few clicks.
Cthulhu_, over 4 years ago
If it's to actually reference a third party source, it's probably better to make a self-hosted copy of the page. You can print it to a PDF file for example. I don't believe archive.org is eternal, or that its pages will remain the same.
m-p-3, over 4 years ago
I still link to the original URL because the author deserves the ad revenue and traffic, but I archive a copy to the Wayback Machine just in case the website can't handle the load, so there is an alternative way of getting the content.
tannhaeuser, over 4 years ago
The proper way is for a site to expose a canonical link to an article via a meta-link (rel=canonical) if necessary, and then have a browser plugin automatically try archive.org with a URL generated from the canonical one if the site is down.
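A rough sketch of the lookup half of that idea follows. It is not a browser plugin: it only shows how a `rel=canonical` link could be read from a page's HTML and turned into an archive.org lookup URL. The names `CanonicalFinder` and `archive_lookup_url` are hypothetical, and the `https://web.archive.org/*/` prefix is the search form quoted by another commenter in this thread, assumed here to be suitable.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def archive_lookup_url(html: str, fallback_url: str) -> str:
    """Build an archive.org lookup URL from the canonical link, if any."""
    finder = CanonicalFinder()
    finder.feed(html)
    target = finder.canonical or fallback_url
    return "https://web.archive.org/*/" + target
```

Falling back to the URL the reader actually followed covers pages that never declared a canonical link.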
PhilosAccnting, over 4 years ago
Thank you! I've only been using the labor-intensive trust-issues version of this: paraphrasing things in my own words and linking to THAT.

I think I've been curating about 200 essays so far like that. You're now making me rethink my flow.
EllieEffingMae, over 4 years ago
I maintain a fork of a program that does exactly this! You can check it out here:

https://github.com/Lifesgood123/prevent-link-rot
ponker, over 4 years ago
What would be even cooler is if there was an easy way to turn your own server into a Wayback machine, so that when your server rendered a webpage, it would use the original link if available, or its own cached version if not.
michaelanckaert, over 4 years ago
In the past I would fall back to WBM when something is no longer online. Though recently I've been bookmarking interesting content very rigorously and just rely on the archival feature of my bookmarking software.
ique, over 4 years ago
Just another reason to have content-addressable storage everywhere: then at least if it changed you’ll know it changed, and if you can’t get the original content anymore then the change is probably malicious.
drummer, over 4 years ago
For anything important you can't beat a good save-to-PDF feature in the browser. You can then upload the PDF and link to that instead. Someone should make a WordPress plugin to do this automatically.
axelfreeman, over 4 years ago
You could link to the original web URL and also make a print version of the web content as a PDF. That's how I archive how-tos and write-ups of interesting content: print view, then create a PDF version.
hgo, over 4 years ago
Maybe the solution isn't technical and we should look at other fields that have relied on referencing credible sources for a long time? I can think of research, news and perhaps law.
not2b, over 4 years ago
It's probably better to link to both. If a site corrects a story, your readers will want to see the correction, but if the page disappears, it's good to have the backup.
andy_ppp, over 4 years ago
It would be good to create a distributed, consensus version (to help stop edits) of the content rather than have a single point of failure...
scruffyherder, over 4 years ago
So it can be deleted too?

Or so there is no engagement at the source?
LostJourneyman, over 4 years ago
There's some subtle irony in that the linked site is not in fact a WayBackMachine link, but instead a direct link to the site.
arnoooooo, over 4 years ago
On the same topic, I wish I could link with highlights in the page. Having a spec for highlights in URLs would be neat.
spurgu, over 4 years ago
I think a good solution might be to host the archive version yourself (archive.org is slow, and always using it centralizes everything there).

Let's say you write an article on your site, https://yoursite.com/my-article, and from it you want to link to an article https://example.com/some-article

You then create a mirror of https://example.com/some-article to be served from your site at https://yoursite.com/mirror/2019-09-08/some-article (put /mirror/ in robots.txt and set to noindex (or maybe even better to put a rel="canonical" towards the original article?)) and on the top of this mirrored page you add a header bar thingy containing a link to the original article, as well as one to archive.org if you so want.

tl;dr instead of linking to https://example.com/some-article you link to https://yoursite.com/mirror/2019-09-08/some-article (which has links to the original)
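The URL scheme in this comment can be sketched as a small helper. Everything here is an assumption layered on the comment's example: `mirror_url` is a hypothetical name, the slug is simply the last path segment of the original URL, and the robots.txt text is one plain way to keep crawlers out of `/mirror/`.

```python
from datetime import date
from urllib.parse import urlsplit

# Keeps crawlers away from the mirrored copies, as the comment suggests.
ROBOTS_TXT = "User-agent: *\nDisallow: /mirror/\n"


def mirror_url(site: str, original_url: str, mirrored_on: date) -> str:
    """Map an external article to its dated mirror path on your own site."""
    path = urlsplit(original_url).path.rstrip("/")
    # Use the last path segment as the slug; fall back for bare domains.
    slug = path.rsplit("/", 1)[-1] or "index"
    return f"{site.rstrip('/')}/mirror/{mirrored_on.isoformat()}/{slug}"


if __name__ == "__main__":
    print(
        mirror_url(
            "https://yoursite.com",
            "https://example.com/some-article",
            date(2019, 9, 8),
        )
    )
```

Two different sites can share a slug, so a real setup would probably want the source host in the path as well; the sketch keeps the comment's layout as-is.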
zoid_, over 4 years ago
I find that web archive pages always appear broken. Perhaps a lot of JS or CSS is not properly archived?
CassSunscreen, over 4 years ago
Everyone should be doing this in my opinion; articles get pulled all the time.
sebastianconcpt, over 4 years ago
Clever way to make the reference immutable.

Some blockchain will end up taking care of this.
ImAlreadyTracer, over 4 years ago
Is there a chrome app that utilises waybackmachine?
LoSboccacc, over 4 years ago
Has waybackmachine stopped retroactively applying robots.txt?

If not, links to it are one misconfiguration or one parked domain away from being wiped.
eruci, over 4 years ago
WBM is like a content snapshot. You can't go back in time and change anything. That's why it is better than linking to the original.
Andrew_nenakhov, over 4 years ago
Hmm. Is there a place for a service that makes a permanent copy of content, available at the original URL at the time of posting?
prgmatic, over 4 years ago
I stopped reading after the part where they describe the paywall-gated version of the journalism website as “Now it’s spam from a site suffering financial need.”

That website spends money creating content for commercial viability. It doesn’t have to bow to you and make sure you can consume it for free, and the Wayback Machine isn’t a tool for you to bypass premium content.
TheSpiceIsLife, over 4 years ago
This behaviour should be reported to the WayBackMachine as abuse.
dirtnugget, over 4 years ago
He is actually showcasing a very nice technique to get around paywalls: turn off JS. Often that’s enough to get around the paywall. I believe the archives also disable JS when grabbing the content.
s9w, over 4 years ago
In practice however, archive.org did censor content based on political preference.
k1m, over 4 years ago
I think this is a good idea, especially because the WayBackMachine uses good content security policies to block some of the intrusive JS that ad-dependent sites like to push on people. So you're not only protecting against future 404 scenarios, but also protecting your visitors' privacy from unscrupulous ad-tech, which seems to be everywhere now.

The example provided in the article, showing how a site looked cleaner before, could simply be the content security policies at the WayBackMachine preventing the clutter from getting loaded, rather than any specific changes on the site, although I haven't checked that particular site.