How to become a pirate archivist

579 points by pilimi_anna, over 2 years ago

14 comments

jancsika, over 2 years ago
I'm curious how Sci-hub's approach compares to the What.cd/Redacted approach.

IIUC Sci-hub has scooped up science docs through a good enough UX that it was able to leverage the goodwill of science folks to upload docs (plus whatever other methods it has used to scoop up docs), and it uses a public blitzkrieg-style distribution mechanism. I.e., I guess if one had a big enough hard drive and a fast enough internet connection, one could start downloading the lib right now and see if they win the race against the copyright holders.

On the other hand, the What.cd/Redacted approach seems to use BitTorrent ratios to create a private-tracker economy. New users get a few gigs of free download on joining. But apparently because a) there's a 1:1 upload/download ratio, and b) a few first-mover fat cats are sitting on enormous ratios, there is a scramble by everyone else to upload new FLACs to build up their ratio so they can continue to download FLACs. It seems that would mean the library in its entirety cannot be easily replicated at will. Yet the tracker was apparently already nuked off the internet as What.cd and reappeared later as Redacted. Was any data lost between the two services?

Oh yeah, there's also apparently another approach in rutracker, which seems to be blitzkrieg to add content *and* publish, at the (apparent?) cost of quality of content.

It's really a shame that the nerdy, completist domain of digital archiving through torrents isn't covered by fair use. Perhaps we could exclude the most recent 10 years of music so that the hopeful young musician streamers can get paid a few hundred dollars for millions of streams and then receive the silver lining of fair use protection against a label refusing to release one of their albums.
ynno, over 2 years ago
I think Alexandra Elbakyan actually did not want to be revealed as the librarian behind Sci-Hub; it was her poor opsec that led to her being identified.

Basically her servers were set up to emit detailed error messages from PHP, including the full path of the faulting source file, which was under the directory /home/ringo-ring, which could be traced to a username she had online on an unrelated site, attached to her real name. Before this revelation, she was anonymous.
imhoguy, over 2 years ago
As an active hoarder I think there is a problem with "6. Distribution: Packaging it up in torrents, announcing it somewhere, getting people to spread it.".

I miss a p2p application with torrent packaging and Kademlia-like per-file advertising and discovery, where I could point it to my hefty NAS directory of random things and they could be wired to released torrents. This way we could make torrents live much longer, even partially complete. As a super extra option the app could even notify me to load a DVD because somebody asked for a file which I indexed and advertised previously.

Over the years my program preferences changed and file locations changed: I have moved files around, taken them offline, burned them to DVDs, deleted some parts of a torrent just to keep the interesting stuff. Now these torrents are lost, or at least my seeding contribution is gone. But I almost never change the content of these files, so their checksum stays the same forever and they could still be discoverable.

Digital preservation needs a better distribution system.
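A minimal sketch of the per-file idea in the comment above, assuming SHA-256 as the content checksum and a plain in-memory dictionary as the index (both are illustrative choices, and `sha256_of` and `build_index` are hypothetical names). It shows why a moved or renamed file stays discoverable: the key is derived from the bytes, not from the path. Advertising those keys on a DHT is not covered here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root: Path) -> dict[str, list[Path]]:
    """Map each content hash to every path under root that holds those bytes."""
    index: dict[str, list[Path]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            index.setdefault(sha256_of(path), []).append(path)
    return index
```

Re-running `build_index` after files have been moved gives the same keys with new paths, which is the property the comment relies on.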
mdaniel, over 2 years ago
I'm not in the pirate archivist space, but sections 3 and 5 are relevant to my interests. I've had great luck with ZAP (https://github.com/zaproxy/zaproxy#readme) glued to a copy of Firefox *(because it allows monkeying with the _browser_'s proxy without having to alter the system one as other browsers do)* for archiving all content seen while surfing around a site. It even achieves the stated goal of preserving the HTML (etc) in a database, since ZAP uses hsqldb.

Then, section 5 reads like an advertisement for Scrapy, since it is just stellar at following all pagination links and then either emitting the extracted payload as your own data structure and/or downloading some media as-is when you tell Scrapy to. It will, by default, put the local content in a directory of your choice and hash the URL to make the local filename. A separate JSON file serves as the "accounting" between the things it downloaded and their hashed on-disk filename.

Scrapy is also able to glue 3 and 5 together because it has a pluggable *(everything, heh)* dupe detection hook and also HTTP cache support that can be backed by anything, including the aforementioned hsqldb operating in network mode. Scrapy is also very test friendly, since each method accepts a well-known Python object and emits either a follow-on request, zero or more extracted objects, or nothing if pagination has ended. Thus, one need not rerun the whole scraping process just to test if a bug has been fixed, or during development.

I can appreciate there may be other scraping frameworks, but of the ones I've tried, Scrapy makes everything that I've asked it to do simple and transparent.
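The bookkeeping pattern described above (hash the URL to get the local filename, keep a JSON file mapping URLs to filenames) can be sketched with the standard library alone. This is not Scrapy's own code: `local_name`, `store` and `save_manifest` are hypothetical names, SHA-1 is an assumed hash choice, and the download step is left out, so `body` is simply passed in.

```python
import hashlib
import json
from pathlib import Path
from urllib.parse import urlparse


def local_name(url: str) -> str:
    """Derive a stable filename: hex SHA-1 of the URL plus the URL path's extension."""
    suffix = Path(urlparse(url).path).suffix
    return hashlib.sha1(url.encode("utf-8")).hexdigest() + suffix


def store(url: str, body: bytes, out_dir: Path, manifest: dict[str, str]) -> Path:
    """Write the payload under its hashed name and record the mapping."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / local_name(url)
    target.write_bytes(body)
    manifest[url] = target.name
    return target


def save_manifest(manifest: dict[str, str], path: Path) -> None:
    """Persist the URL-to-filename accounting as JSON."""
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
```

Because the filename depends only on the URL, fetching the same URL twice overwrites one file instead of creating duplicates, and the manifest is enough to map any stored file back to its source.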
yamrzou, over 2 years ago
«That secrecy, however, comes with a psychological cost. Most people love being recognized for the work that they do, and yet you cannot take any credit for this in real life.»

Feels like the anonymous torrent seeder who keeps seeding a file for years just for the sake of keeping it alive. It's not easy, but some people seem to be able to derive full pleasure from accomplishing the task itself, whether recognition happens or not.
O__________O, over 2 years ago
>> That secrecy, however, comes with a psychological cost.

For someone concerned about OpSec, being acknowledged is a minor, if not completely unimportant, issue. The grind of maintaining OpSec is mind-numbing for most, in my experience, especially over an extended duration. One minor slip ends it all, and the risk of slipping increases with the significance of the related operations, since more eyes increase the odds that someone will notice something, they’ll be forced into unfamiliar situations, etc.

Beyond that, research shows that the odds of being discovered grow as more people know:

https://www.bbc.com/news/science-environment-35411684.amp
chatterhead, over 2 years ago
While reading this I realized that the first 'Pirate Archivists' I was exposed to were the bums in Fahrenheit 451 who memorize books so they can't be burned.

I never realized that was my first true introduction to piracy. Really enjoyed the write-up!
mrfinn, over 2 years ago
I keep feeling we shouldn't accept the term "piracy" anymore. The problem, the big problem, is on the so-called "legal" side: the purpose of this system is no longer about rewarding authors, it is about some big economic groups hoarding goods (and gaining power by doing that). But that's heavily against the common interest. Quite a few years ago I met with a member of my country's senate with a solid proposition to end the "piracy" problem. I got an email asking for more info about my proposal. That was the end of it.

PS. Maybe instead of "pirates", we should call ourselves "keepers".
MasJ, over 2 years ago
As the founder of emuparadise some 22 years ago, I can relate. I got into retrogames because I never got to play those games growing up in India. I thought, well, let me archive these games and make them available for everyone else to play.

It was wildly successful. At its zenith EmuParadise was ranked 700 or so as per Alexa on the entire internet. We're talking millions of visitors per day and thousands of active users every single second. I ran it all by myself with an entire team of moderators, contributors, etc.

It did have ads. Heck, our server bills were in the range of tens of thousands of dollars a month. How could I pay for that without having ads on the site? Then we're in commercial copyright infringement territory. Basically if you get sued, you can go to prison, and you will be bankrupted for sure. At the time there were no torrents, no IPFS, no distributed hosting solutions in any case.

As time went by the stress became enormous. Of course threatening letters and DMCA takedown notices were the norm. And the fact that the site was hugely popular and government agencies such as the FBI could get involved at the behest of Nintendo et al just made it worse. But there was also keeping it online, through various CDNs, trying to keep it anonymously run at all times (my OpSec was terrible starting out, it started in the year 2000), keeping servers online and uptime at almost 100% and bandwidth flowing and hard drives spinning and RAID arrays working. It was a whole lot of everything all at once and I was just one guy doing it all.

After another website, Loveroms, got sued by Nintendo in 2018 (for $12MM) I decided I had had enough. Reading stories like the kickasstorrents guy getting arrested while on holiday with his wife and kid, and loveroms getting sued, I decided that this was the end of the road for me. I pulled all the games from the site. Eighteen years of work down the drain.

My mental health had suffered tremendously; I was depressed and anxious almost all the time. The sight of a police officer on the street would set me panicking. The cost was too high.

Was it a blast? Oh yes it was. I used to receive thousands of emails from grateful people. Cancer patients who reminisced in their last days playing video games from their childhood, soldiers at war whose only escape was a few rounds of Bomberman (the irony is not lost on me), and so many more beautiful stories of nostalgia and connection.

But current copyright law is going to destroy all this art and culture. There is no real legal way to preserve it. And people like me may do it for a long while, but at what cost to ourselves? I firmly believe that a 7-10 year copyright (extendible even somehow? debatable) would be fair and would let authors get what they need out of their creations. It would help us preserve all this beautiful art and culture that we have enjoyed and share it with future generations.

I would love for a human kid living on a distant exoplanet in the far future to be able to play Chrono Trigger and wonder about the history of the earth and our stories.
the-printer, over 2 years ago
Feels as though an organization such as this should have more domain-appropriate points of contact than Twitter or Reddit.

A very interesting thing nonetheless.
justshowpost, over 2 years ago
I just dropped in here to praise archivists and their merit in general. I treasure content preservation (regardless of the content's perceived quality) much more than the legal or even ethical problems associated with it.

Anecdote: Remember when Microsoft Corp. declared that they love open source software and launched the CodePlex platform, and then lost their business interest in it (when they bought GitHub), so they completely erased the CodePlex archives? I was able to reach several long-forgotten projects I was interested in thanks to the invaluable work of independent volunteer archivists. (It was quite a tough manual job for me: I had to d/l the database, then locate the desired archive segment, and only then could I transfer the required files via the bittorrent proto.)
standup, over 2 years ago
Having a local copy of an entire ebook archive is one way we can find information without having to use the Internet. Thus we can avoid being subjected to mass surveillance, which is excellent. I wonder if the archive is full-text searchable?

Finally an alternative to this Orwellian nightmare we call the Internet. Can't wait to have a copy at home, and there will probably be times when I'll be pulling the plug on the router with relief. And it's one more step towards reducing my Internet usage, thus keeping the government and corporations out of my life.
tenacious_tuna, over 2 years ago
This article has me curious as to most people's "op-sec" around personal piracy practices, e.g. torrenting. Do people take requests from family members? How restrictive are you with these behaviors, especially when backed by something like Plex (which presumably just directly erodes any other opsec you may be practicing)?
pwdisswordfish0, over 2 years ago
Would you be open to using an apostrophe ’ in your header instead of the straight single quote? I dig the Comic Sans, but it would look so much better. Cheers lol