Archivists Are Trying to Make Sure LibGen Never Goes Down

908 points by legatus, over 5 years ago

26 comments

legatus, over 5 years ago
This is an extremely important effort. The LibGen archive contains around 32 TB of books (by far the most common being scientific books and textbooks, with a healthy dose of non-STEM). The SciMag archive, backing up Sci-Hub, clocks in at around 67 TB [0]. This is invaluable data that should not be lost. If you want to contribute, here are a few ways to do so.
If you wish to donate bandwidth or storage, I personally know of at least a few mirroring efforts. Please get in touch with me at legatusR(at)protonmail(dot)com and I can direct you to those behind this effort.
If you don't have storage or bandwidth available, you can still help. Bookwarrior has requested help [1] in developing an HTTP-based decentralizing mechanism for LibGen's various forks. Those with software experience can help make sure these invaluable archives are never lost.
Another way of contributing is by donating bitcoin, as both LibGen [2] and The-Eye [3] accept donations.
Lastly, you can always contribute books. If you buy a textbook or book, consider uploading it (and scanning it, if it is a physical book) in case it isn't already in the database.
In any case, this effort has a noble goal, and I believe people in this community can contribute.
P.S. The "Pirate Bay of Science" is actually LibGen, and I favor a title change (I posted it this way so as to comply with HN guidelines).
[0] http://185.39.10.101/stat.php
[1] https://imgur.com/a/gmLB5pm
[2] bitcoin:12hQANsSHXxyPPgkhoBMSyHpXmzgVbdDGd?label=libgen, as found at http://185.39.10.101/, listed in https://it.wikipedia.org/wiki/Library_Genesis
[3] Bitcoin address 3Mem5B2o3Qd2zAWEthJxUH28f7itbRttxM, as found at https://the-eye.eu/donate/. You can also buy merchandise from them at https://56k.pizza/.
miki123211, over 5 years ago
The new architecture of pirate sites, which I call the Hydra architecture, seems pretty interesting to me. There isn't a single site hosting the content, but a group of mirrors freely exchanging data with one another. If some of them go down, the others remain and new ones can appear, copying data from the remaining mirrors. It's like a hydra that grows two heads every time you chop one off. It's absolutely unkillable, as there's no single group or server to sue.
A more advanced version of this architecture is used by pirate addons for the Kodi media center software. Basically, you have a bunch of completely legal, above-board services like IMDb that contain video metadata. They provide the search results, the artwork, the plot descriptions, episode lists for TV shows, etc. They're impossible to sue and shut down, as they're legal. Then you have a large number of illegal services that essentially map IDs from websites like IMDb to links. Those links lead to websites like Openload, which let you host videos. Those sites are in a gray area: if they comply with DMCA requests and are in a reasonably safe jurisdiction, they're unlikely to be shut down. On the Kodi side, you have a bunch of addons. There are the legitimate ones that access IMDb and give you the IDs, the not-so-legitimate ones that map IDs to URLs, and the half-legitimate ones that can actually play stuff from those URLs (not an easy task, as websites usually try to prevent you from playing something without seeing their ads). Those addons are distributed as libraries and are used as dependencies by user-friendly frontends. The frontends usually depend on several addons in each category, so if one goes down, all the others still remain. It's all so decentralized and ownerless that there's no single point of failure. The best you can do is kill the frontend addon, but it's easy to make a new one, and users are used to switching them every few months.
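The frontend behaviour described above, depending on several interchangeable addons per category so that losing one does not break anything, boils down to trying providers in order until one answers. Below is a minimal Python sketch under that assumption; `resolve_stream` and the three resolver functions are hypothetical names for illustration, not part of Kodi's actual addon API.

```python
from typing import Callable, Optional

# Hypothetical: a resolver maps an IMDb-style ID to a playable URL, or returns None.
Resolver = Callable[[str], Optional[str]]


def resolve_stream(imdb_id: str, resolvers: list[Resolver]) -> Optional[str]:
    """Try each resolver in order; the first one that returns a URL wins.

    A resolver that raises (e.g. because its backing site went down) is
    skipped, so no single provider is a point of failure.
    """
    for resolver in resolvers:
        try:
            url = resolver(imdb_id)
        except Exception:
            continue  # provider is down or broken; fall through to the next one
        if url:
            return url
    return None


# Hypothetical providers, for illustration only.
def dead_resolver(imdb_id: str) -> Optional[str]:
    raise ConnectionError("site taken down")


def empty_resolver(imdb_id: str) -> Optional[str]:
    return None  # reachable, but has no link for this title


def working_resolver(imdb_id: str) -> Optional[str]:
    return f"https://host.example/{imdb_id}.mp4"


print(resolve_stream("tt0000001", [dead_resolver, empty_resolver, working_resolver]))
```

Ordering the list is the only policy here; a real frontend might also rank providers by past reliability, which this sketch does not attempt.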
sanxiyn, over 5 years ago
The Yongle Encyclopedia was a similar project in 15th-century China. It was the largest encyclopedia in the world for 600 years, until it was surpassed by Wikipedia.
Alas, the Yongle Encyclopedia is almost completely lost now. Archiving is harder than you think.
https://en.wikipedia.org/wiki/Yongle_Encyclopedia
EthanHeilman, over 5 years ago
Maybe we should print this out on acid-free, paper-thin, flexible wood-pulp sheets stitched together to form linear organized aggregations. Each aggregation would contain one or more works and be searchable using a SQL-like database. To make this plan really work, there would need to be a collection of geographically distributed long-term physical repositories that would receive periodic updates as new material became available.
All joking aside, I do wonder whether digital or analogue formats are better able to survive into the distant future.
* What impact will DRM have on the accessibility of our knowledge to future historians?
* Is anything recoverable from a hard drive or flash media after 500 years in a landfill?
* Will compressed files be more or less recoverable? What about git archives?
* Will the future know the shape of our plastic GI Joe toys but not the content of the GI Joe cartoon?
knzhou, over 5 years ago
LibGen is one of the greatest contributors to scientific productivity worldwide, possibly beaten only by Sci-Hub. Just about everybody in academia knows about it. If it ever vanished, some of us could probably still get by trading files from person to person, but nothing could be as perfect as what we have now.
turc1656, over 5 years ago
I don't see anyone having mentioned the possibility of posting this data to Usenet at all, at minimum for archival purposes, which should be good for ~8-9 years. That way at least the data isn't lost. With so many of those torrents having 0 or 1 seed, this is a serious risk, I think, despite the comments elsewhere about people rotating what they seed.
I realize that doesn't solve the access problem for most people, as most of the users who need this research might not know how to use Usenet or even be familiar with it at all, but I think the first major concern would be to secure the entire repository on a stable network. Usenet seems like a good place for that, even if it doesn't serve as a means of distribution. Encrypting the uploads would make them immune to DMCA takedowns, provided that the decryption keys weren't made public and were only shared with individuals involved in maintaining the LibGen project.
lukebuehler, over 5 years ago
To me, an aspiring scholar, LibGen is the most amazing tool ever. Things like interlibrary loan and access to databases on university networks already make life so much easier than it used to be, but nothing beats LibGen in terms of convenience. I'm in the nowadays obscure field of patristic theology, and I can't believe how much stuff I can find on LibGen, often things that even highly specialized research libraries like Harvard's don't have.
The hours LibGen has saved me in gathering all the sources for my research must number in the hundreds. Thank you!
dooglius, over 5 years ago
There is a huge amount of duplication there (i.e., books that have many scans). I wonder if it would be better to tackle that rather than doing a straight backup.
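As a rough sketch of the deduplication idea, assuming files are compared by the SHA-256 of their bytes: this only catches byte-identical copies. Different scans of the same book produce different bytes and therefore different hashes, so collapsing those would need metadata matching (for example by ISBN) instead. The function names and the `libgen_mirror` directory are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Iterable, Iterator


def read_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file in 1 MiB pieces so multi-GB scans never sit fully in memory."""
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def digest(chunks: Iterable[bytes]) -> str:
    """SHA-256 over a stream of byte chunks (equivalent to hashing their concatenation)."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(files: dict[str, Iterable[bytes]]) -> dict[str, list[str]]:
    """Group file names by content hash and keep only groups with 2+ members."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, chunks in files.items():
        groups[digest(chunks)].append(name)
    return {d: names for d, names in groups.items() if len(names) > 1}


# Usage over a local mirror (hypothetical path):
# files = {str(p): read_chunks(p) for p in Path("libgen_mirror").rglob("*.pdf")}
# duplicates = find_exact_duplicates(files)
```

Hashing is cheap relative to transfer, so a pass like this could run before a backup to decide which files actually need to be copied.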
burtonator, over 5 years ago
What's interesting is that 32 TB is becoming more and more affordable while the research material is staying roughly the same size.
That might change, though, as people start including video + data within papers and adopt new notebook formats that are live and contain Docker containers/IPython, etc.
It's a shame we can't just mail these around.
Tepix, over 5 years ago
Related: looking at hard disk cost per terabyte, external drives are quite often cheaper than internal ones.
For example, right now in Germany I can get a WD 8TB USB 3.0 drive for 135€, but the cheapest internal 8TB drive costs 169€.
Any idea why? It's puzzling.
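The per-terabyte gap in the quoted prices, and what it would mean for one copy of the ~32 TB LibGen archive mentioned at the top of the thread, can be worked out directly. This is a back-of-the-envelope sketch that ignores redundancy, filesystem overhead, and the difference between decimal and binary terabytes.

```python
# Prices quoted in the comment above (Germany, at the time of posting).
external_eur, internal_eur, capacity_tb = 135, 169, 8

external_per_tb = external_eur / capacity_tb   # 16.875 €/TB
internal_per_tb = internal_eur / capacity_tb   # 21.125 €/TB

# Raw drives needed for one copy of ~32 TB, with no redundancy.
archive_tb = 32
drives_needed = -(-archive_tb // capacity_tb)  # ceiling division -> 4 drives

print(f"{external_per_tb:.2f} vs {internal_per_tb:.2f} €/TB; "
      f"{drives_needed} external drives cost {drives_needed * external_eur} €")
```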
sandov, over 5 years ago
Let me say this: I fucking love LibGen. It actually makes my life better, and I'm so thankful to the people running it.
nullifidian, over 5 years ago
Posting this here only creates problems for them. The more it's known in the West, the more likely it is to go down.
voldacar, over 5 years ago
Is there a way to just download the whole 32 TB to your own machine? I see a ton of mirrors, but the content seems to be highly fragmented among them.
Avamander, over 5 years ago
Why not publish the site over IPFS? That would make P2P hosting much simpler.
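A simplified sketch of the content addressing that makes IPFS-style hosting attractive here: data is split into blocks, each block is named by its hash, and the file is named by hashing the list of block hashes. This is an illustration only; real IPFS wraps hashes in CIDs and builds a Merkle DAG in its own format, and the block size below is an arbitrary choice.

```python
import hashlib

BLOCK_SIZE = 256 * 1024  # arbitrary block size chosen for this illustration


def block_addresses(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and name each one by its SHA-256."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def root_address(data: bytes, block_size: int = BLOCK_SIZE) -> str:
    """Name the whole file by hashing the ordered list of its block hashes."""
    joined = "".join(block_addresses(data, block_size)).encode()
    return hashlib.sha256(joined).hexdigest()


# Any peer holding identical bytes computes identical addresses, so a book can be
# fetched block by block from whoever has it, and each block can be checked
# against its expected hash on arrival.
book = b"x" * (600 * 1024)
print(len(block_addresses(book)))  # 3 blocks: 256 KiB + 256 KiB + 88 KiB
```

Because the name depends only on content, mirrors do not need to coordinate on URLs: a mirror going down changes who serves a block, not what it is called.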
fghtr, over 5 years ago
Are there any I2P torrents? I guess anonymity might be helpful if I want to mirror/seed this data...
buboard, over 5 years ago
One of the next interplanetary or interstellar probes should carry a copy of the Sci-Hub torrent in some kind of permanent storage.
FpUser, over 5 years ago
I did not know about LibGen until this post. Too bad for me, living in a cave. Anyway, this is an amazing project. Best of luck to them and similar efforts.
6510, over 5 years ago
Imagine this:
- A tiny, well-behaved client that starts with the OS.
- It downloads rare bits of the archive at 1 KB/s, obtaining 1 GB every 278 hours. It should stop somewhere around 100 MB to 5 GB.
- It periodically announces which chunks/documents it has.
- It seeds those chunks at 1 KB/s.
- Chunks/documents that already have thousands of seeds are not announced. Eventually those are pruned.
This would let everyone help without it costing them anything.
If someone is trying to obtain a 20 MB PDF, it would take five and a half hours from a single seed at 1 KB/s. With just 50 seeds, it's just 8 min.
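A minimal sketch of the proposed client policy, assuming the rate means 1 kilobyte per second and that "thousands of seeds" is a cutoff of 1,000 (both assumptions, as are all names below). It fetches the rarest missing chunks first, only announces chunks that are not already heavily seeded, and checks the transfer-time arithmetic. Under these assumptions, a 20 MB file from 50 seeds takes 400 seconds, about 6.7 minutes.

```python
from dataclasses import dataclass

RATE_BYTES_PER_S = 1_000  # assumed: "1 kb/s" read as 1 kilobyte per second
ANNOUNCE_CUTOFF = 1_000   # assumed threshold for "thousands of seeds"


def transfer_seconds(size_bytes: int, seeds: int, rate: int = RATE_BYTES_PER_S) -> float:
    """Time to fetch a file when every seed contributes `rate` bytes/s in parallel."""
    return size_bytes / (seeds * rate)


@dataclass
class Chunk:
    chunk_id: str
    seed_count: int


def pick_chunks_to_fetch(swarm: list[Chunk], held: set[str], budget: int) -> list[str]:
    """Rarest-first: choose up to `budget` chunks we lack, fewest seeds first."""
    missing = [c for c in swarm if c.chunk_id not in held]
    missing.sort(key=lambda c: c.seed_count)
    return [c.chunk_id for c in missing[:budget]]


def chunks_to_announce(swarm: list[Chunk], held: set[str]) -> list[str]:
    """Announce only held chunks that are not already massively seeded."""
    return [c.chunk_id for c in swarm
            if c.chunk_id in held and c.seed_count < ANNOUNCE_CUTOFF]


# Checking the arithmetic in the comment:
print(transfer_seconds(10**9, 1) / 3600)       # ~277.8 hours per GB
print(transfer_seconds(20 * 10**6, 1) / 3600)  # ~5.6 hours for a 20 MB PDF
print(transfer_seconds(20 * 10**6, 50) / 60)   # ~6.7 minutes with 50 seeds
```

Rarest-first is the same heuristic BitTorrent clients commonly use for piece selection; here it doubles as the "download rare bits" rule, so idle clients naturally gravitate toward the least-replicated parts of the archive.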
milofeynman, over 5 years ago
I'd like to dedicate 1 TB of my FreeNAS to something like this. It would be nice to run a small container with some P2P service that held that chunk.
skjoldr, over 5 years ago
Can't Tahoe-LAFS help with this kind of challenge? I don't have experience with it, but it looks stable.
burtonator, over 5 years ago
I've thought that we could potentially build an end-to-end encrypted datastore within Polar and possibly add IPFS support to help with this issue.
Here's a blog post about our datastores for some background:
https://getpolarized.io/2019/03/22/portable-datastores-and-platform-independence.html
... essentially, Polar is a PDF manager and knowledge repository for academics, scientists, intellectuals, etc.
One secondary challenge we have is allowing research to be shared, but I'd like to do it in a secure and distributed manner.
Some of our users are concerned about their eBooks being stored unencrypted, and while for the majority of our users this will never be a problem, I can see it being an issue in countries with political regimes that are hostile to open research.
In the US we have an issue of researchers being harassed over climate change, btw. Having a way to encrypt your knowledge repository (ebooks) would help academic freedom, as your employer or government couldn't force you to give them your repository.
But what if we went beyond this and provided a way to ADD documents to the repository from a site like LibGen?
Then we'd have the ability to easily, with one click, encrypt the document (end to end) and add it to our repository.
If we can add support in Polar for colleagues to share directly, this would be a virtual mirror of LibGen.
Alice could add books b1, b2, b3 to their repo and then share them with Bob; they would generate a shared symmetric key, so only Bob would be able to see b1, b2, b3.
No third party (including me) would have any knowledge of what's going on.
I'm going to assume our users are not going to do anything nefarious or pirate any books. I'm also certain that they're conforming to the necessary laws ...
The challenge, though, is that while we'd be able to have a mirror of LibGen and more material, it would be a probabilistic mirror: I'm sure we'd have something like 60% of it, but the obscure material wouldn't be mirrored.
Right now our datastores support just local disk and Firebase (which is basically Google Cloud). While we would encrypt the data end to end in Google Cloud, I can totally understand why users might not like to use that platform.
One major issue is China, where it's blocked.
Something like IPFS could go a long way toward solving this, but it's still very new and I haven't hacked on it much.
mutant, over 5 years ago
I'd say IPFS, but that's a pretty big commitment from an entire community to keep alive.
boksiora, over 5 years ago
It's best to split it into small torrents of 1-2 GB each, so that ordinary users can seed.
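Splitting the archive into 1-2 GB torrents could be done by greedily packing files into bundles under a size cap. This is a sketch with hypothetical names, assuming files are packed in the order given; at a 2 GB cap, a ~32 TB archive needs at least 16,000 torrents.

```python
ARCHIVE_BYTES = 32 * 10**12  # ~32 TB LibGen figure quoted earlier in the thread
MAX_BUNDLE = 2 * 10**9       # 2 GB cap per torrent


def bundle_files(sizes: dict[str, int], max_bundle: int = MAX_BUNDLE) -> list[list[str]]:
    """Greedily pack files, in the given order, into bundles of at most max_bundle bytes.

    A single file larger than max_bundle gets a bundle of its own.
    """
    bundles: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in sizes.items():
        if current and current_size + size > max_bundle:
            bundles.append(current)  # close the bundle before it would overflow
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        bundles.append(current)
    return bundles


print(ARCHIVE_BYTES // MAX_BUNDLE)  # lower bound on the number of torrents
```

Packing in a stable order (for example by LibGen ID) keeps bundles reproducible, so independent seeders end up with identical torrents; a size-optimal packing would need fewer bundles but could reshuffle them whenever files are added.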
asdernr, over 5 years ago
If only some of the money made would reach the scientists, lol. Most of them will email you their paper if you ask. The majority don't want their papers sitting behind paywalls...
mister_hn, over 5 years ago
One could use FAANG data centers to host them for free; that would be really great.
whydoyoucare, over 5 years ago
Isn't scanning a physical book and uploading a soft copy a minefield of hazards (both legal and moral)? Essentially you are encouraging (some) unlawful activity... I am not so sure I am on board with this idea!