
Ask HN: How to distribute a lot of images throughout multiple Universities

16 points by adezxc over 2 years ago
Basically, I've been given coursework at my university to evaluate and start using a distributed file system for storing large amounts of crystal diffraction images. It would need to keep multiple distributed copies of the files in case one of the servers goes down, and to scale, since the collection will keep growing. I've looked into things like LOCKSS [1] and IPFS [2], but LOCKSS seems to limit itself to storing articles, and IPFS on its own doesn't provide data reliability if one of the nodes goes down. Has anyone encountered a similar task, and what did you use for it?

[1] https://www.lockss.org/
[2] https://ipfs.tech/

11 comments

zcw100 over 2 years ago
IPFS does provide data reliability through pinning services, a private cluster, or a cooperative cluster. It seems to be difficult to communicate how IPFS works in this regard, and there are a lot of misunderstandings about it. Some people want IPFS to be an infinite free hard drive in the sky, with automatic replication and persistence until the end of time (it is not). Then there are the people who worry that "OMG, someone can just put evil content onto my machine and I have to serve it!" (it does not work that way).

IPFS makes it very easy to replicate content, but you don't have to replicate anything you don't want to. Resources cost money, so either you ask someone to do it for free and you get whatever reliability you get, or you pay someone and you get better reliability as long as you keep paying.
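
A minimal sketch of the pinning approach described above, assuming each university runs its own Kubo (IPFS) node with the RPC API exposed on port 5001; the hostnames and the CID are placeholders, not real infrastructure.

```python
# Replicate one dataset CID onto several university-run IPFS nodes by
# asking each node's RPC API to pin (fetch and keep) the content.
import requests

NODES = [
    "http://ipfs.uni-a.example:5001",
    "http://ipfs.uni-b.example:5001",
    "http://ipfs.uni-c.example:5001",
]
CID = "<dataset-root-cid>"  # placeholder for the dataset's root CID

for node in NODES:
    # /api/v0/pin/add makes this node download the data and protect it
    # from garbage collection, i.e. one more durable copy.
    resp = requests.post(f"{node}/api/v0/pin/add", params={"arg": CID}, timeout=600)
    resp.raise_for_status()
    print(node, resp.json())
```

IPFS Cluster automates the same idea (pinning with a configurable replication factor across a set of cooperating nodes), which is closer to the "private cluster" mentioned above.
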
jjgreen over 2 years ago
This is private data, right? Maybe a private BitTorrent tracker with a few nodes which "grab everything" to ensure persistence. Never done it myself, but it might be a direction worth researching...
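
A rough sketch of the torrent side of that idea, using the libtorrent Python bindings; the tracker URL and directory name are hypothetical, and the private tracker itself still has to be run separately.

```python
# Build a private .torrent for a directory of images, announced only to
# an internal tracker (no DHT/PEX), so only the participating nodes share it.
import libtorrent as lt

fs = lt.file_storage()
lt.add_files(fs, "diffraction_images")          # directory to share
t = lt.create_torrent(fs)
t.add_tracker("https://tracker.internal.example/announce")
t.set_priv(True)                                # mark as a private torrent
lt.set_piece_hashes(t, ".")                     # hash pieces relative to cwd

with open("diffraction_images.torrent", "wb") as f:
    f.write(lt.bencode(t.generate()))
```

Each "grab everything" node then seeds this torrent with an ordinary BitTorrent client, which is what provides the redundant copies.
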
brudgers over 2 years ago
How much data do you have now?
How fast is it increasing?
What is your budget for hardware?
What is your budget for software?
What is your budget for development labor?
What is your budget for maintenance?

I mean, the simplest thing that might work is talking to your university IT department...

...or calling AWS sales or another commercial organization specializing in these things.

The second most complicated thing you can do is to roll your own.

The most complicated thing you can do is to have someone else do it.

Good luck.
hannibal529 over 2 years ago
This is a simple task with NATS JetStream object storage: https://docs.nats.io/nats-concepts/jetstream/obj_store/obj_walkthrough. Just provision a JetStream cluster and an object store bucket. If you want to span the cluster over multiple clouds with a supercluster, that's an option as well.
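
A minimal sketch of that object-store idea using the nats-py client, assuming a JetStream-enabled NATS cluster is already running at the placeholder URL; the bucket and file names are hypothetical.

```python
# Upload one image into a JetStream object store bucket; the cluster
# replicates objects according to the bucket's replica configuration.
import asyncio
import nats

async def main():
    nc = await nats.connect("nats://nats.internal.example:4222")
    js = nc.jetstream()

    # Create an object store bucket for the image files (one-time setup).
    store = await js.create_object_store("diffraction-images")

    with open("sample_0001.cbf", "rb") as f:
        await store.put("sample_0001.cbf", f.read())

    await nc.close()

asyncio.run(main())
```
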
DrStartup over 2 years ago
Sounds like you'd want to set up a private multi-org cloud storage system.

Something like this: https://min.io/ or similar. There are a dozen or so open source / commercial S3-like object storage systems out there.

I have a friend who does this kind of mission-critical infrastructure for research universities.

DM me if you'd like.
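
A minimal sketch of what "S3-like" buys you in practice: talking to a self-hosted MinIO (or any S3-compatible store) with boto3. The endpoint, credentials, bucket, and file names are placeholders.

```python
# Upload an image to a self-hosted, S3-compatible object store.
# Redundancy (erasure coding, site replication) is configured server-side.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.internal.example:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="diffraction-images")
s3.upload_file("sample_0001.cbf", "diffraction-images", "run42/sample_0001.cbf")
```
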
mikewarot over 2 years ago
If you're replicating one primary file system to many secondary systems, MARS might be helpful [1]. It was developed by 1&1, who host my personal website along with petabytes of other people's stuff.

[1] https://github.com/schoebel/mars
rom16384 over 2 years ago
I was thinking about Syncthing (https://github.com/syncthing/syncthing), but it's a file synchronization tool, meaning every node would have a full copy, and it would propagate deletes from one node to another.
Quequau over 2 years ago
Isn't rsync designed for use cases like this?
Gigachad over 2 years ago
How much data? Why not chuck it on S3 or Dropbox?
toomuchtodo over 2 years ago
BitTorrent or Ceph?
hooverd over 2 years ago
Try asking your PI?