
Ask HN: How should I organize and back up 23 TiB of personal files?

46 points by pushedx, about 1 year ago
A somewhat daunting project that I've been putting off for a long time is organizing and backing up 23 TiB of files spread across 40+ external and internal hard drives that I've collected throughout my life. There is a variety of filesystem types and interface types.

I take a lot of photos, so a lot of these are files that I would actually want to back up, but many of them are old operating system installs and other "useless" files that I don't need archival storage for.

The actual size of the data that I need backed up I would estimate at around 6 TiB.

A few of my requirements:

1. I don't need the files to be accessible online; in fact, I would prefer if they were not.

2. If anything is backed up to the cloud, I want pre-internet-encryption with keys that only I know and control.

3. I want something simple that could be recovered using a pragmatic approach and open source software in case of a disaster.

4. I'd like a system where I can easily test my recovery strategy.

Open questions:

1. What local filesystem setup should I use? Number of drives? Local backup approach?

2. If you've done this before, is there a strategy that you used for the actual aggregation of the data? Are there any particularly convenient IDE-to-USB docks? Any good software that you would recommend for locating duplicate files?

3. What remote backup software should I use?

[ edits ]

Answering some questions from the comments:

Cost: Given a quick look at the cost of archival cloud storage, I guess I would be willing to spend up to $60 per month on a remote copy. (Noting the estimate of 6 TiB of "actual" data.)

How often: I would expect to access the remote files rarely (maybe once a month), and need a complete recovery very rarely (with a low requirement for recovery speed). Local backups I would like to occur at least weekly, with verification or access frequently (daily?).

Risk tolerance: For the local-encryption-for-remote-storage aspect, I would like something with a high level of confidence in the cryptography and the implementation surrounding it. I would also like a high degree of confidence that I can recover my files in case of a natural disaster or similar that wipes out my local copies.

Local security: I live in a relatively secure home in a relatively low-crime area. I could store a copy at a relative's house, although I may move far enough away soon that it would not be practical to deliver or access such a backup.

45 comments

perihelions, about 1 year ago
You should look at Borg for the remote backup software. It does automatic deduplication; has the security posture you're asking for ("untrusted server")*; and is agnostic about which backup cloud provider you use it with. Of course it's FOSS.

https://en.wikipedia.org/wiki/Borg_(backup_software)

https://news.ycombinator.com/item?id=21642364 ("BorgBackup: Deduplicating Archiver", 103 comments)

https://news.ycombinator.com/item?id=34152369 ("BorgBackup: Deduplicating archiver with compression and encryption", 177 comments)

*You definitely don't want your private filenames leaked to data brokers, like Backblaze's clients experienced.

https://news.ycombinator.com/item?id=26536019 ("Backblaze submitting names and sizes of files in B2 buckets to Facebook", 517 comments)
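A minimal sketch of the Borg workflow this describes; the remote host, repo path, and archive naming are illustrative (the server needs borg installed for SSH repos):

    # Initialize a repo on an untrusted server; encryption keys stay client-side.
    # keyfile mode keeps the key off the server entirely (back the key up separately).
    borg init --encryption=keyfile-blake2 backup@remote.example:/srv/borg/photos

    # Create a deduplicated, encrypted archive.
    borg create --stats --progress \
        backup@remote.example:/srv/borg/photos::photos-{now:%Y-%m-%d} \
        ~/photos

    # Verify repository consistency — a cheap way to exercise the recovery path.
    borg check --verify-data backup@remote.example:/srv/borg/photos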
h2odragon, about 1 year ago
Organize your "keepers" into an archive you can maintain. At ~6TB (edited "gb" typo, tx) that's not a big challenge to back up; multiple USB hard drives and regularly scheduled backups of that will serve. "Remote" is a distraction; put your data on media you own and keep a copy in a bank box or something if you feel the need for more off-site redundancy.

Collecting and arranging the archive is the big job. You're just gonna have to bite down and start doing that yourself: no one else is likely to know what needs saving or not. Set up a NAS or file share with a big HDD and start collecting files there.

You may find the old stuff fun to scrape up; for example, how difficult is it to find a PATA interface today? That problem only gets harder. Motivation to get on the job now, rather than later; and to make the "archive maintenance" more of your everyday task list than to let it pile up.
Wingy, about 1 year ago
If you can make your dataset fit on a single 24TB disk (or less if you want cheaper disks), that simplifies the setup. I'd use a set of 3 copies:

* One main disk that you use for collecting and organizing all of the data

* One backup disk in the system with the main disk

* One backup disk kept off-site in a safe deposit box or similar

Every few months, swap the onsite and offsite backup disks to keep the offsite one fresh. When you do that, verify the integrity of the data on the disk that just came back.

Automate verifying the integrity of the main and onsite backup monthly.

My preference for filesystem would be single-disk ZFS "arrays". That allows you to run a scrub to verify integrity easily.

For copying the data, use either zfs send or rsync. zfs send copies the filesystem directly so it preserves everything about the file (for example sparse files), but rsync is less complex.

For getting the data from your various disks, I tend to use dd to create a raw image of the disk, then use other tools like losetup to mount the image as a loop device. Then if there's data that I want more convenient access to, I'll rsync it out of the mounted FS. That way, you'll never lose data from a misconfigured/faulty copy since you can always do it again from the raw disk image.
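A sketch of that image-first aggregation workflow; device names and paths are examples (for failing drives, ddrescue is the more forgiving alternative to dd):

    # Image an old drive before touching its contents.
    dd if=/dev/sdX of=/tank/images/drive07.img bs=4M conv=noerror,sync status=progress

    # Expose the image as a block device, scanning for partitions (-P).
    losetup --find --show -P /tank/images/drive07.img   # prints e.g. /dev/loop0
    mount -o ro /dev/loop0p1 /mnt/old

    # Pull out the files you want; the raw image stays around as a fallback.
    rsync -aHAX /mnt/old/ /tank/staging/drive07/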
spacebanana7, about 1 year ago
I'd recommend storing it in AWS S3 with an encrypted key physically stamped onto metal cards (some crypto people use similar products for seed phrases).

Distribute the metal cards as widely as you can: at home, potentially in safety deposit boxes, or buried somewhere (potentially with an additional layer of protection from a memorized password).

This protects against a class of physical data destruction from house fires, theft, floods, etc.

However, I'd withdraw this recommendation if you live in a jurisdiction where you can be compelled to hand over passwords, e.g. England.
wjdp, about 1 year ago
A NAS built from generic x86 hardware and some disks. Use ZFS; it's a bit of a rabbit hole but an excellent choice for both reliability/redundancy and as a tool for backup. I'm not gonna explain how this works, just describe what you can do; it's an option.

ZFS:

- ensures you don't get bit rot

- manages both disks (RAID/mirrors etc.) and the filesystem; it's an all-in-one solution

- supports block-level replication to local and remote systems; after the first backup it's fast

- can create dynamic partitions to group files together and build replication strategies around

Choose either RAID or mirrored drives (https://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/). I've gone mirrored, but more for flexibility and performance. Use a calculator to see what options of disks you have: https://jro.io/capacity/ (and google 'ZFS calculator' for others).

For backup, get a second machine somewhere else in your house with a smaller setup and use ZFS replication to keep it up to date with everything on the main box you need backed up. Currently I use a Raspberry Pi with a USB disk, but this is perhaps cutting it fine. You wanna keep this online so ZFS can periodically check the health of the data on the disks. Fully offline backups can be a risk.

Finally, for a 3rd backup, use some of those external drives: format to ZFS and use replication. Plug them in on a schedule and take a backup.

If you want to backup to remote systems (cloud/a box in your parents' house) it also supports filesystem encryption. With the right options you can stream incremental backups over SSH, only passing encrypted blocks. The system at the other end never needs to see the raw data.
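A minimal sketch of that encrypted-replication idea, assuming a pool named tank and a backup host named backup-pi; dataset names and snapshot dates are illustrative:

    # One-off: create an encrypted dataset on the primary NAS.
    zfs create -o encryption=on -o keyformat=passphrase tank/photos

    # Snapshot, then raw-send (-w): blocks travel and land still encrypted,
    # so the receiving box never needs the key.
    zfs snapshot tank/photos@2024-03-18
    zfs send -w tank/photos@2024-03-18 | ssh backup-pi zfs recv -u backup/photos

    # Later runs send only the delta between snapshots.
    zfs send -w -i @2024-03-18 tank/photos@2024-03-25 | \
        ssh backup-pi zfs recv -u backup/photos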
bhaney, about 1 year ago
Do you have any nerd friends?

I just "trade" NAS space with a few of my friends for remote backups. They allocate me a few TBs on their servers and I allocate them a few TBs on mine, and we all give each other some locked-down shells for remote access (bubblewrap, jail, VM, etc.; everyone has a different approach).

I'd say grab a few 8-16TB disks, throw them in a RAID 5/6 with a filesystem that supports snapshots and compression, dump all of your 23TB of data into it, come to a trade agreement with a couple of friends, and sync your 6TB of important data to your friends' servers (gifting them a drive >= your storage needs can grease the wheels here).

The details don't really matter as much as the overall architecture. I personally have something like 12 drives haphazardly shoved into an old desktop case with a couple PCIe SATA expansion cards for extra ports, all dm-crypt'd and gathered into a btrfs RAID6. I regularly rsync my other systems to the backup server and take a snapshot after each sync, so I have incremental historical backups for every live system. Backups of retired systems are just static and never get touched. The really important stuff gets synced to friends' servers using restic (encrypted, incremental).

You can make the whole setup very flexible and reliable at a very low monthly cost if you just work with what you have and get your hands a little dirty instead of relying on commercial services.
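The rsync-then-snapshot pattern described here is roughly this (host and subvolume paths are examples; the target must be a btrfs subvolume):

    # Sync a live system into the backup server's btrfs subvolume...
    rsync -aHAX --delete root@laptop:/ /backup/laptop/live/

    # ...then freeze that state as a read-only snapshot. Snapshots share
    # unchanged blocks, so this is incremental history essentially for free.
    btrfs subvolume snapshot -r /backup/laptop/live \
        /backup/laptop/snap-$(date +%Y-%m-%d)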
ajb, about 1 year ago
Backblaze is $6/TB, so it should be cost-effective for this amount of data.

I think all the recommended backup software (restic, duplicity) encrypts before storing. I use restic but haven't exactly been hugely exercising it, so I can't place much weight on my experience. But it should be okay with this quantity of data.

Generally I'd guess that diversity of backups wins over using a single expensive one in terms of reliability, but at the expense of more overhead managing them.
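For concreteness, a minimal restic-to-Backblaze-B2 sketch; the bucket name and credential values are placeholders:

    # Repository on Backblaze B2; restic encrypts client-side before upload.
    export B2_ACCOUNT_ID=<keyID>          # placeholder credentials
    export B2_ACCOUNT_KEY=<applicationKey>
    export RESTIC_REPOSITORY=b2:my-backup-bucket:archive
    export RESTIC_PASSWORD_FILE=~/.restic-pass

    restic init                           # once
    restic backup ~/archive               # encrypted, deduplicated, incremental
    restic check --read-data-subset=5%    # spot-check integrity cheaply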
dewey, about 1 year ago
I'd keep it as simple as possible, so that people who are not you can access it if needed, should something happen to you.

My current combination is just Backblaze + Time Machine for local backup. I also mirror it to a Synology with Carbon Copy Cloner.

All the encrypted CLI tools I have used in the past I abandoned again, as they were always annoying to maintain and monitor, and impossible to explain to anyone.
sgjohnson, about 1 year ago
Tape drives are a viable option. A 6TB LTO-7 M8 cartridge can be had for $50.

The drive is expensive, at $4k, but it's still the perfect solution for long-term archival, because the cartridges weigh and cost damn near nothing.

Yes, there's a large upfront cost, but after that the cost per MB will slowly approach the rock-bottom price of the cartridge itself.
Cheer2171, about 1 year ago
This is all contingent on four things you haven't told us: cost, how often you expect to access it, your threat models and risk tolerance for each of those threats, and what infrastructure you have access to (aka how safe is your home, can you store a backup safely with your parents, can you store drives in a lockable private office at work safely?).

For example, if your house burns down, it doesn't matter if you have 10 mirrored copies of the same 8 TiB drive in the same room. If your parents get a new housekeeper who goes on a cleaning spree, it doesn't matter if you have 10 mirrored copies of the same 8 TiB drive in the same shoebox.
M95D, about 1 year ago
The main problem is that you don't have the time and energy to go through 23TB of files. You want a simple solution to pack them all in a safe place and sort them later (which will never happen, but it's ok, we all have this problem).

Here's what to do:

1) Get a few very large HDDs, the largest you deem affordable. In total, you should have about 4x-8x the capacity of your current files.

2) Split them in two groups: main storage and backup storage.

3) On each group, create a btrfs RAID1 and mount it with compression. Start with btrfs raid1c3 if you can afford 3 drives, and downgrade to raid1 if you run out of space.

4) Copy all your files there. You may want to make a directory for each medium you copy data from. You should extract all archives that you may already have created and let btrfs deal with compression; this allows for deduplication.

5) Run a deduplication tool that supports btrfs (https://github.com/Zygo/bees). Creating the btrfs with sha256 checksums is probably better for block deduplication, while using the crypto acceleration available on most systems.

Safety rules:

- Only connect the backup drives when you do the backup. Keep the backup in the closet or, preferably, in another building (in case of fire, etc.).

- Label the drives. You don't want to accidentally mix the two groups.

Later, when you run out of space:

- buy a pair of even bigger drives and add them to the btrfs volume

- remove the old drives from the volume (this step is very time consuming)

or

- have a huge case or HDD rack and keep adding drives without removing any

One last thing: you may want to use a NAS or DAS, at least for the backup group.
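Steps 3 and 5 might look like this; device names are examples, and the sha256 checksum option assumes btrfs-progs 5.5 or newer:

    # Two-device btrfs with mirrored data and metadata, sha256 checksums.
    mkfs.btrfs --csum sha256 -d raid1 -m raid1 /dev/sda /dev/sdb

    # Mount with transparent compression.
    mount -o compress=zstd /dev/sda /mnt/main

    # After the data is in place, deduplicate with bees; the beesd wrapper
    # takes the filesystem UUID (configuration lives under /etc/bees/).
    beesd <filesystem-uuid>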
RGamma, about 1 year ago
Heh, I'm in a similar, though somewhat milder, situation. Always one hard drive failure away from losing some (albeit non-essential) files.

As a first low-budget band-aid I got a single 16TB enterprise HDD (Toshiba MG08ACA16TE) in an external dock that is plugged into a ProDesk 400 G6.

I regularly snapshot and sync all my btrfs drives with btrbk (all my Linux devices and several of my external drives are btrfs). The Toshiba is being deduped with bees to not make this too inefficient.

For important files I have Syncthing folders that continuously synchronize between my devices, as well as the above solution.

For some dead media I use Blu-rays. Still looking for a solution for Windows, though there I at least have OneDrive.

It's a mess, to be honest.

Eventually I want to have a big enough RAID for file centralization, restic cloud backups, network shares and so on.

A crucial piece I'm still missing is a synchronization, hosting and backup classification/policy for the various file types I have... This may well be the more difficult thing compared to just getting any redundancy.
M95D, about 1 year ago
Dock recommendation: https://www.aliexpress.com/item/33045573142.html

Make sure you buy the USB3 variant. There are lots of USB2 ones that look exactly the same. Some sellers may scam you and send the wrong one.

It is power-hungry even when not in use, the HDD slots are not hot-swap (you need to turn it off), and the card reader is crap, but it is very convenient for swapping HDDs quickly and it can read 1 SATA and 1 PATA drive at the same time.

There's an alternative from "Orico" that I can't find atm, which supports UAS, but no PATA.

There are lots of USB to SATA + PATA 40-pin + PATA 44-pin adapter cables. I can't say how good those are, but desktop drives would need 12V power, which USB doesn't provide, and I would hate to have another power adapter and cable on my desk.
mlfreeman, about 1 year ago
You didn't mention a budget (monthly or tolerance for one-time expenses).

I just copy everything to HDDs using a Plugable USB-C/SATA dock (nicer and far more reliable IMO than those $9 dongles you see around). I then put the drives in a Turtle HDD case for padding against environmental factors. That protects me against everything except house fire/theft/a tornado sucking up the case and dropping it in another state.

My backup needs are beyond any single drive, but at 23 TiB you don't have to purge much to fit on a single hard drive... there are 20 and 22 TB models for sale.

I'd buy a drive WAY larger than 6 TiB just in case you underestimated how much you actually want to save. Having the extra space would also allow you to incorporate error-correction techniques like generating PAR2 files (I did that with some emotionally important personal files).
conqrr, about 1 year ago
The simplest solution is the best when it comes to backups (and almost everything). Since you don't need always-online access, we can eliminate cloud and NAS. A single 8TB portable drive as your backup is enough. With more elimination and compression, you can likely get your storage needs down further. You can use another one for offsite backup; keep it at a different location like your office or a friend's. Use software like restic or Borg to encrypt and maintain snapshots of your backup. You can find systemd scripts that automate all of this. Lastly, periodically test that your backup works as expected.

This is basically what I do, but additionally I keep the current year of photos on Google Photos too, and use Drive for personal document storage, so that it's like a live 24x7 cache of recent data. This has worked for ~10 years.
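The systemd automation mentioned here is typically a oneshot service plus a timer; this is an illustrative pair of units, with paths and names assumed:

    # /etc/systemd/system/restic-backup.service
    [Unit]
    Description=Restic backup of the archive

    [Service]
    Type=oneshot
    EnvironmentFile=/etc/restic/env      # repo location and password file
    ExecStart=/usr/bin/restic backup /srv/archive

    # /etc/systemd/system/restic-backup.timer
    [Unit]
    Description=Weekly restic backup

    [Timer]
    OnCalendar=weekly
    Persistent=true                      # catch up if the machine was off

    [Install]
    WantedBy=timers.target

    # Enable with: systemctl enable --now restic-backup.timer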
geor9e, about 1 year ago
Be lazy like me. $400 will get you 2x16TB drives and a 2-bay Synology NAS. Tell it to mirror the drives to each other and keep history. The defaults are fine. Then mirror the most important files with its cloud sync function (GDrive, etc.). This is in case your house burns down. I know you dislike "online" but the fact is cloud datacenters have pretty good redundancy. Plug external drives into the USB ports for unimportant and replaceable files. Meditate on the idea that you don't actually need to organize anything, since the NAS will index it all so you can search. Call it a day.
helsinkiandrew, about 1 year ago
So you don't create another 40+ disks in the future, get some kind of network-attached storage (or an always-on computer) and use it as a backup and media store for your everyday computers. Hopefully it will outlast several computers (I have had the same Drobo with 2 or 3 disks in RAID since 2014, and have had 1 disk fail).

Back it up with something one of the other commenters suggests. I actually use rdist to space on a friend's computer (and vice versa); that's worked unattended for over 15 years (and free!).
mavhc, about 1 year ago
If you care about your data, use ZFS. 3 18TB drives in RAID-Z1 should do; any random computer with 3 drive bays.

You can set up a zpool scrub every week and email yourself the results to check the files are surviving.
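One way to wire that up with cron; this assumes a pool named tank, a working local MTA for mail, and illustrative schedule times:

    # /etc/cron.d/zfs-scrub — weekly scrub, then a mailed status report.
    0 3 * * 0  root  /sbin/zpool scrub tank
    # "status -x" prints only unhealthy pools, so a quiet mail means all is well.
    0 9 * * 0  root  /sbin/zpool status -x tank | mail -s "zpool status" you@example.com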
sirpenguin, about 1 year ago
I use Borg backup. One copy goes to a local backup disk (the cheapest brand-name 4TB USB external drive I could find) for quick recoveries; another goes to Amazon Glacier for catastrophic situations. Cost is something like $3 a month for just shy of 4TB of data, photos and videos that I've taken myself being a large part of that.

I would love to know if anyone considers printed albums, whether of photos, textual data, or some other elaborate system involving QR codes or what have you, to be part of their strategy.
TMWNN, about 1 year ago
For local storage, an alternative to the Synology that others have mentioned is UnRAID.

* Consolidate your drives to empty the largest ones you can. Or buy three 20TB drives (two for data, one for parity).

* Buy/build a small server, install UnRAID on it, and create an array. If using new drives, you will have 40TB available.

* Copy each old drive's data to the array. Retain the old drives as backup.

* Once done, a) set up remote backup, and b) build a second, identical UnRAID server as another backup. Or the tape backup also suggested.
xienze, about 1 year ago
Why isn't anyone suggesting S3 Glacier Deep Archive? Looks like about $6 per month for 6TB. Sure, it'll cost ~$540 to pull it out, but that should be a one-time thing...
HPsquared, about 1 year ago
WinDirStat (or a Linux equivalent) and delete the stuff you don't need. You can run it across multiple drives as long as they're all (accessible) on the same machine.
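One Linux equivalent (my suggestion, not the commenter's) is ncdu, which does the same scan-and-delete interactively from a terminal:

    # -x stays within one filesystem; press d in the UI to delete.
    ncdu -x /mnt/old-disk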
0xCMP, about 1 year ago
I would look into git-annex, which I have been heavily using to do the same thing with almost the same exact number of files.

It's very simple to have a "dumb" drive that simply stores the files, and use annex to remember which drive it is on. Also, to track and remember that you only have it in one place in case you want more copies. It can also push files to S3, Glacier, and some other backup repos if needed, with client-side encryption (choose between symmetric or GPG).
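A rough git-annex sketch of the "dumb drive" workflow; repo paths, remote names, and the file path are all illustrative:

    # One-time setup in the archive directory.
    git init ~/archive && cd ~/archive
    git annex init "main"
    git annex add photos/
    git commit -m "track photos"

    # Each external drive holds a clone; annex records what lives where.
    git clone ~/archive /mnt/drive1/archive
    (cd /mnt/drive1/archive && git annex init "drive1")
    git remote add drive1 /mnt/drive1/archive
    git annex copy photos/ --to drive1

    # Ask which drives hold a file, and enforce a minimum copy count.
    git annex whereis photos/2019/img_0001.raw
    git annex numcopies 2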
JZL003, about 1 year ago
I've never done that many, but rclone is famous (IMO reliable) and very cross-platform. It also supports an encryption layer it controls, on top of generic cloud providers.

So I've been enjoying B2 (as an example, you can FUSE-mount to browse the files without downloading). But just for backup, Amazon Glacier or Google Cloud Archive is so, so cheap. If you wanted to be paranoid you could do both separately.

I haven't independently audited rclone's encryption layer.
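The crypt-over-B2 layering looks like this in rclone's config; remote names and the bucket are examples, and the passwords should be generated interactively via `rclone config`:

    # ~/.config/rclone/rclone.conf
    [b2raw]
    type = b2
    account = <keyID>
    key = <applicationKey>

    [b2crypt]
    type = crypt
    remote = b2raw:my-bucket/backup
    password = <obscured-password>

    # Usage:
    #   rclone sync ~/archive b2crypt:
    #   rclone mount b2crypt: /mnt/remote    # FUSE-mount to browse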
prirun, about 1 year ago
Google Archive storage is $1.20/TB/mo. I just had a customer recommend it to me (I'm the author of HashBackup). He said he's been using it for over a year and pays around $3/mo for 2.5TB. One gotcha: the minimum storage duration is 365 days, so if you upload 1TB then immediately delete it, you're still going to pay ~$14 over the next year. I really dislike "delete penalty" fees, but they're common.
skerit, about 1 year ago
I use BorgBase for my cloud backups. Works great, but not for 23 TiB of data.

I used to use external hard drives for backing up large amounts of data, but nearly all of them failed (4 out of 5 of my external hard drives broke; I can't get the data off them because there are multiple I/O errors, even though RAID says the drive is fine. Other devices even fail to show up at all).

So I actually decided to get a Synology NAS and use it exclusively as a backup target.
28304283409234, about 1 year ago
I decided that nothing is important enough to me to spend any time on this. Other than financial, job, medical, insurance or tax records and the like.
alt227, about 1 year ago
You didn't note any budget requirements.

If money is no object, just buy a Synology NAS and be done with it. They do everything you want and more, and are incredibly user-friendly.

Use their Btrfs filesystem with SHR disk groups to get multiple-disk redundancy alongside data scrubbing for bit-rot protection.

It also contains software to connect to any cloud provider for remote backups if that is what you want.

EDIT: Something like a Synology DS1621+ would do you well.
wjdp, about 1 year ago
For duplicate files I've used the following:

- https://github.com/adrianlopezroche/fdupes

- https://github.com/pkolaczk/fclones

The latter works well for larger datasets; it outputs a TXT file which you can analyse and decide what to do with.
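The fclones report-then-act flow is roughly this (paths are examples):

    # Group duplicates into a report, review it, then act on it.
    fclones group ~/staging > dupes.txt
    less dupes.txt
    # Replace duplicates with hard links (or `fclones remove` to delete them):
    fclones link < dupes.txt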
mvkel, about 1 year ago
The variety of approaches in these comments is fascinating.

Does it mean there's more than one right answer, or that no one solution is ideal?
arcbyte, about 1 year ago
A 4-bay QNAP with dual 14TB disks in RAID 0. From there you can encrypt locally and use Hybrid Cloud Sync to S3 Glacier.
sumoboy, about 1 year ago
Buy an external USB drive enclosure, 1 or 2 bays with 16 to 22TB drives, and consolidate everything. Use Backblaze for unlimited storage. If you really only have around 6TB of actual data, a 2-bay external drive, mirrored, will be sufficient.
karencarits, about 1 year ago
The datahoarding community often discusses things like this: https://www.reddit.com/r/DataHoarder/
johnchristopher, about 1 year ago
I think you would also benefit from asking on reddit r/datahoarders.
karlshea, about 1 year ago
If you can pare things down to fit on a couple of drives that you can keep attached to an online computer, Backblaze would be inexpensive, and they have an option to use your own encryption key.
D13Fd, about 1 year ago
I use Arq and Wasabi for about 25TB of backups. Wasabi is $173/month for the 25TB, though.
aborsy, about 1 year ago
I would get a Synology NAS.

For offsite, I would use restic to S3 or Backblaze.
bugbuddy, about 1 year ago
Why not turn this into a nice tech portfolio demo project? Why don't you design a global high-availability data system on top of Amazon S3? Then you could also implement a native client for each OS that you use. It is a great way to learn.
mft_, about 1 year ago
Maybe before you get into the technical aspects, there's another consideration. Out of that 23TiB, are you able to estimate how much of a problem it would be if you lost some or all of it? E.g.:

* disastrous

* very upsetting

* disappointing

* meh

(It might also help to consider when you last needed to access any of it.)

Because honestly, my bet (without judgement) is that there's likely a significant amount of data in there that simply doesn't warrant keeping. I base this on my own habits (I have to actively fight a hoarding tendency, digitally and in real life) and also knowledge of friends who (while otherwise very well adjusted) seem to find digital hoarding easy to fall into, maybe because it has less of a visible life impact than physical belongings.
vermaden, about 1 year ago
23 TiB... I would use 4 disks for that, each 10 TB or 12 TB in size (depending what room You want).

In RAID5(3) + SPARE with ZFS that would be 'raidz' mode.

    % math 12000000000000000 / 1024 / 1024 / 1024 / 1024
    10913.93
    % math 10000000000000000 / 1024 / 1024 / 1024 / 1024
    9094.94

From 10 TB disks You would have 3 x 9 TiB, which means 27 TiB of space available.

From 12 TB disks You would have 3 x 10.5 TiB, which means 31.5 TiB of space available.

> 1. I don't need the files to be accessible online, in fact, I would prefer if they were not.

I would keep it on a local LAN w/o Internet access.

> 2. If anything is backed up to the cloud, I want pre-internet-encryption with keys that only I know and control.

Use rclone(1) with its encryption - You can clone these files to S3 in the cloud.

> 3. I want something simple, that could be recovered using a pragmatic approach and open source software in case of a disaster.

I use rsync(1) for forever-incremental backups and rclone(1) to backup some of that into encrypted S3.

My rsync(1) scripts are here (maybe You will find them useful):

- https://github.com/vermaden/scripts/blob/master/rsync-delete-before.sh

- https://github.com/vermaden/scripts/blob/master/rsync-delete-checksum.sh

- https://github.com/vermaden/scripts/blob/master/rsync-delete-nocompress.sh

- https://github.com/vermaden/scripts/blob/master/rsync-delete-permissions.sh

- https://github.com/vermaden/scripts/blob/master/rsync-delete.sh

- https://github.com/vermaden/scripts/blob/master/rsync.sh

> 4. I'd like a system where I can easily test my recovery strategy.

Also with rsync(1)... or anything else for plain dirs/files.

> 1. What local filesystem setup should I use? Number of drives? Local backup approach?

ZFS.

Details about ZFS pool settings here:

- https://vermaden.wordpress.com/2023/04/10/silent-fanless-dell-wyse-3030-lt-freebsd-server/

> 2. If you've done this before, is there a strategy that you used for the actual aggregation of the data?

I am doing something similar for a 5 TB data set. I have 4 data sets. I use 2.5" 5 TB drives, with ZFS and GELI encrypted.

    Main Source ==> Backup @ LAN (LOCAL)
     \
      \
       > Backup @ Internet (SSH) (REMOTE)
        \
         \> Backup @ USB (OFFLINE)

> Are there any particularly convenient IDE to USB docks?

Many - check ALIEXPRESS.COM for tons of them.

> Any good software that you would recommend for locating duplicate files?

    % cargo install czkawka_gui czkawka_cli

You may also use ZFS deduplication for the datasets you KNOW have duplicated data. There is also the new ZFS feature Block Cloning - You may want to look into that as well.

> 3. What remote backup software should I use?

I use rsync(1) and rclone(1). The rsync(1) for everything file/dir based. The rclone(1) to put encrypted backups into S3 containers.

Regards,
vermaden
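The linked scripts implement the forever-incremental pattern; as a generic, minimal sketch of the same idea (not taken from those scripts; dates and paths are illustrative), rsync's --link-dest makes each run look like a full tree while hard-linking unchanged files to the previous run:

    rsync -a --delete \
        --link-dest=/backup/2024-03-17 \
        /tank/data/ /backup/2024-03-18/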
tombert, about 1 year ago
Not exactly what you're asking for, but I think worth considering: LTO-6 data tapes.

I have about 29TB of Blu-ray rips that I didn't want to risk having to re-rip (that took months!). My solution was to buy an LTO-6 tape drive on eBay, and about 100 tapes.

If you get lucky, a used LTO-6 tape drive will cost you roughly $250-$350 on eBay. The tapes themselves can be had for about $10 each, particularly if you buy a lot at once. Each tape can hold around 2TB [1]. I have all my movies backed up twice, on two tapes each. I have a label maker where I label the tapes from A-Z, and I have a spreadsheet keeping track of which movies live on which tape, in case I need to restore just one.

I don't know if there are any kind of proprietary blobs in the kernel required for this, but I was able to get this working on vanilla NixOS with the `sg` kernel module enabled, and the open-source LTFS implementation from HP [2].

The tapes are actually a lot faster to read and write than people think, but you can only read and write one file at a time, so you have to plan accordingly. They're also not random-access, so even though LTFS gives you a filesystem mountpoint, you probably don't want to be rsyncing files directly to them. It's not a "RAID", just a regular filesystem, so when I run out of tapes, I can simply buy some more.

I keep them in a big plastic storage bin, and I have a ton of desiccant in there to protect against humidity. I haven't lost any tapes yet, and they're rated for like 15-30 years, but I want to hedge my bets a bit and desiccant is not expensive or hard to get.

Still, I am very happy with my setup. It's saved me a lot of time after I broke a RAID configuration and lost all my Blu-ray rips for my Jellyfin server.

[1] They advertise like 6.5TB but that's sort of a lie; that's assuming the best-case scenario with their on-board compression. If you're backing up already-compressed stuff like video or photos, you get much closer to the 2.5TB limit, and you don't really want to run them to the edge I think, so I stop after 2TB.

[2] https://buy.hpe.com/us/en/storage/storage-software/storage-device-management-software/storeever-tape-device-management-software/hpe-storeopen-linear-tape-file-system-ltfs-software/p/4249221

ETA:

In regards to "testing", I didn't do anything too elaborate. I filled up a tape with movies, then copied them back, and compared the md5sum of each of the movies to make sure nothing had changed. They hadn't changed, so I was happy enough with the results.

Also, I forgot to mention, most of the tape decks I've seen are SAS-only, so you'll either need to make sure your computer/server/whatever has a SAS port, or you'll need to find a card that has one. I think the modern LTOs have Thunderbolt support, but I haven't used them. I simply found a used PCIe SAS adapter on eBay for $35 with shipping, and plugged that into my server. I think the only things I had to directly install were `mt`, `ltfs`, and enable `sg`.
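For a sense of what the LTFS workflow looks like, a minimal sketch under the assumption of a typical Linux LTFS install; the device node and mountpoint are examples and vary per system:

    # One-time format per cartridge, then mount the tape as a filesystem.
    mkltfs -d /dev/sg3
    ltfs -o devname=/dev/sg3 /mnt/tape

    # Write one large file at a time; tar keeps the access pattern sequential.
    tar -cf /mnt/tape/movies-A.tar /tank/movies/A/
    umount /mnt/tape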
fool1471, about 1 year ago
A RAID-based NAS would be the obvious way to go.

Since you are not bothered about huge data throughput, software RAID (rather than hardware RAID) would be the cheaper way to go in general. A lot of the discussion of the pros/cons of different RAID levels that you can find online will give a lot of attention to how they affect aggregate read/write speed; for a single-user data archive, this is not hugely important when compared to the basic ratio of usable to redundant disk space.

You can manually set up software RAID on most Linux distros for any filesystem you like, or if you want something that does most of it for you then I can recommend unRAID (https://unraid.net/).

I have an unRAID server with 8x 3TB HDDs and 2x 1TB SSDs, in which the HDDs are in a parity RAID array (I can never remember which RAID-level number that is), meaning I get 18TB of usable space with two disks of redundancy.

The two SSDs then act as a write cache (in mirrored RAID) so the HDDs don't need to be spun up when you add new data. This makes the whole thing very low-power, as the HDDs spend 99% of their time spun down. I think my server uses about 42W on average, and that's with a bunch of web services going on as well.

unRAID provides a lot of useful utilities for managing files, some native and some via plugins. This is things such as Discord/email/Telegram integrations (so your server can notify you when a disk starts to fail) as well as things like file integrity monitoring, fan control, scheduled backup, etc.

LUKS encryption is supported if you want extra security.

Re point 3: if your unRAID OS keels over for whatever reason, then the data on the drives is stored in the filesystem of your choice, so you are not bound to using unRAID to recover that data.

Re point 4: you can test your system by pulling drives; unRAID should automatically emulate the data on the missing drive while you find a replacement. I have had multiple drives fail (due to a faulty HBA) and have not lost any data at all.

Re point 2: an unRAID server is accessible on your local network, and you can choose to enable Samba and/or NFS for different "shares." E.g. you could have your music share accessible read-only by everyone but write-protected to just you, and simultaneously have your personal files share only accessible to one user.

What filesystem to use is a whole can of worms. I use XFS and it is thoroughly okay; I'm not enough of a power user for the choice of filesystem to make a difference to daily life, and I suspect this is the case for you too.

If you want more redundancy than the standard parity array offers, then you can set up "pools" in the OS that have different RAID levels.

For your 6TB of data, an array of 4x 3TB HDDs would be a fine start, giving you 8TB of usable space with single-disk redundancy. An SSD cache pool can be added later to lower initial setup costs. With just four to six devices, chances are your motherboard will have enough SATA ports for you to not need any kind of PCI HBA or expander cards. 3TB per disk is a good trade-off point between capacity and the cost of a failed drive, IMO.

You won't need a lot of RAM; 8GB would be plenty if you don't plan on using it for hosting any web services.

For a processor, look for a good low-power option such as an Intel Xeon E3-1220L. With a canny enough choice of components, you should be able to keep your power consumption well below 30W (while the drives are spun down). If you really only need to access this data very occasionally then there is no reason not to power the server down when not in use.

Chassis choice is also near-infinite. I have a UNAS (https://www.u-nas.com/xcart/cart.php?target=category&category_id=249) which is lovely, but any old PC chassis will do if you aren't fussy.

A good tip with multi-drive systems in general is to deliberately unbalance your drive use. If you use all the drives equally, you will wear them all out at the same rate, raising the chances that multiple drives will fail within a short space of time. I tend to separate my drives by use: music on one, films on others, etc. This also keeps power consumption down, as you only need to spin up one drive to access one group of files (rather than songs in an album being potentially split over multiple drives).

In terms of strategies for performing the actual backup/organisation, I would find a way of mounting the existing drives to the new system one by one (such as by using an eSATA PCI card). Using fresh, blank drives for your NAS will mean that you retain the ability to start again if you mess up or change your mind about what you want to keep, as you won't need to modify any of the data on your existing collection while you create the new one. Generally speaking, I would avoid trying to organise the data in situ.

It is worth re-iterating that you can achieve very similar results with a server running Ubuntu using open-source software RAID drivers or even a cheap second-hand hardware RAID controller PCI card. I am just a big fan of how low-maintenance my unRAID setup is compared to when I used to do it all manually.

It is also worth mentioning that you can get a pretty good off-the-shelf solution for this sort of thing from companies like QNAP and Synology.
kagevf, about 1 year ago
rsync
NoraCodes, about 1 year ago
I highly recommend Tarsnap [1] for your off-site backups. It has a tool that does pre-encryption with user-controlled keys, deduplication, and a reasonable price; for your use case, about $1,500 a month, though more in the first month for bandwidth. I've used them for years.

1: https://www.tarsnap.com/
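Tarsnap's client is tar-flavored; a minimal sketch with placeholder names and dates (the generated key must be backed up separately — lose it and the archives are unrecoverable):

    tarsnap-keygen --keyfile /root/tarsnap.key \
        --user you@example.com --machine archive-box

    tarsnap -c -f photos-2024-03-18 ~/photos    # create an archive
    tarsnap --list-archives                     # enumerate
    tarsnap -x -f photos-2024-03-18             # restore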
tnbp, about 1 year ago
1. Doesn't matter. ZFS or whatever. I use ext4; it's good enough.

2. Buy as many 3.5-inch external USB drives as you need to reach 23 TiB, then connect them all over a single USB hub. Buy the same drives again, shuck them, and stuff them in this [1]. Store them in one of these [2] when you're not doing a backup and put it somewhere outside your home. Merge them all using mergerfs.

3. rsync. If you need to think about how precious your data is, at 23 TiB, you've already lost. Just back up everything in triplicate. Don't bother setting up RAID; it's not a backup.

[1] https://www.amazon.com/-/en/dp/B07MQCDVJ2/

[2] https://www.amazon.com/-/en/dp/B087WXFFW6/
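The mergerfs step could look like this; mount points are examples, and category.create=mfs routes new files to the branch with the most free space:

    # Pool several USB drives into one tree.
    mergerfs -o defaults,allow_other,category.create=mfs \
        /mnt/d1:/mnt/d2:/mnt/d3 /mnt/pool

    # Then a plain rsync per backup set:
    rsync -aH --info=progress2 /mnt/pool/ /mnt/backup-pool/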