A shoutout for attic <a href="https://attic-backup.org/" rel="nofollow">https://attic-backup.org/</a><p>Attic is one of the new-generation hash-backup tools (like obnam, zbackup, Vembu Hive, etc.). It provides encrypted incremental-forever backups (unlike duplicity, duplicati, rsnapshot, rdiff-backup, Ahsay, etc.) with no server-side processing and a convenient CLI interface, and it <i>does</i> let you prune old backups.<p>All other common tools seem to fail on one of the following points:<p>- Incremental <i>forever</i> (bandwidth is expensive in a lot of countries)<p>- Untrusted remote storage (so I can hook it up to a dodgy lowendbox VPS)<p>- Optional: no server-side processing needed (so I can hook it up to S3 or Dropbox)<p>If your backup model is based on the old-style original + diff(original, v1) + diff(v1, v2)... then you're going to have a slow time restoring. rdiff-backup gets this right by reversing the incremental chain. However, as soon as you need to consolidate incremental images, you lose the possibility of encrypting the data (since encrypt(diff()) is useless from a diff perspective).<p>But with a hash-based backup system? All restore points take constant time to restore.<p>Duplicity, Duplicati 1.x, and Ahsay 5 don't support incremental-forever. Ahsay 6 supports incremental-forever at the expense of requiring trust in the server (server-side decryption to consolidate images). Duplicati 2 attempted to move to a hash-based system, but they chose fixed block offsets rather than checksum-based offsets, so incremental detection is inefficient after an insert point.<p>IMO Attic gets everything right. There are patches for Windows support on their GitHub. I wrote a Munin plugin for it.<p>Disclaimer: I work in the SMB backup industry.
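To make the fixed-offsets vs. checksum-based-offsets point concrete, here is a toy Python sketch (my own illustration, not code from attic, bup, or Duplicati) showing why content-defined chunk boundaries keep deduplication working after a small insert, while fixed block offsets invalidate every chunk downstream of the insert point:<p><pre><code>import hashlib
import random

def fixed_chunks(data, size=32):
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data, window=4, modulus=32):
    # Toy rolling checksum: cut a chunk whenever the sum of the last `window`
    # bytes hits a chosen remainder. Real tools use a proper rolling hash that
    # updates incrementally, but the boundaries depend on content either way.
    chunks, start = [], 0
    for i in range(window, len(data) + 1):
        if sum(data[i - window:i]) % modulus == 0:
            chunks.append(data[start:i])
            start = i
    if start != len(data):
        chunks.append(data[start:])
    return chunks

def hashes(chunks):
    return set(hashlib.sha256(c).hexdigest() for c in chunks)

random.seed(0)
original = bytes(random.randrange(256) for _ in range(4000))
modified = original[:1000] + b"NEW" + original[1000:]   # small insert near the front

for name, chunker in (("fixed offsets", fixed_chunks),
                      ("content-defined", content_defined_chunks)):
    new = hashes(chunker(modified)) - hashes(chunker(original))
    print(name + ":", len(new), "chunks to re-upload")
</code></pre>
With fixed offsets, everything after the insert shifts and nearly every following chunk hashes differently; with content-defined boundaries, the chunker re-synchronises and only a chunk or two near the insert needs to be stored again.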
I've long been a huge fan of bup, and have even contributed some code. I might be by far their single biggest user, since I host 96748 bup repositories at <a href="https://cloud.sagemath.com" rel="nofollow">https://cloud.sagemath.com</a>, where the snapshots for all user projects are made using bup (and mounted using bup-fuse).<p>Elsewhere in this discussion people note some shortcomings of bup, namely not having its own encryption and not having the ability to delete old backups. For my applications, lack of encryption isn't an issue, since I make the backups locally on a full-disk-encrypted device and transmit them for long-term storage (to another full-disk-encrypted device) only over ssh. The lack of being able to easily delete old backups is also not an issue, since (1) I don't want to delete them (I want a complete history), and (2) the approach to deduplication and compression in bup makes it extremely space-efficient, and it doesn't get (noticeably) slower as the number of commits gets large; this is in contrast to ZFS, where performance can degrade dramatically if you make a large number of snapshots, or other much less space-efficient approaches where you <i>have</i> to regularly delete backups or you run out of space.<p>In this discussion people also discuss ZFS and deduplication. With SageMathCloud, the filesystem all user projects use is a de-duplicated ZFS-on-Linux filesystem (most on an SSD), with lz4 compression and rolling snapshots (using zfssnap). This configuration works well in practice, since projects have limited quota so there's only a few hundred gigabytes of data (so far less than even 1TB), but the machines have quite a lot of RAM (50+GB) since they are configured for lots of mathematics computation, running IPython notebooks, etc.
I wrote a very similar tool before I knew about bup - ddar (<a href="https://github.com/basak/ddar" rel="nofollow">https://github.com/basak/ddar</a> - with more documentation at <a href="http://web.archive.org/web/20131209161307/http://www.synctus.com/ddar/" rel="nofollow">http://web.archive.org/web/20131209161307/http://www.synctus...</a>).<p>Others have complained here that bup doesn't support deleting old backups. ddar doesn't have that issue: deleting snapshots works just fine (all other snapshots remain).<p>I think the underlying difference is that ddar uses sqlite to keep track of the chunks, whereas bup is tied to git's pack format, which isn't really geared towards large backups. git's pack files are expected to be rewritten, which works fine for code repositories but not for terabytes of data.
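For illustration, here is a minimal sketch of the kind of sqlite-backed chunk index described above. The schema is hypothetical (it is not ddar's actual layout), but it shows why dropping a snapshot is cheap when chunks are tracked in a database rather than packed into archive files that must be rewritten:<p><pre><code>import hashlib
import sqlite3

# Hypothetical chunk index: each chunk is stored once, keyed by hash;
# snapshots are just ordered lists of chunk references, so deleting a
# snapshot only removes rows, and unreferenced chunks can be
# garbage-collected without touching anything else.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE chunks   (hash TEXT PRIMARY KEY, data BLOB);
    CREATE TABLE snapshots(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE refs     (snapshot_id INTEGER, seq INTEGER, hash TEXT);
""")

def store_snapshot(name, chunks):
    sid = db.execute("INSERT INTO snapshots(name) VALUES (?)", (name,)).lastrowid
    for seq, chunk in enumerate(chunks):
        h = hashlib.sha256(chunk).hexdigest()
        db.execute("INSERT OR IGNORE INTO chunks VALUES (?, ?)", (h, chunk))
        db.execute("INSERT INTO refs VALUES (?, ?, ?)", (sid, seq, h))
    db.commit()

def delete_snapshot(name):
    db.execute("DELETE FROM refs WHERE snapshot_id IN "
               "(SELECT id FROM snapshots WHERE name = ?)", (name,))
    db.execute("DELETE FROM snapshots WHERE name = ?", (name,))
    # Garbage-collect chunks no longer referenced by any snapshot.
    db.execute("DELETE FROM chunks WHERE hash NOT IN (SELECT hash FROM refs)")
    db.commit()

store_snapshot("monday",  [b"aaaa", b"bbbb", b"cccc"])
store_snapshot("tuesday", [b"aaaa", b"bbbb", b"dddd"])
delete_snapshot("monday")   # "tuesday" still restores; shared chunks survive
</code></pre>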
Is there anything out there that does continuous incremental backups to a remote location (like obnam, attic, ...) but allows "append only" access? That is, you are only allowed to add to the backup, and the network protocol inherently does not allow past history to be deleted or modified. Pruning old backups might be allowed, but only using credentials that are reserved for special use.<p>Obnam, attic, and similar tools use a normal read/write disk area, without any server-side processing, so presumably an errant/malicious user is free to delete the entire backup?
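A rough Python sketch of the property being asked for (the store and its credentials are entirely hypothetical, not an existing tool's protocol): everyday backup credentials can only append content-addressed objects, while pruning requires a separate privileged credential kept off the client:<p><pre><code>import hashlib

class AppendOnlyStore:
    def __init__(self, prune_credential):
        self._objects = {}
        self._prune_credential = prune_credential

    def put(self, data):
        # Content-addressed writes are idempotent and never overwrite anything,
        # so a compromised client can add garbage but not destroy history.
        key = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(key, data)
        return key

    def get(self, key):
        return self._objects[key]

    def delete(self, key, credential=None):
        # Ordinary backup credentials never satisfy this check.
        if credential != self._prune_credential:
            raise PermissionError("append-only: pruning requires the reserved credential")
        del self._objects[key]

store = AppendOnlyStore(prune_credential="kept-offline")
k = store.put(b"backup chunk")
try:
    store.delete(k)                              # the everyday client cannot prune
except PermissionError as e:
    print(e)
store.delete(k, credential="kept-offline")       # pruning with the reserved credential
</code></pre>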
Haven't seen this mentioned, but since bup de-duplicates chunks (and thus may take very little space - e.g., when you back up a 40GB virtual machine, each snapshot takes little more than the actual changes inside the virtual machine), every byte of the backup is actually very important and fragile, as it may be referenced from thousands of files and snapshots. This is of course true for all deduplicating and incremental backups.<p>However, bup goes one step further and has built-in support for "par2", which adds error correction - in a way, it efficiently re-duplicates chunks so that if any one (or two, or however many you decide) breaks, you can still recover the complete backup.
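A toy Python illustration of the redundancy idea (par2 itself uses Reed-Solomon coding, which generalizes this to recover as many lost blocks as you generated recovery blocks for): a single XOR parity block is enough to rebuild any one lost chunk from the survivors:<p><pre><code>from functools import reduce

def xor_blocks(blocks):
    # XOR equal-length blocks byte by byte.
    return bytes(reduce(lambda a, b: a ^ b, position) for position in zip(*blocks))

data_blocks = [b"chunk-one!", b"chunk-two!", b"chunk-3333"]   # equal-length chunks
parity = xor_blocks(data_blocks)

# Simulate losing block 1, then rebuild it from the survivors plus parity.
survivors = [data_blocks[0], data_blocks[2]]
recovered = xor_blocks(survivors + [parity])
assert recovered == data_blocks[1]
print("recovered:", recovered)
</code></pre>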
I was wondering if someone's done a side-by-side comparison of the various newer open-source backup tools? Specifically, I'm looking for performance, compression, encryption, and type of deduplication (file-level vs. block-level, and dedup between generations only vs. dedup across all files), along with the specifics of the implementation, since some of the tools don't really explain that too well, and any unique features.<p>The reason I ask is that I had a difficult time finding a backup tool that suited my own needs, so I wrote and open-sourced my own (<a href="http://www.snebu.com" rel="nofollow">http://www.snebu.com</a>), and now that some people are starting to use it in production I'd like to get a deeper peer review to ensure quality and feature completeness. (I actually didn't think I'd be this nervous about people using any of my code, but backups are kind of critical, so I'd like to ensure it is done as correctly as possible.)
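As a concrete illustration of the file-level vs. block-level distinction (a rough Python sketch, not snebu's or any particular tool's implementation): when only part of a file changes between generations, file-level dedup has to re-store the whole file, while block-level dedup re-stores only the changed blocks:<p><pre><code>import hashlib
import random

def file_level_new_bytes(old, new):
    # File-level dedup: a file is shared only if its whole-file hash matches.
    old_hashes = set(hashlib.sha256(d).hexdigest() for d in old.values())
    return sum(len(d) for d in new.values()
               if hashlib.sha256(d).hexdigest() not in old_hashes)

def block_level_new_bytes(old, new, block_size=4096):
    # Block-level dedup: only blocks whose hashes are unseen must be stored.
    def blocks(files):
        return {hashlib.sha256(d[i:i + block_size]).hexdigest(): len(d[i:i + block_size])
                for d in files.values() for i in range(0, len(d), block_size)}
    old_blocks = blocks(old)
    return sum(size for h, size in blocks(new).items() if h not in old_blocks)

random.seed(0)
def rand_part():
    return bytes(random.randrange(256) for _ in range(8192))

unchanged, old_half, new_half = rand_part(), rand_part(), rand_part()
gen1 = {"report.doc": unchanged + old_half}
gen2 = {"report.doc": unchanged + new_half}   # only the second half changed

print(file_level_new_bytes(gen1, gen2))    # 16384: the whole file counts as new
print(block_level_new_bytes(gen1, gen2))   # 8192:  only the changed blocks
</code></pre>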
Like any good hacker, I got tired of other solutions that didn't quite match my needs and made my own Dropbox-like backup/sync using only rsync, ssh and encfs.<p><a href="https://github.com/avdd/rsyncsync" rel="nofollow">https://github.com/avdd/rsyncsync</a><p>Not polished, but it's working for me.<p><pre><code> - only runs on machines I control
- server requirement is only rsync, ssh and coreutils
- basic conflict detection
- encfs --reverse to encrypt locally, store remotely
- history is rsnapshot-style hard links (sketched below)
- inspect history using sshfs
- can purge old history
</code></pre>
Shell aliases showing how I use it are in my config repository.<p>encfs isn't ideal, but it's the only thing that does the job. Ideally I'd use something that didn't leak so much, but it doesn't exist.
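For anyone unfamiliar with the rsnapshot-style hard links mentioned in the list above, here is a minimal Python sketch of the mechanism (rsnapshot and rsync's --link-dest do this for you; the paths in the usage comment are made up): each snapshot is a complete directory tree, but unchanged files are hard links into the previous snapshot, so they cost no extra space.<p><pre><code>import filecmp
import os
import shutil

def snapshot(source, prev_snap, new_snap):
    for dirpath, _dirs, files in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(new_snap, rel), exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(new_snap, rel, name)
            old = os.path.join(prev_snap, rel, name) if prev_snap else None
            if old and os.path.exists(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)          # unchanged: hard link, no extra space
            else:
                shutil.copy2(src, dst)     # new or changed: store a real copy

# Usage (hypothetical paths):
#   snapshot("/home/me", "/backups/2015-01-01", "/backups/2015-01-02")
# Deleting an old snapshot directory is safe: files shared with newer snapshots
# survive because the remaining hard links keep the data alive.
</code></pre>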
I tried some backup software (of the rdiff variety, not the amanda variety) last year when I set up a small backup server for friends and family.<p>Obnam and bup seemed to work mostly the way I wanted to but obnam was by far the most mature tool, so this is what I chose in the end.<p>On the plus side, it provides both push and pull modes. Encryption and expiration works. The minus points are no Windows support, and some horror stories about performance. Apparently it can slow to a crawl with many files. I haven't run into that problem despite hundreds of gig in the backup set, but most are large files.<p>On the whole it's been very stable and unobtrusive during the time I've used it, but I haven't used it in anger yet. So a careful recommendation for obnam from me.
Does anyone use zpaq[1]? It has compression, deduplication, incremental backup, encryption, backup versioning (unlike bup, with the ability to delete old ones), and it's written in C++. But I'm not sure about its performance over the network and how it compares with bup or rsync.<p>[1] <a href="http://mattmahoney.net/dc/zpaq.html" rel="nofollow">http://mattmahoney.net/dc/zpaq.html</a>
Adding a plug for git-annex. <a href="https://git-annex.branchable.com/" rel="nofollow">https://git-annex.branchable.com/</a><p>git-annex is for more than just backups. In particular, it lets you store files on multiple machines and retrieve them at will. This lets you do backups to e.g. S3, but it also lets you e.g. store your mp3 collection on your NAS and then easily copy some files to your laptop before leaving on a trip. Any changes you make while you're offline can be synced back up when you come back online.<p>You can prune old files in git-annex [1], and it also supports encryption. git-annex deduplicates identical files, but unlike Attic and co., it does not have special handling of incremental changes to files; if you change a file, you have to re-upload it to the remote server.<p>git-annex is actively developed, and I've found the developer to be really friendly and helpful.<p>[1] You can prune the old files, but because the metadata history -- basically, the filename-to-hash mapping -- is stored in git, you can't prune that. In practice you'd need to have a pretty big repository with a high rate of change for this to matter.<p><i>Edited for formatting.</i>
Is there an easy way to have the backups encrypted at rest? That's a nice feature of Duplicity: I don't have to worry about someone who hacks my backup server or borrows my USB drive getting access to my data.
This seems like a fantastic tool, and I would love to try this out. And, it's free!<p>My personal obstacle in using a tool like bup is the back-up space. I could definitely use this for on-site/external storage devices, but I also like to keep online/cloud copies. I currently use CrashPlan for that which affords me unlimited space. If CrashPlan would let me use their cloud with bup, wow, I would switch in a heartbeat. Perhaps cloud backup tools could learn some tricks from bup.
If you want a fantastic graphical frontend for bup, there is Kup, which is a KDE app: <a href="http://kde-apps.org/content/show.php/Kup+Backup+System?content=147465" rel="nofollow">http://kde-apps.org/content/show.php/Kup+Backup+System?conte...</a><p>It is really easy to set up which folders to back up and where, and I use it whenever a backup is simply "take all files from X, do rolling backups at Y, done."
If you're considering using it, keep in mind the limitations: <a href="https://github.com/bup/bup/blob/master/README.md#things-that-are-stupid-for-now-but-which-well-fix-later" rel="nofollow">https://github.com/bup/bup/blob/master/README.md#things-that...</a><p>The one most likely to be a showstopper seems to be: "bup currently has no way to prune old backups."
I've been using duply <a href="http://duply.net/" rel="nofollow">http://duply.net/</a> for a while. It is a simple frontend for duplicity <a href="http://duplicity.nongnu.org/" rel="nofollow">http://duplicity.nongnu.org/</a>.
I find it very easy to set up. It also provides encrypted backups through GPG.
There's also Burp, which is worth a look: <a href="http://burp.grke.org/index.html" rel="nofollow">http://burp.grke.org/index.html</a><p>Looking at <a href="http://burp.grke.org/burp2/08results1.html" rel="nofollow">http://burp.grke.org/burp2/08results1.html</a>, it seems it can outperform bup in some situations.
> That is a dataset which is already deduplicated via copy-on-write semantics (it was not using ZFS deduplication because you should basically never use ZFS deduplication).<p>Can someone more experienced with ZFS say why?
This looks very interesting as a replacement for rdiff-backup. Hopefully the missing parts (expiring old backups, restoring from a remote) aren't too far away.