For those interested, I did a lot of lessfs testing and published the results on my professional blog a while ago:<p>* first post:
<a href="http://blogs.intellique.com/tech/2010/12/22#dedupe" rel="nofollow">http://blogs.intellique.com/tech/2010/12/22#dedupe</a><p>* detailed setup and benchmark results:
<a href="http://blogs.intellique.com/tech/2011/01/03#dedupe-config" rel="nofollow">http://blogs.intellique.com/tech/2011/01/03#dedupe-config</a><p>After more than 9 months running lessfs, I recommend it.
Required reading from my course on Advanced Storage Systems at CMU: <a href="http://www.cs.cmu.edu/~15-610/READINGS/optional/zhu2008.pdf" rel="nofollow">http://www.cs.cmu.edu/~15-610/READINGS/optional/zhu2008.pdf</a><p>A really good paper that describes in detail how the deduplication works.
So, from what I understand, this is great but more of a proof of concept, since FUSE performance kills it. As far as putting it in production goes, there are a few unresolved questions I haven't seen picked apart:<p>- Can dedup be integrated into the VFS layer, like unionfs is shooting for, or does it have to be integrated with the underlying filesystem?<p>- Is online dedup possible, and does the answer change when running on SSDs?<p>- What's the best granularity (block-level? inode-level? block-extent-level?) and how badly can it randomize the i/o? I imagine one would have to do a lot of real-world benchmarking to find this out (see the toy sketch below for the fixed-block baseline).<p>- Are there possible privacy issues (e.g. inferring through i/o patterns whether someone else has a given block or file stored), and how would you deal with them?
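To make the granularity and i/o questions a bit more concrete, here is a toy, purely hypothetical sketch of fixed-size block-level dedup (the names and the 4 KiB block size are made up, not anything lessfs or ZFS actually does). A file becomes an ordered list of block hashes, and the blocks themselves can land anywhere in a shared store, which is where the read-side i/o randomization comes from.

    import hashlib

    BLOCK_SIZE = 4096  # assumed fixed block size; real systems may use extents

    class BlockStore:
        """Toy block-level dedup: unique blocks go into one append-only log,
        and each file is just an ordered list of block hashes (a "recipe")."""

        def __init__(self):
            self.log = bytearray()   # every unique block stored exactly once
            self.where = {}          # block hash -> (offset, length) in the log
            self.recipes = {}        # file name -> list of block hashes

        def write_file(self, name, data):
            recipe = []
            for i in range(0, len(data), BLOCK_SIZE):
                block = data[i:i + BLOCK_SIZE]
                h = hashlib.sha256(block).digest()
                if h not in self.where:              # only new content is stored
                    self.where[h] = (len(self.log), len(block))
                    self.log += block
                recipe.append(h)
            self.recipes[name] = recipe

        def read_file(self, name):
            # The blocks of one file can live at arbitrary offsets in the log,
            # so a sequential file read turns into scattered block lookups;
            # that is the i/o randomization cost of dedup.
            out = bytearray()
            for h in self.recipes[name]:
                off, length = self.where[h]
                out += self.log[off:off + length]
            return bytes(out)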
Bup is also a pretty cool git-based dedup backup utility:<p><a href="https://github.com/apenwarr/bup#readme" rel="nofollow">https://github.com/apenwarr/bup#readme</a>
I was wondering: with the current amount of abstraction and similar (sometimes redundant) metadata on almost everything, what percentage of duplicate blocks could be found on a standard desktop system?<p>I don't think it would be useful; I'm just curious about the level of "standard" data duplication.
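Out of the same curiosity, here is a rough sketch one could run to get a number. The 4 KiB block size and fixed alignment are assumptions, and the resulting ratio depends heavily on both.

    import hashlib, os, sys
    from collections import Counter

    BLOCK = 4096  # assumed block size; the ratio changes a lot with this choice

    def duplicate_block_ratio(root):
        """Hash every aligned 4 KiB block under root and return the fraction
        of blocks that duplicate an already-seen block."""
        counts = Counter()
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, 'rb') as f:
                        while True:
                            block = f.read(BLOCK)
                            if not block:
                                break
                            counts[hashlib.sha256(block).digest()] += 1
                except OSError:
                    continue  # unreadable files, special files, etc.
        total = sum(counts.values())
        return 0.0 if total == 0 else 1 - len(counts) / total

    if __name__ == '__main__':
        root = sys.argv[1] if len(sys.argv) > 1 else os.path.expanduser('~')
        print("duplicate block ratio: {:.1%}".format(duplicate_block_ratio(root)))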
btrfs also has a deduplication feature in the works: <a href="http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg07720.html" rel="nofollow">http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg0...</a>
I tested it and I don't recommend it (that was about a year ago, though).
It was really slow, and some blog posts about the reliability of the data storage backend were a little bit scary.<p>I would recommend using zfs-fuse instead. You avoid the FUSE -> file on a filesystem -> hard disk indirection (and thus get more speed), and additionally you get all the cool ZFS features!
If you need even more speed, there is a native ZFS kernel module for Linux and a dedup patch for btrfs. I don't think either of those is production-ready yet, though.
I don't see why the complication of a database is needed. The sensible approach would be something like BMDiff with [page] indexing on top for random access.
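A minimal sketch of the page-index part of that idea, with zlib standing in for BMDiff (the class name and 64 KiB page size are arbitrary choices of mine): each page is compressed independently and a small index maps page numbers to offsets, so a random read only has to decompress the pages it actually touches.

    import zlib

    PAGE = 64 * 1024  # assumed uncompressed page size

    class PageIndexedStore:
        """Compress fixed-size pages independently and keep a per-page
        (offset, length) index, so random reads decompress only the
        pages they touch."""

        def __init__(self, data):
            self.blob = bytearray()
            self.index = []                      # page number -> (offset, length)
            for i in range(0, len(data), PAGE):
                comp = zlib.compress(data[i:i + PAGE])
                self.index.append((len(self.blob), len(comp)))
                self.blob += comp

        def read(self, pos, size):
            out = bytearray()
            while size > 0 and pos < len(self.index) * PAGE:
                page, in_page = divmod(pos, PAGE)
                off, length = self.index[page]
                plain = zlib.decompress(bytes(self.blob[off:off + length]))
                piece = plain[in_page:in_page + size]
                if not piece:
                    break
                out += piece
                pos += len(piece)
                size -= len(piece)
            return bytes(out)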
lessfs appears to do block-level deduplication (like ZFS). That means that if I copy a huge file but add a few bytes at the start, I won't get any benefit from deduplication, because the data no longer aligns with the original block boundaries.<p>I wonder if there is a way to improve on that?
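One common answer, as I understand it, is content-defined chunking (roughly the trick behind rsync's rolling checksum and bup's hashsplitting): cut a chunk wherever a rolling hash over the last few dozen bytes hits a magic pattern, so boundaries depend on nearby content rather than absolute offsets, and an insertion at the start only disturbs the chunks around it. A rough sketch, where the window size, mask and chunk limits are arbitrary choices of mine:

    import hashlib

    WINDOW = 48                      # rolling-hash window in bytes (assumed)
    BASE, MOD = 257, 1 << 32
    POW = pow(BASE, WINDOW - 1, MOD) # coefficient of the byte leaving the window
    MASK = (1 << 13) - 1             # cut when low 13 bits are set: ~8 KiB chunks
    MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024

    def chunk_boundaries(data):
        """Yield (start, end) offsets of content-defined chunks. A boundary
        depends on the WINDOW bytes before it (plus min/max size limits),
        not on absolute file offsets, so data inserted at the front shifts
        boundaries only locally."""
        start, h = 0, 0
        for i, b in enumerate(data):
            if i >= WINDOW:
                h = (h - data[i - WINDOW] * POW) % MOD  # drop the outgoing byte
            h = (h * BASE + b) % MOD                    # add the incoming byte
            length = i + 1 - start
            if (length >= MIN_CHUNK and (h & MASK) == MASK) or length >= MAX_CHUNK:
                yield start, i + 1
                start = i + 1
        if start < len(data):
            yield start, len(data)

    def store_file(data, store):
        """Store each chunk once, keyed by its SHA-256; return the recipe."""
        recipe = []
        for s, e in chunk_boundaries(data):
            key = hashlib.sha256(data[s:e]).digest()
            store.setdefault(key, data[s:e])
            recipe.append(key)
        return recipe

With fixed blocks, prepending a few bytes changes every block hash; with this kind of chunking, only the first chunk or two change and the rest still dedup against the original copy.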