Something is twitching in the back of my mind about this. Sure, they can't look at the data based solely on the encrypted copy, but if they have a plaintext copy of a document of interest, they are able to determine which of their customers has that document, right?<p>Doesn't that diminish some of the privacy claims?
TL;DR: AES_key = SHA-256(file)<p>This does introduce new avenues for attacks, however. You don't have to be able to decrypt to show that certain people have certain files.<p>Also, for files that contain just one piece of sensitive information while the rest is predictable (e.g., the secret key file for a website back-end), you've effectively given up a hash of the secret, which can then be brute-forced.
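To make the brute-force point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the file template, the 4-digit PIN, and the idea that the server-visible identifier is a hash of the key (comparing deterministic ciphertexts would work the same way). It is not Bitcasa's actual scheme.

```python
import hashlib
from itertools import product
from typing import Optional

# Hypothetical config file: everything except the 4-digit PIN is predictable.
TEMPLATE = "DB_HOST=localhost\nDB_USER=web\nDB_PIN={pin}\n"


def convergent_key(data: bytes) -> bytes:
    """Key derivation as in the TL;DR above: key = SHA-256(file contents)."""
    return hashlib.sha256(data).digest()


def blob_id(data: bytes) -> str:
    """Assumed dedup identifier: a hash of the key, visible to the server."""
    return hashlib.sha256(convergent_key(data)).hexdigest()


def brute_force_pin(target_id: str) -> Optional[str]:
    """Try every 4-digit PIN until the candidate file's identifier matches."""
    for digits in product("0123456789", repeat=4):
        pin = "".join(digits)
        if blob_id(TEMPLATE.format(pin=pin).encode()) == target_id:
            return pin
    return None


# The victim uploads the file; the attacker only sees its identifier.
observed = blob_id(TEMPLATE.format(pin="4821").encode())
recovered = brute_force_pin(observed)
print(recovered)
```

The search space here is only 10,000 candidates. A properly random 256-bit secret would be out of reach, so the problem is specifically low-entropy secrets sitting in otherwise guessable files.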
This thread has a lot of discussion related to "convergent encryption."<p><a href="http://news.ycombinator.com/item?id=2570538" rel="nofollow">http://news.ycombinator.com/item?id=2570538</a><p>EDIT: <a href="http://news.ycombinator.com/item?id=2461713" rel="nofollow">http://news.ycombinator.com/item?id=2461713</a> as well<p>EDIT2: Actually, there's more to this problem than just convergent encryption. If the storage provider knows which encrypted blobs belong to you, it can encrypt _some_ file and still figure out which users have copies of it. So, the storage provider, which stores a collection of encrypted blobs, should not know the blob -> list(users) association. I don't know if Bitcasa addresses this part.
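A rough sketch of why the blob -> list(users) association matters, in Python. The `owners` index, the user names, and the way the blob identifier is derived are all hypothetical; the point is only that anyone holding such an index plus a plaintext copy can name the owners without decrypting anything.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Set

# Hypothetical server-side index: blob identifier -> users who uploaded it.
owners: Dict[str, Set[str]] = defaultdict(set)


def blob_id(plaintext: bytes) -> str:
    """Assumed identifier: hash of the convergent key SHA-256(plaintext)."""
    key = hashlib.sha256(plaintext).digest()
    return hashlib.sha256(key).hexdigest()


def record_upload(user: str, plaintext: bytes) -> None:
    """What a provider that tracks ownership would store on each upload."""
    owners[blob_id(plaintext)].add(user)


def who_has(known_plaintext: bytes) -> Set[str]:
    """Confirmation check: needs the plaintext, but no decryption at all."""
    return set(owners.get(blob_id(known_plaintext), set()))


record_upload("alice", b"leaked memo v1")
record_upload("bob", b"leaked memo v1")
record_upload("carol", b"holiday photos")

print(sorted(who_has(b"leaked memo v1")))
print(sorted(who_has(b"a file nobody uploaded")))
```

If the provider never keeps this index, the same lookup only tells you that the blob exists somewhere in the store, not who put it there.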
My biggest issue (besides the initial TC article being a complete shocker) was the claim of 60% saving on de-duplication and that each user only had 25GB of unique data.<p>This research paper from Microsoft on Farsite[1] claims 'up to 50%' saving on de-dupe with a convergent file system - but that was tested against 500 computers in a corporate environment and it was done back in 2002.<p>Users now store a lot more photos, a lot more of their own video, and any content that is DRM'd is also unique. You can save on operating system and application files, but it isn't 60%.<p>There is nothing 'finally' about this additional information. The discussion and criticism of the claims on Twitter already took into account this information about convergent encryption and the key being derived from the content. There is a lot more that is still unanswered - such as how an 'intelligent cache' allows 'unlimited' storage to be available offline.<p>I really wish these guys would release a research paper with their results, or include more information on their website before they make such bold claims in public.<p>[1] <a href="http://research.microsoft.com/apps/pubs/default.aspx?id=69954" rel="nofollow">http://research.microsoft.com/apps/pubs/default.aspx?id=6995...</a>
it's important to note that this is not strong against knowledge of the plaintext. that's kind-of obvious, when you think about how it supports de-duplication, but perhaps an example will clarify why you might be concerned.<p>say you want to backup some data. and that data includes music or video... and the riaa or mpaa decide that bitcasa are facilitating pirating and should be shut down... so they reach a deal where all the data are checked against known songs or videos. and if they find a match then your identity will be provided for prosecution...<p>of course, if you are doing nothing wrong, you have nothing to fear. this can only identify known data. but even so, it is an interesting issue: "encryption" here doesn't have all the guarantees you might expect.<p>(there are more disturbing scenarios too. for example, perhaps a certain text is not illegal in the copyright sense, but is unacceptable politically.)<p>[disclaimer - this is from skimming the paper; i should say that i am no expert on this, so don't take my word as gospel]
<i>"HP: What do you do in terms of encryption or security?<p>TG: We encrypt everything on the client side. We use AES-256 hash, SHA-256 hashing for all the data.<p>HP: So it’s encrypted all on the client side and you can’t look at it on the server side?<p>TG: Exactly"</i><p>Finally, a company that gets it. I've been asking for this for a while now. I wish Dropbox and all the others would do this, too. I get it that some of Dropbox' customers may not want to deal with the encryption on the client side, but they should at least offer the option to everyone, and it should be right there every time someone wants to upload something. It would be best if it was the default option, too.<p>This way they won't get into the mess they got into last time with the feds asking for user data, and the clients who want full security of their data won't have to be worried about it anymore.
Academic paper on convergent encryption:<p><a href="http://www.ssrc.ucsc.edu/Papers/storer-storagess08.pdf" rel="nofollow">http://www.ssrc.ucsc.edu/Papers/storer-storagess08.pdf</a><p>TL;DR version: take a chunk of data, encrypt it with its own sha1 hash as the key. Now you have an encrypted version that you can dedup. You can only decrypt if you already know the hash. Info about who owns any particular chunk is not kept on the server, so even if you break into the server, all you can tell is which chunks correspond to data you already possess. Seems plausible.
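For anyone who wants to see the TL;DR as code, here is a toy sketch in Python. Loud caveats: the "cipher" is a SHA-256 counter keystream XORed with the data, used only because the standard library has no AES, and it is not secure. The key is SHA-256 rather than SHA-1, and the `store` dict and `upload` helper are hypothetical stand-ins for the server.

```python
import hashlib
from typing import Dict, Tuple


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream SHA-256(key || counter). Stand-in for AES, NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_chunk(chunk: bytes) -> Tuple[bytes, bytes]:
    """The key is the hash of the chunk itself, so encryption is deterministic."""
    key = hashlib.sha256(chunk).digest()
    return key, xor(chunk, keystream(key, len(chunk)))


def decrypt_chunk(key: bytes, ciphertext: bytes) -> bytes:
    return xor(ciphertext, keystream(key, len(ciphertext)))


# Hypothetical server-side store: ciphertext hash -> ciphertext, no owner info.
store: Dict[str, bytes] = {}


def upload(chunk: bytes) -> Tuple[bytes, str]:
    """Client encrypts locally, sends only ciphertext, and keeps the key."""
    key, ciphertext = encrypt_chunk(chunk)
    chunk_id = hashlib.sha256(ciphertext).hexdigest()
    store[chunk_id] = ciphertext  # identical chunks land on the same entry
    return key, chunk_id


alice_key, alice_id = upload(b"the same popular file")
bob_key, bob_id = upload(b"the same popular file")
print(len(store), alice_id == bob_id, alice_key == bob_key)
print(decrypt_chunk(bob_key, store[alice_id]))
```

Two users with the same chunk derive the same key and the same ciphertext, which is what makes dedup possible. It also means anyone who already has the plaintext can derive the key, and nobody else can.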
I would argue that you can either have data de-duping or encryption, but not both.<p>If encryption is defined as: Transforming data so that only people with special knowledge can read it.<p>Then if you can compare a chunk of encrypted data against another chunk to determine the source data...<p>Well now you have very weak encryption because you could brute force it if you have a large enough repository of user files.
Why is dedupe so important?<p>I have to imagine this mostly helps with OS files that are standard across many machines. Can't we ship a list of hashes client-side?
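One way to read the "ship a list of hashes" idea, as a small Python sketch. The `KNOWN_HASHES` set, the file names, and `plan_backup` are all made up for illustration: the client hashes each file locally and skips anything already on the shipped list.

```python
import hashlib
from typing import Dict, List, Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical list shipped with the client: hashes of stock OS files.
KNOWN_HASHES = {
    sha256_hex(b"stock kernel image"),
    sha256_hex(b"stock libc"),
}


def plan_backup(files: Dict[str, bytes]) -> Tuple[List[str], List[str]]:
    """Split paths into (skip, upload) by checking local hashes against the list."""
    skip: List[str] = []
    upload: List[str] = []
    for path in sorted(files):
        if sha256_hex(files[path]) in KNOWN_HASHES:
            skip.append(path)
        else:
            upload.append(path)
    return skip, upload


files = {
    "/boot/vmlinuz": b"stock kernel image",
    "/home/me/notes.txt": b"my private notes",
}
skip, upload = plan_backup(files)
print(skip, upload)
```

A fixed list like this only covers files the vendor knows about in advance. It would not dedupe files that users happen to share with each other, which is the case the convergent scheme is aimed at.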
Basically, the argument is that this is a deterministic encryption algorithm: there is no randomness after the initial value. This sounds more like a random oracle, <a href="http://en.wikipedia.org/wiki/Random_oracle" rel="nofollow">http://en.wikipedia.org/wiki/Random_oracle</a>, and random oracles, by the way, don't exist.
So I can encrypt a file, upload it, and if someone else encrypts the exact same file... they can decrypt my uploaded file? I'm having a hard time wrapping my head around this.