Finally Bitcasa CEO Explains How The Encryption Works

33 points by eljaco over 13 years ago

15 comments

callahad over 13 years ago

Something is twitching in the back of my mind about this. Sure, they can't look at the data based solely on the encrypted copy, but if they have a plaintext copy of a document of interest, they are able to determine which of their customers has that document, right?

Doesn't that diminish some of the privacy claims?
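The concern above can be made concrete with a short sketch. Because convergent encryption is deterministic, the identifier of a stored blob is a pure function of the plaintext, so anyone holding the plaintext can re-derive it. This is an illustration under assumptions: the blob id is modeled as a hash chain rather than the real ciphertext hash, and the usernames and stored mapping are hypothetical.

```python
import hashlib

def blob_id(plaintext):
    # Under convergent encryption the ciphertext is a pure function of the
    # plaintext, so its storage identifier is too. We model the blob id as
    # SHA-256(key || plaintext) -- a stand-in for hash(AES-encrypted blob).
    key = hashlib.sha256(plaintext).digest()
    return hashlib.sha256(key + plaintext).digest()

# The provider's (hypothetical) view: blob ids mapped to uploading accounts.
stored = {blob_id(b"leaked-memo contents"): ["alice", "bob"]}

def who_has(plaintext):
    # Anyone holding the plaintext -- a rights holder, a subpoena target --
    # can re-derive the blob id and ask which accounts store it.
    # No decryption is required at any point.
    return stored.get(blob_id(plaintext), [])

print(who_has(b"leaked-memo contents"))  # → ['alice', 'bob']
print(who_has(b"something else"))        # → []
```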
maaku over 13 years ago

TL;DR: AES_key = SHA-256(file)

This does introduce new avenues for attack, however. You don't have to be able to decrypt to show that certain people have certain files.

Also, for files that contain just one piece of sensitive information while the rest is predictable (e.g., the secret key file for a website back-end), you've effectively given up a hash of the secret, which can then be brute-forced.
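The scheme in the TL;DR can be sketched in a few lines. This is illustrative only: a SHA-256 hash-counter keystream stands in for the AES-256 step (which would need a third-party library), but the defining property is the same -- the key is the hash of the content, so identical files always yield identical ciphertexts.

```python
import hashlib

def _keystream(key, n):
    # Hash-counter keystream: a stand-in for AES-256, which the real
    # scheme reportedly uses with a SHA-256 content hash as the key.
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def convergent_encrypt(plaintext):
    # The key IS the hash of the content: no per-user randomness, so two
    # users encrypting the same file produce the same key and ciphertext.
    key = hashlib.sha256(plaintext).digest()
    ct = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, len(plaintext))))
    return key, ct

def convergent_decrypt(key, ciphertext):
    return bytes(c ^ k for c, k in zip(ciphertext, _keystream(key, len(ciphertext))))

key, ct = convergent_encrypt(b"same file, same ciphertext")
assert convergent_encrypt(b"same file, same ciphertext") == (key, ct)
assert convergent_decrypt(key, ct) == b"same file, same ciphertext"
```

The determinism is exactly what enables de-duplication -- and exactly what enables the attacks discussed in this thread.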
xtacy over 13 years ago

This thread has a lot of discussion related to "convergent encryption."

http://news.ycombinator.com/item?id=2570538

EDIT: http://news.ycombinator.com/item?id=2461713 as well

EDIT2: Actually, there's more to this problem than just convergent encryption. If the storage provider knows which encrypted blobs belong to you, it can encrypt _some_ file and still figure out which users have copies of it. So the storage provider, which stores a collection of encrypted blobs, should not know the blob -> list(users) association. I don't know if Bitcasa addresses this part.
nikcub over 13 years ago

My biggest issue (besides the initial TC article being a complete shocker) was the claim of a 60% saving from de-duplication and that each user only has 25GB of unique data.

This research paper from Microsoft on Farsite[1] claims 'up to 50%' savings on de-dupe with a convergent file system - but that was tested against 500 computers in a corporate environment, and it was done back in 2002.

Users now store a lot more photos, a lot more of their own video, and any content that is DRM'd is also unique. You can save on operating system and application files, but it isn't 60%.

There is nothing 'finally' about this additional information. The discussion and criticism of the claims on Twitter already assumed this information about convergent encryption and the key being derived from the content. There is a lot more that is still unanswered - such as how an 'intelligent cache' allows 'unlimited' storage to be available offline.

I really wish these guys would release a research paper with their results, or include more information on their website, before they make such bold claims in public.

[1] http://research.microsoft.com/apps/pubs/default.aspx?id=69954
andrewcooke over 13 years ago

it's important to note that this is not strong against knowledge of the plaintext. that's kind-of obvious when you think about how it supports de-duplication, but perhaps an example will clarify why you might be concerned.

say you want to back up some data, and that data includes music or video... and the riaa or mpaa decide that bitcasa are facilitating piracy and should be shut down... so they reach a deal where all the data are checked against known songs or videos, and if they find a match then your identity will be provided for prosecution...

of course, if you are doing nothing wrong, you have nothing to fear. this can only identify known data. but even so, it is an interesting issue: "encryption" here doesn't have all the guarantees you might expect.

(there are more disturbing scenarios too. for example, perhaps a certain text is not illegal in the copyright sense, but is unacceptable politically.)

[disclaimer - this is from skimming the paper; i should say that i am no expert on this, so don't take my word as gospel]
nextparadigms over 13 years ago

"HP: What do you do in terms of encryption or security?

TG: We encrypt everything on the client side. We use AES-256 hash, SHA-256 hashing for all the data.

HP: So it's encrypted all on the client side and you can't look at it on the server side?

TG: Exactly"

Finally, a company that gets it. I've been asking for this for a while now. I wish Dropbox and all the others would do this, too. I get that some of Dropbox's customers may not want to deal with encryption on the client side, but they should at least offer the option to everyone, and it should be right there every time someone wants to upload something. It would be best if it was the default option, too.

This way they won't get into the mess they got into last time with the feds asking for user data, and the clients who want full security of their data won't have to worry about it anymore.
lisper over 13 years ago

Academic paper on convergent encryption:

http://www.ssrc.ucsc.edu/Papers/storer-storagess08.pdf

TL;DR version: take a chunk of data, encrypt it with its own SHA-1 hash as the key. Now you have an encrypted version that you can dedup. You can only decrypt if you already know the hash. Info about who owns any particular chunk is not kept on the server, so even if you break into the server, all you can tell is which chunks correspond to data you already possess. Seems plausible.
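The dedup side of that TL;DR can be sketched as a server that stores ciphertexts keyed by their own hash, with owner info and keys kept client-side. This is a sketch under assumptions: the `store` dict models the server, and a SHA-256 hash-stream XOR stands in for the cipher step.

```python
import hashlib

def convergent_blob(data):
    # Illustration only: key = SHA-256(data); the ciphertext is modeled as
    # a keyed hash-stream XOR, a stand-in for the AES step in the scheme.
    key = hashlib.sha256(data).digest()
    stream = b"".join(hashlib.sha256(key + i.to_bytes(8, "big")).digest()
                      for i in range(len(data) // 32 + 1))
    return key, bytes(a ^ b for a, b in zip(data, stream))

store = {}  # hash(ciphertext) -> ciphertext; no ownership info kept here

def upload(data):
    key, ct = convergent_blob(data)
    store.setdefault(hashlib.sha256(ct).digest(), ct)  # dedup: one copy only
    return key  # the client keeps the key; the server never sees it

k1 = upload(b"chart-topping mp3 bytes")
k2 = upload(b"chart-topping mp3 bytes")  # a second user, same content
assert k1 == k2 and len(store) == 1     # stored exactly once
```

An attacker who copies `store` learns nothing they can decrypt -- but, as noted elsewhere in the thread, they can still test whether any file they already possess is present.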
gst over 13 years ago

Nothing new here. The same technique has been used by Wuala for years now.
mmaunder over 13 years ago

I would argue that you can either have data de-duping or encryption, but not both.

If encryption is defined as: transforming data so that only people with special knowledge can read it.

Then if you can compare a chunk of encrypted data against another chunk to determine the source data...

Well, now you have very weak encryption, because you could brute force it if you have a large enough repository of user files.
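The brute-force risk -- also raised by maaku above for mostly-predictable files -- can be demonstrated directly. This is a hypothetical example: the config-file template and candidate passwords are invented, and only the key derivation (key = hash of content) is taken from the scheme.

```python
import hashlib

def convergent_key(plaintext):
    # Under convergent encryption the key (and hence the ciphertext) is a
    # pure function of the plaintext -- there is nothing secret to salt it.
    return hashlib.sha256(plaintext).digest()

# A file that is predictable except for one low-entropy secret:
template = b"db_host=localhost\ndb_pass=%s\n"
observed = convergent_key(template % b"hunter2")  # what an attacker can test against

# Dictionary attack: rebuild each candidate file, derive its key, compare.
# Feasible whenever the unknown portion has low entropy.
candidates = [b"password", b"letmein", b"hunter2"]
recovered = next((s for s in candidates
                  if convergent_key(template % s) == observed), None)
assert recovered == b"hunter2"
```

Random per-file keys would defeat this attack, but would also make every user's copy of a file distinct -- which is mmaunder's point: you can have dedup or strong encryption, not both.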
rubyorchard over 13 years ago
Encryption provides confidentiality in a secure system. Convergent encryption doesn't fit that bill.
joshu over 13 years ago

Why is dedupe so important?

I have to imagine this mostly helps with OS files that are standard across many machines. Can't we ship a list of hashes client-side?
esutton over 13 years ago

Basically the argument is that this is an encryption algorithm that is deterministic, as there is no randomness after the initial value. This sounds more like a random oracle (http://en.wikipedia.org/wiki/Random_oracle), which, by the way, doesn't exist.
grimtrigger over 13 years ago
So can Mark Zuckerberg sign up for Bitcasa and store all of Facebook's photos there for $10 a month?
eljaco over 13 years ago
Curious to hear if anyone has experience with this "convergent encryption."
naner over 13 years ago
So I can encrypt a file, upload it, and if someone else encrypts the exact same file... they can decrypt my uploaded file? I'm having a hard time wrapping my head around this.