There's no compress or encrypt _first_.<p>It's just compress or not, before encrypting. If security is important, the answer to that is no, unless you're an expert and familiar with CRIME and related attacks.<p>Compression after encryption is useless, as there should be NO recognizable patterns to exploit after the encryption.
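A quick way to see that last point, as a minimal sketch (the repetitive message and the use of the third-party `cryptography` package's AES-GCM are just illustrative choices, not anything the parent comment prescribes):

```python
# Sketch: compare compress-then-encrypt with encrypt-then-compress.
import os
import zlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
plaintext = b"this is a very repetitive message " * 100

# compress -> encrypt: the redundancy is removed before encryption
ct_small = AESGCM(key).encrypt(os.urandom(12), zlib.compress(plaintext), None)

# encrypt -> compress: the ciphertext looks uniformly random, so zlib gains nothing
ct = AESGCM(key).encrypt(os.urandom(12), plaintext, None)
ct_then_compressed = zlib.compress(ct)

print(len(plaintext))           # 3400 bytes of redundant input
print(len(ct_small))            # small - the redundancy was squeezed out first
print(len(ct_then_compressed))  # about the same as len(ct), typically a bit larger
```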
A more interesting question is whether to compress or <i>sign</i> first.<p>There's an interesting article on that topic by Ted Unangst:<p>"preauthenticated decryption considered harmful"<p><a href="http://www.tedunangst.com/flak/post/preauthenticated-decryption-considered-harmful" rel="nofollow">http://www.tedunangst.com/flak/post/preauthenticated-decrypt...</a><p>EDIT: Although the article talks about encrypt+sign versus sign+encrypt, the same argument goes for compress+sign versus sign+compress. You shouldn't do anything with untrusted data before having checked the signature - neither uncompress nor decrypt nor anything else.
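A minimal sketch of that ordering, with Python's stdlib HMAC standing in for a real signature (the shared key and framing here are made up): the check runs over the raw compressed bytes, and decompression only happens after verification succeeds.

```python
# Sketch only: HMAC stands in for a signature; the point is the ordering.
import hashlib
import hmac
import zlib

MAC_KEY = b"shared-secret-key"  # hypothetical shared key

def pack(message: bytes) -> bytes:
    body = zlib.compress(message)                          # compress first
    tag = hmac.new(MAC_KEY, body, hashlib.sha256).digest() # authenticate the compressed bytes
    return tag + body                                      # (encryption of tag+body omitted here)

def unpack(blob: bytes) -> bytes:
    tag, body = blob[:32], blob[32:]
    expected = hmac.new(MAC_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad MAC - refusing to touch the payload")
    return zlib.decompress(body)                           # only decompress *after* verification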
Where everyone seems to be getting confused is in handling a live flow versus handling a finalized flow (a file).<p>* Always pad to combat plaintext attacks; padding in theory shouldn't compress well, so there's no point making the compression less effective by running it through the compressor.<p>* Always compress a 'file' first to remove redundancy.<p>* Always pad up a live stream; maybe this data is useful in some other way, but you want interactive messages to be of similar size.<p>* At some place in the above, also include a recipient identifier; this should be counted as part of the overhead, not part of the padding.<p>* The signature should be on everything above here (recipients, pad, compressed message, extra pad).<p>* It might be useful to include the recipients in the un-encrypted portion of the message, but there are also contexts where someone might choose otherwise; an interactive flow would assume both parties already knew a key to communicate with each other on, and is one such case.<p>* The pad, message, extra pad, and signature /must/ be encrypted. The recipients /may/ be encrypted.<p>I did have to look up the sign / encrypt first question as I didn't have reason to think about it before. In general I've looked to experts in this field for existing solutions, such as OpenPGP (GnuPG being the main implementation). Getting this stuff right is DIFFICULT.
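A rough sketch of the layout described above (all field names, pad sizes, and the stand-in sign()/encrypt() functions are made up for illustration; a real implementation would use a proper signature and an AEAD cipher):

```python
# Rough sketch of the "recipients | pad | compressed message | extra pad | signature" layout.
import hashlib
import os
import struct
import zlib

def sign(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()      # stand-in for a real signature

def encrypt(data: bytes) -> bytes:
    return data                               # stand-in for a real AEAD cipher

def field(b: bytes) -> bytes:
    return struct.pack(">I", len(b)) + b      # length-prefixed field

def build_payload(recipient_id: bytes, message: bytes) -> bytes:
    pad = os.urandom(16 + os.urandom(1)[0] % 32)        # leading pad (random, made-up sizing)
    body = zlib.compress(message)                       # compress the 'file' first
    extra_pad = os.urandom(16 + os.urandom(1)[0] % 32)  # trailing pad
    signed_region = field(recipient_id) + field(pad) + field(body) + field(extra_pad)
    signature = sign(signed_region)                     # signature over everything above
    return encrypt(signed_region + field(signature))    # pad, message, extra pad, signature encrypted
```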
This is why military voice encryption sends at a constant bitrate even when you're not talking. For serious security applications where fixed links are used, data is transmitted at a constant rate 24/7, even if the link is mostly idle.
Wow, what a trainwreck. So many comments in here talking about whether it would be possible to compress data which looks like uniformly random data, for all the tests you would throw at it. Spoiler alert: you can't compress encrypted data. This isn't a question of whether we know it's possible; rather, it's a fact that we know it's impossible: no lossless compressor can shrink uniformly random data on average (a simple counting/pigeonhole argument), and good ciphertext is designed to be indistinguishable from uniformly random data.<p>In fact, if you successfully compress data after encryption, then the only logical conclusion is that you've found a flaw in the encryption algorithm.
Also interesting is <i>which</i> compression algorithm you're using. HPACK header compression in HTTP/2 is an attempt to mitigate this problem:<p><a href="https://http2.github.io/http2-spec/compression.html#Security" rel="nofollow">https://http2.github.io/http2-spec/compression.html#Security</a>
The paper cited in this article (<i>Phonotactic Reconstruction of Encrypted VoIP Conversations</i>) really deserves to be highlighted, so I submitted it separately:<p><a href="https://news.ycombinator.com/item?id=11995298" rel="nofollow">https://news.ycombinator.com/item?id=11995298</a><p><a href="http://www.cs.unc.edu/~fabian/papers/foniks-oak11.pdf" rel="nofollow">http://www.cs.unc.edu/~fabian/papers/foniks-oak11.pdf</a>
I don't understand... Why couldn't you do CRIME with no compression as well? Assuming you can control (parts of) the plaintext, surely plaintext+encrypt gives you more information than plaintext+compress+encrypt?
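For context, a tiny sketch of what the compression step changes (the session cookie and the guesses are made up, and encryption is omitted since a stream cipher or CTR mode preserves length anyway): with compression in the mix, the length the attacker observes depends on how much the attacker-controlled input overlaps the secret; without compression, the length reveals nothing about the content.

```python
# Sketch of the length side channel that CRIME exploits.
import zlib

secret = b"Cookie: session=4f3c2a"   # made-up secret appended by the "browser"

def observed_length(attacker_guess: bytes) -> int:
    body = b"GET /?q=" + attacker_guess + b"\r\n" + secret
    return len(zlib.compress(body))  # length visible on the wire after encryption

print(observed_length(b"Cookie: session=4f3c"))  # overlaps the secret -> typically shorter
print(observed_length(b"Cookie: session=9a1b"))  # wrong guess -> typically a byte or two longer
```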
I picked up on the reference to Stockfighter, but does anyone know if the walking machine learning game mentioned at the end of the article exists? Sounds like a fun game.
Would adding some tiny random amount of size help? Based on my poor understanding: if, after compressing but before encrypting, we add 0 to 16 random bytes (or 1% of the size), that could defeat quite a lot of attacks (like CRIME).
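A minimal sketch of that suggestion (the padding scheme and sizes are made up): compress, then append a random number of bytes before encrypting. One caveat worth noting is that an attacker who can trigger many requests can often average the random noise away, which is why random padding is usually treated as a mitigation rather than a complete fix.

```python
# Sketch: compress, then add 0-16 random bytes before encryption.
import os
import zlib

def compress_with_random_pad(message: bytes) -> bytes:
    body = zlib.compress(message)
    pad_len = os.urandom(1)[0] % 17            # 0..16 random bytes
    return bytes([pad_len]) + body + os.urandom(pad_len)

def strip_random_pad(blob: bytes) -> bytes:
    pad_len = blob[0]
    return zlib.decompress(blob[1 : len(blob) - pad_len])
```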
Despite the question being flawed, the correct answer is a series of questions:
Who is the attacker?
What are you guarding?
What assumptions are there about the operating environment?
What invariants (regulations, compliance, etc.) exist?<p>There may be compensating controls that invalidate the perceived need for encryption or compression, for example - i.e., don't design in the dark.<p>Of course, the interviewer may just want a canned, scripted answer - but the interview is your chance to shine, showing how you can discuss all the angles.
That was a fun read. Do I detect a nod to tptacek's "If You’re Typing the Letters A-E-S Into Your Code You’re Doing It Wrong"?<p><a href="https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2009/july/if-youre-typing-the-letters-a-e-s-into-your-code-youre-doing-it-wrong/" rel="nofollow">https://www.nccgroup.trust/us/about-us/newsroom-and-events/b...</a>
Would be great if Apple understood this and compressed IPA contents before encrypting.<p>Instead, when you submit something to the AppStore, you end up with a much bigger app than the one you uploaded.<p>To add insult to injury, if you ask Apple about this fuck up you get an esoteric support email about removing "contiguous zeros." As in, "make your app less compressible so it won't be obvious we're doing this wrong."
What if you compress and then only send data at regular intervals and in regular packet sizes? That way no information can be gleaned. E.g., after compressing you pad the data if it is unusually short, or you include other compressed data too, or you only use a constant-bit-rate compression algorithm.
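A minimal sketch of the fixed-size-record part of that idea (the 512-byte record size is an arbitrary choice): every compressed payload is padded up to a multiple of the record size before encryption, so observed lengths are quantized rather than exact.

```python
# Sketch: pad each compressed payload to a multiple of a fixed record size.
import struct
import zlib

RECORD_SIZE = 512  # arbitrary record size for illustration

def pad_to_records(message: bytes) -> bytes:
    body = zlib.compress(message)
    framed = struct.pack(">I", len(body)) + body      # length prefix so padding can be stripped
    padding = (-len(framed)) % RECORD_SIZE
    return framed + b"\x00" * padding                 # this fixed-size multiple is what gets encrypted

def unpad_records(blob: bytes) -> bytes:
    (n,) = struct.unpack(">I", blob[:4])
    return zlib.decompress(blob[4 : 4 + n])
```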
That quoted VoIP paper isn't actually as damaging as it sounds. IIRC that 0.6 rating was for less than half of the words, so if you're trying to listen in on a conversation to get something meaningful, it's probably not going to happen.
Has there been any research into compression that's generally safe to use before encryption? E.g., matching only common substrings longer than the key length would (I think?) defeat CRIME at the cost of compression ratio.
Maybe we need encryption that also plays with the length of the message, or randomly pads our data before encryption? I am, however, no expert, so I have no clue how feasible, or how full of holes, this method would be.
I keep thinking that if the compression scheme is known, you would need a good nonce to avoid known-plaintext problems (for example, the compression format's header is always the same), and also CRIME, which works by recovering the compression dictionary.<p>I think it is best to use the built-in encryption scheme of the compression program, as those often take these issues into account (and the header doesn't become known plaintext, since only the content is encrypted).
Can't you just add some random-length data at the end? You are defeating compression a little bit, but you're also making the length non-deterministic. I thought PGP did that.
So what does this mean if I am using an encrypted SSL connection that is correctly configured?<p>Is this kind of problem not already dealt with for me by the secure transport layer? It would be a shame if the abstraction were leaky. My understanding of the contract is that whatever bits I supply will be securely transported within the limits of the configuration I have selected.<p>If I pick a bad configuration then yes, shame on me, but a good configuration won't care if I compress, right?
Logically speaking, an encrypted file should have a high-entropy set of bits within it. Compressing it would be low return, but higher security, since the input file contained more "random" bits.<p>Compressing the source material will yield smaller results but will be more predictable, as the file will always contain ZIP headers and other metadata that could possibly make decryption of your file much easier.
If I compress each component (i.e., attacker-influenced vs. secret) separately, concatenate the results (with message lengths, of course), then encrypt the whole message, is that secure?<p>It seems like it should be, but I'm not an encryption expert. The compression should still be pretty good, though.
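The scheme described might look roughly like this sketch (the framing is made up); whether it helps still depends on the "secret" part not itself mixing in attacker-influenced data.

```python
# Sketch: compress attacker-influenced and secret parts in separate zlib streams,
# so neither can borrow redundancy from the other, then frame and encrypt the whole thing.
import struct
import zlib

def frame_separately(attacker_part: bytes, secret_part: bytes) -> bytes:
    a = zlib.compress(attacker_part)
    s = zlib.compress(secret_part)
    plaintext = struct.pack(">II", len(a), len(s)) + a + s
    return plaintext  # this whole buffer is what would get encrypted
```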
> The paper Phonotactic Reconstruction of Encrypted VoIP Conversations gives a technique for reconstructing speech from an encrypted VoIP call.<p>The technique for reconstructing speech clearly had its limitations.
The OP should take <a href="https://www.coursera.org/learn/crypto" rel="nofollow">https://www.coursera.org/learn/crypto</a>
So if the length of the resulting message is leaking information, salt it by adding some extra random bits to the end to increase the length by a random amount.
A lot of comments here suggest that encryption increases entropy. While true, it adds at most the key's entropy to the plaintext's entropy. In most real-world cases, len(m) >> len(k), so this is usually an insignificant increase in entropy. Compression <i>also</i> adds a trivial amount of entropy (specifically, the information encoding the algorithm used to compress, even if that information is out of band).
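In symbols (a sketch of the bound being invoked here): since the ciphertext is a deterministic function of the message and the key, H(C) ≤ H(M, K) ≤ H(M) + H(K), so when H(K) is tiny compared to H(M) the ciphertext's entropy is dominated by the message's.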