There's no compress or encrypt _first_.<p>It's just compress or not, before encrypting. If security is important, the answer to that is no, unless you're an expert and familiar with CRIME and related attacks.<p>Compression after encryption is useless, as there should be NO recognizable patterns to exploit after the encryption.
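A quick way to see that last point, as a minimal sketch (the repetitive message and the use of the third-party `cryptography` package's AES-GCM are just illustrative choices, not anything the parent comment prescribes):

```python
# Sketch: compare compress-then-encrypt with encrypt-then-compress.
import os
import zlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
plaintext = b"this is a very repetitive message " * 100

# compress -> encrypt: the redundancy is removed before encryption
ct_small = AESGCM(key).encrypt(os.urandom(12), zlib.compress(plaintext), None)

# encrypt -> compress: the ciphertext looks uniformly random, so zlib gains nothing
ct = AESGCM(key).encrypt(os.urandom(12), plaintext, None)
ct_then_compressed = zlib.compress(ct)

print(len(plaintext))           # 3400 bytes of redundant input
print(len(ct_small))            # small - the redundancy was squeezed out first
print(len(ct_then_compressed))  # about the same as len(ct), typically a bit larger
```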
A more interesting question is whether to compress or <i>sign</i> first.<p>There's an interesting article on that topic by Ted Unangst:<p>"preauthenticated decryption considered harmful"<p><a href="http://www.tedunangst.com/flak/post/preauthenticated-decryption-considered-harmful" rel="nofollow">http://www.tedunangst.com/flak/post/preauthenticated-decrypt...</a><p>EDIT: Although the article talks about encrypt+sign versus sign+encrypt, the same argument goes for compress+sign versus sign+compress. You shouldn't do anything with untrusted data before having checked the signature - neither uncompress nor decrypt nor anything else.
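A minimal sketch of that ordering, with Python's stdlib HMAC standing in for a real signature (the shared key and framing here are made up): the check runs over the raw compressed bytes, and decompression only happens after verification succeeds.

```python
# Sketch only: HMAC stands in for a signature; the point is the ordering.
import hashlib
import hmac
import zlib

MAC_KEY = b"shared-secret-key"  # hypothetical shared key

def pack(message: bytes) -> bytes:
    body = zlib.compress(message)                          # compress first
    tag = hmac.new(MAC_KEY, body, hashlib.sha256).digest() # authenticate the compressed bytes
    return tag + body                                      # (encryption of tag+body omitted here)

def unpack(blob: bytes) -> bytes:
    tag, body = blob[:32], blob[32:]
    expected = hmac.new(MAC_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad MAC - refusing to touch the payload")
    return zlib.decompress(body)                           # only decompress *after* verification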
Where everyone seems to be getting confused is in handling a live flow versus handling a finalized flow (a file).<p>* Always pad to combat plaintext attacks; padding in theory shouldn't compress well, so there's no point making the compression less effective by running it through the compressor.<p>* Always compress a 'file' first to remove redundancy.<p>* Always pad up a live stream; maybe this data is useful in some other way, but you want interactive messages to be of similar size.<p>* At some place in the above, also include a recipient identifier; this should be counted as part of the overhead, not part of the padding.<p>* The signature should be on everything above here (recipients, pad, compressed message, extra pad).<p>* It might be useful to include the recipients in the un-encrypted portion of the message, but there are also contexts where someone might choose otherwise; an interactive flow would assume both parties already knew a key to communicate with each other on, and is one such case.<p>* The pad, message, extra pad, and signature /must/ be encrypted. The recipients /may/ be encrypted.<p>I did have to look up the sign / encrypt first question as I didn't have reason to think about it before. In general I've looked to experts in this field for existing solutions, such as OpenPGP (GnuPG being the main implementation). Getting this stuff right is DIFFICULT.
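A rough sketch of the layout described above (all field names, pad sizes, and the stand-in sign()/encrypt() functions are made up for illustration; a real implementation would use a proper signature and an AEAD cipher):

```python
# Rough sketch of the "recipients | pad | compressed message | extra pad | signature" layout.
import hashlib
import os
import struct
import zlib

def sign(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()      # stand-in for a real signature

def encrypt(data: bytes) -> bytes:
    return data                               # stand-in for a real AEAD cipher

def field(b: bytes) -> bytes:
    return struct.pack(">I", len(b)) + b      # length-prefixed field

def build_payload(recipient_id: bytes, message: bytes) -> bytes:
    pad = os.urandom(16 + os.urandom(1)[0] % 32)        # leading pad (random, made-up sizing)
    body = zlib.compress(message)                       # compress the 'file' first
    extra_pad = os.urandom(16 + os.urandom(1)[0] % 32)  # trailing pad
    signed_region = field(recipient_id) + field(pad) + field(body) + field(extra_pad)
    signature = sign(signed_region)                     # signature over everything above
    return encrypt(signed_region + field(signature))    # pad, message, extra pad, signature encrypted
```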
This is why military voice encryption sends at a constant bitrate even when you're not talking. For serious security applications where fixed links are used, data is transmitted at a constant rate 24/7, even if the link is mostly idle.
Wow, what a trainwreck. So many comments in here talking about whether it would be possible to compress data which looks like uniformly random data, for all the tests you would throw at it. Spoiler alert: you can't compress encrypted data. This isn't a question of whether we know it's possible; rather, it's a fact that we know it's impossible: no lossless compressor can shrink uniformly random data on average (a simple counting/pigeonhole argument), and good ciphertext is designed to be indistinguishable from uniformly random data.<p>In fact, if you successfully compress data after encryption, then the only logical conclusion is that you've found a flaw in the encryption algorithm.
Also interesting is <i>which</i> compression algorithm you're using. HPACK header compression in HTTP/2 is an attempt to mitigate this problem:<p><a href="https://http2.github.io/http2-spec/compression.html#Security" rel="nofollow">https://http2.github.io/http2-spec/compression.html#Security</a>
The paper cited in this article (<i>Phonotactic Reconstruction of Encrypted VoIP Conversations</i>) really deserves to be highlighted, so I submitted it separately:<p><a href="https://news.ycombinator.com/item?id=11995298" rel="nofollow">https://news.ycombinator.com/item?id=11995298</a><p><a href="http://www.cs.unc.edu/~fabian/papers/foniks-oak11.pdf" rel="nofollow">http://www.cs.unc.edu/~fabian/papers/foniks-oak11.pdf</a>
I don't understand... Why couldn't you do CRIME with no compression as well? Assuming you can control (parts of) the plaintext, surely plaintext+encrypt gives you more information than plaintext+compress+encrypt?
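For context, a tiny sketch of what the compression step changes (the session cookie and the guesses are made up, and encryption is omitted since a stream cipher or CTR mode preserves length anyway): with compression in the mix, the length the attacker observes depends on how much the attacker-controlled input overlaps the secret; without compression, the length reveals nothing about the content.

```python
# Sketch of the length side channel that CRIME exploits.
import zlib

secret = b"Cookie: session=4f3c2a"   # made-up secret appended by the "browser"

def observed_length(attacker_guess: bytes) -> int:
    body = b"GET /?q=" + attacker_guess + b"\r\n" + secret
    return len(zlib.compress(body))  # length visible on the wire after encryption

print(observed_length(b"Cookie: session=4f3c"))  # overlaps the secret -> typically shorter
print(observed_length(b"Cookie: session=9a1b"))  # wrong guess -> typically a byte or two longer
```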
I picked up on the reference to Stockfighter, but does anyone know if the walking machine learning game mentioned at the end of the article exists? Sounds like a fun game.
Would adding some tiny random amount of size help? Based on my poor understanding: if, after compressing but before encrypting, we add 0 to 16 random bytes (or 1% of the size), that could defeat quite a lot of attacks (like CRIME).
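A minimal sketch of that suggestion (the padding scheme and sizes are made up): compress, then append a random number of bytes before encrypting. One caveat worth noting is that an attacker who can trigger many requests can often average the random noise away, which is why random padding is usually treated as a mitigation rather than a complete fix.

```python
# Sketch: compress, then add 0-16 random bytes before encryption.
import os
import zlib

def compress_with_random_pad(message: bytes) -> bytes:
    body = zlib.compress(message)
    pad_len = os.urandom(1)[0] % 17            # 0..16 random bytes
    return bytes([pad_len]) + body + os.urandom(pad_len)

def strip_random_pad(blob: bytes) -> bytes:
    pad_len = blob[0]
    return zlib.decompress(blob[1 : len(blob) - pad_len])
```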
Despite the question being flawed, the correct answer is a series of questions:
Who is the attacker?
What are you guarding?
What assumptions are there about the operating environment?
What invariants (regulations, compliance, etc.) exist?<p>There may be compensating controls that invalidate the perceived need for encryption or compression, for example - i.e., don't design in the dark.<p>Of course, the interviewer may just want a canned, scripted answer - but the interview is your chance to shine, showing how you can discuss all the angles.
That was a fun read. Do I detect a nod to tptacek's "If You’re Typing the Letters A-E-S Into Your Code You’re Doing It Wrong"?<p><a href="https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2009/july/if-youre-typing-the-letters-a-e-s-into-your-code-youre-doing-it-wrong/" rel="nofollow">https://www.nccgroup.trust/us/about-us/newsroom-and-events/b...</a>
Would be great if Apple understood this and compressed IPA contents before encrypting.<p>Instead, when you submit something to the AppStore, you end up with a much bigger app than the one you uploaded.<p>To add insult to injury, if you ask Apple about this fuck up you get an esoteric support email about removing "contiguous zeros." As in, "make your app less compressible so it won't be obvious we're doing this wrong."
What if you compress and then only send data at regular intervals and in regular packet sizes? That way no information can be gleaned. E.g., after compressing you pad the data if it is unusually short, or you include other compressed data too, or you only use a constant-bit-rate compression algorithm.
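A minimal sketch of the fixed-size-record part of that idea (the 512-byte record size is an arbitrary choice): every compressed payload is padded up to a multiple of the record size before encryption, so observed lengths are quantized rather than exact.

```python
# Sketch: pad each compressed payload to a multiple of a fixed record size.
import struct
import zlib

RECORD_SIZE = 512  # arbitrary record size for illustration

def pad_to_records(message: bytes) -> bytes:
    body = zlib.compress(message)
    framed = struct.pack(">I", len(body)) + body      # length prefix so padding can be stripped
    padding = (-len(framed)) % RECORD_SIZE
    return framed + b"\x00" * padding                 # this fixed-size multiple is what gets encrypted

def unpad_records(blob: bytes) -> bytes:
    (n,) = struct.unpack(">I", blob[:4])
    return zlib.decompress(blob[4 : 4 + n])
```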
That quoted VoIP paper isn't actually as damaging as it sounds. IIRC that 0.6 rating was for less than half of the words, so if you're trying to listen in on a conversation to get something meaningful, it's probably not going to happen.
Has there been any research into compression that's generally safe to use before encryption? E.g., matching only common substrings longer than the key length would (I think?) defeat CRIME at the cost of compression ratio.
Maybe we need encryption that also plays with the length of the message, or randomly pads our data before encryption? I am, however, no expert, so I have no clue how feasible, or how full of holes, this method would be.
I keep thinking that if the compression scheme is known, you would need a good nonce to avoid known-plaintext problems (for example, the compression format's header is always the same), and also CRIME, which works by recovering the compression dictionary.<p>I think it is best to use the built-in encryption scheme of the compression program, as those often take these issues into account (and the header doesn't become known plaintext, since only the content is encrypted).
Can't you just add some random-length data at the end? You are defeating compression a little bit, but you're also making the length non-deterministic. I thought PGP did that.
So what does this mean if I am using an encrypted SSL connection that is correctly configured?<p>Is this kind of problem not already dealt with for me by the secure transport layer? It would be a shame if the abstraction were leaky. My understanding of the contract is that whatever bits I supply will be securely transported within the limits of the configuration I have selected.<p>If I pick a bad configuration then yes, shame on me, but a good configuration won't care if I compress, right?
Logically speaking, an encrypted file should have a high-entropy set of bits within it. Compressing it would be low return, but higher security, since the input file contained more "random" bits.<p>Compressing the source material will yield smaller results but will be more predictable, as the file will always contain ZIP headers and other metadata that could possibly make decryption of your file much easier.
If I compress each component (i.e., attacker-influenced vs. secret) separately, concatenate the results (with message lengths, of course), then encrypt the whole message, is that secure?<p>It seems like it should be, but I'm not an encryption expert. The compression should still be pretty good, though.
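The scheme described might look roughly like this sketch (the framing is made up); whether it helps still depends on the "secret" part not itself mixing in attacker-influenced data.

```python
# Sketch: compress attacker-influenced and secret parts in separate zlib streams,
# so neither can borrow redundancy from the other, then frame and encrypt the whole thing.
import struct
import zlib

def frame_separately(attacker_part: bytes, secret_part: bytes) -> bytes:
    a = zlib.compress(attacker_part)
    s = zlib.compress(secret_part)
    plaintext = struct.pack(">II", len(a), len(s)) + a + s
    return plaintext  # this whole buffer is what would get encrypted
```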
> The paper Phonotactic Reconstruction of Encrypted VoIP Conversations gives a technique for reconstructing speech from an encrypted VoIP call.<p>The technique for reconstructing speech clearly had its limitations.
The OP should take <a href="https://www.coursera.org/learn/crypto" rel="nofollow">https://www.coursera.org/learn/crypto</a>
So if the length of the resulting message is leaking information, salt it by adding some extra random bits to the end to increase the length by a random amount.
A lot of comments here suggest that encryption increases entropy. While true, it adds at most the key's entropy to the plaintext's entropy. In most real-world cases, len(m) >> len(k), so this is usually an insignificant increase in entropy. Compression <i>also</i> adds a trivial amount of entropy (specifically, the information encoding the algorithm used to compress, even if that information is out of band).
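In symbols (a sketch of the bound being invoked here): since the ciphertext is a deterministic function of the message and the key, H(C) ≤ H(M, K) ≤ H(M) + H(K), so when H(K) is tiny compared to H(M) the ciphertext's entropy is dominated by the message's.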