The actual attack is described near the end of the article: ChatGPT and friends send tokens as fast as they are generated, and the encryption doesn't mask the size of each token. That lets an attacker infer the length of each token as it arrives and feed the length sequence into a specialized model that guesses the word sequence.

Mitigating this should be very simple: pad each token with null bytes to some fixed length before sending, then strip the padding on the client side. Slightly higher bandwidth usage, but not enough to be perceptible.
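A minimal sketch of the padding idea in Python (the 32-byte record size and the function names are my own assumptions, not anything the article specifies):

    PAD_TO = 32  # assumed fixed record size; must exceed the longest token

    def pad_token(token: str) -> bytes:
        data = token.encode("utf-8")
        if len(data) > PAD_TO:
            raise ValueError("token longer than pad size")
        # Null-pad so every plaintext record is the same length, hiding
        # each token's true size from a network observer.
        return data.ljust(PAD_TO, b"\x00")

    def unpad_token(record: bytes) -> str:
        # Client side: strip the trailing nulls to recover the token.
        return record.rstrip(b"\x00").decode("utf-8")

In practice you'd derive PAD_TO from the tokenizer's longest token, or chunk anything that exceeds it.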
Related article on the frontpage:

Cloudflare mitigates AI side channel attack | https://news.ycombinator.com/item?id=39703255
Interesting read. From the title I figured the article was going to be about "hackers" breaching accounts and reading past conversations. I don't think "reading private chats" is the right way to put it; it's more like inferring topics of discussion with some accuracy.
Interesting attack, but it's easily mitigated. Just batch the responses into sentences or so instead of sending them token by token, or pad them, as others have suggested. The services will adapt quickly and the issue will be solved.

It also requires a full packet capture of the target, which makes it not very easy to execute.
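A rough sketch of the batching idea (function names are illustrative, and a real implementation would live in the server's streaming layer):

    # Buffer streamed tokens and emit one network write per sentence,
    # so packet sizes reflect sentence lengths rather than token lengths.
    SENTENCE_ENDS = ".!?\n"

    def batch_stream(token_stream, send):
        buffer = []
        for token in token_stream:
            buffer.append(token)
            if any(ch in token for ch in SENTENCE_ENDS):
                send("".join(buffer))
                buffer.clear()
        if buffer:
            send("".join(buffer))  # flush whatever is left at end of stream

    batch_stream(["Hello", " world", ".", " Bye", "."], print)

Note this still leaks sentence lengths, a much coarser signal, so padding on top would be the safer combination.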
This is actually a very interesting side channel attack on streaming data. They use an LLM to guess the words from the token lengths. Padding would definitely help here.

Nit: the title should say "guess", not "read".
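To make the channel concrete: with AEAD ciphers like AES-GCM, ciphertext length tracks plaintext length, so an eavesdropper can read per-token byte lengths straight off the record sizes. A toy illustration (the overhead constant and the packet sizes are invented):

    TLS_OVERHEAD = 29  # assumed fixed per-record overhead (header + auth tag)

    observed_sizes = [33, 31, 34, 36, 30]  # sniffed record sizes (made up)
    token_lengths = [s - TLS_OVERHEAD for s in observed_sizes]
    print(token_lengths)  # [4, 2, 5, 7, 1] -> the sequence fed to the guessing model

That length sequence is all the specialized guessing model needs as input.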