A hack for the sake of doing a hack.<p>If you <i>really</i> want to compress as many words as you want into a tweet, just include a link: <a href="http://www.example.com/really-long-article.txt" rel="nofollow">http://www.example.com/really-long-article.txt</a>
I think the most realistic way to compress a tweet would be to replace words like "before" with "b4", "too"/"to" with "2", reduce whitespace (e.g. double spaces to single), and maybe start ripping out vowels ("vowels" -> "vwls").<p>Although not as efficient as demonstrated above, there are no external dependencies needed; the content can be decompressed by the reader's brain in-place at the slight cost of being difficult to immediately parse/understand.
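For what it's worth, the substitutions described above can be sketched in a few lines of Python. This is only a rough illustration: the `SUBSTITUTIONS` table holds just the examples from the comment, and the function names and the "keep the first letter, drop later vowels" rule are my own assumptions, not an established scheme.

```python
import re

# Hypothetical substitution table: only the examples named in the comment.
SUBSTITUTIONS = {"before": "b4", "too": "2", "to": "2"}


def strip_inner_vowels(word: str) -> str:
    """Drop vowels after the first letter, e.g. "vowels" -> "vwls".

    Words of three letters or fewer are left alone so they stay readable.
    """
    if len(word) <= 3:
        return word
    return word[0] + re.sub(r"[aeiouAEIOU]", "", word[1:])


def squeeze(text: str, drop_vowels: bool = False) -> str:
    """Apply word substitutions, collapse whitespace, optionally strip vowels."""
    out = []
    # str.split() with no arguments also collapses runs of whitespace.
    for word in text.split():
        replacement = SUBSTITUTIONS.get(word.lower())
        if replacement is not None:
            out.append(replacement)
        elif drop_vowels:
            out.append(strip_inner_vowels(word))
        else:
            out.append(word)
    return " ".join(out)


print(squeeze("I went  to the shop before you"))
print(squeeze("remove the vowels too", drop_vowels=True))
```

Words with attached punctuation ("before,") are not matched by this sketch, and the transformation is lossy: only the reader's brain can undo it.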
I wondered how a naive approach would work in comparison. Unicode has just over 2^20 codepoints, so each character can reliably carry about 20 bits, or 2800 bits over 140 characters. Standard ASCII needs 7 bits per character (2^7 values). 2800/7 gives us a potential 400 ASCII characters using the naive approach alone, a compression of 2.86x (400/140) compared to the 5x he mentions.
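A minimal sketch of that naive packing, to check the arithmetic. It assumes each codepoint counts as one character and that every 20-bit value survives transmission unchanged, which is probably optimistic (control characters, unassigned codepoints, normalization). The names `pack_ascii` and `unpack_ascii` are hypothetical, and the surrogate range is skipped because those values cannot be encoded as UTF-8.

```python
BITS_PER_CHAR = 20            # just under log2 of the Unicode code space
SURROGATE_START = 0xD800      # U+D800..U+DFFF are not valid in UTF-8
SURROGATE_COUNT = 0x800


def _to_codepoint(value: int) -> int:
    # Map a 20-bit value to a codepoint, jumping over the surrogate block.
    return value if value < SURROGATE_START else value + SURROGATE_COUNT


def _from_codepoint(cp: int) -> int:
    return cp if cp < SURROGATE_START else cp - SURROGATE_COUNT


def pack_ascii(text: str) -> str:
    """Pack 7-bit ASCII into 20-bit Unicode characters."""
    bits, nbits = 0, 0
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("input must be 7-bit ASCII")
        bits = (bits << 7) | code
        nbits += 7
    pad = (-nbits) % BITS_PER_CHAR      # zero-pad to a whole number of chars
    bits <<= pad
    nbits += pad
    mask = (1 << BITS_PER_CHAR) - 1
    return "".join(
        chr(_to_codepoint((bits >> shift) & mask))
        for shift in range(nbits - BITS_PER_CHAR, -1, -BITS_PER_CHAR)
    )


def unpack_ascii(packed: str, length: int) -> str:
    """Reverse pack_ascii; the original length must be known to drop padding."""
    bits = 0
    for ch in packed:
        bits = (bits << BITS_PER_CHAR) | _from_codepoint(ord(ch))
    bits >>= BITS_PER_CHAR * len(packed) - 7 * length
    return "".join(
        chr((bits >> shift) & 0x7F)
        for shift in range(7 * (length - 1), -1, -7)
    )


message = "a" * 400
packed = pack_ascii(message)
print(len(message), "ASCII chars ->", len(packed), "Unicode chars")
```

Passing the original length to `unpack_ascii` is a shortcut; a real scheme would have to store it or use a terminator.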
Great hacking. If you like this topic, this should be relevant: <a href="http://stackoverflow.com/questions/891643/twitter-image-encoding-challenge" rel="nofollow">http://stackoverflow.com/questions/891643/twitter-image-enco...</a> (Compressing images in tweets)
All that's missing is a custom Twitter client that compresses upon tweeting and decompresses in the various timeline columns, à la TweetDeck. It could make for something interesting. Too bad that would be against Twitter's TOS.
To extend this idea: if you really wanted to maximise the data you could transmit in a single Twitter message, you could use the full Unicode range (roughly 21 bits per character, since code points stop at U+10FFFF, well short of the 31 bits the original UTF-8 design allowed) instead of just the Chinese subset, and then apply standard lossless data compression techniques to the generated Unicode for further improvement.
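One way this could look in practice, as a sketch only. Note that it swaps the order: it compresses first with `zlib` and then packs the compressed bytes into Unicode characters, since a compressor's output is bytes that still need packing. It uses 20 bits per character and skips the surrogate block; whether a service would carry every such codepoint intact is an assumption, and the function names are hypothetical.

```python
import zlib

BASE = 1 << 20                # 20 bits of payload per character
SURROGATE_START = 0xD800      # U+D800..U+DFFF cannot be encoded as UTF-8
SURROGATE_COUNT = 0x800


def encode(text: str) -> str:
    """Compress text, then pack the bytes into 20-bit Unicode characters."""
    data = zlib.compress(text.encode("utf-8"), 9)
    # A leading 0x01 byte keeps any leading zero bytes from being lost
    # when the data is treated as one big integer.
    n = int.from_bytes(b"\x01" + data, "big")
    chars = []
    while n:
        n, digit = divmod(n, BASE)
        cp = digit if digit < SURROGATE_START else digit + SURROGATE_COUNT
        chars.append(chr(cp))
    return "".join(reversed(chars))


def decode(packed: str) -> str:
    """Reverse encode(): unpack the characters, then decompress."""
    n = 0
    for ch in packed:
        cp = ord(ch)
        digit = cp if cp < SURROGATE_START else cp - SURROGATE_COUNT
        n = n * BASE + digit
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return zlib.decompress(raw[1:]).decode("utf-8")   # drop the 0x01 marker


message = "the quick brown fox jumps over the lazy dog. " * 20
packed = encode(message)
print(len(message), "chars ->", len(packed), "chars")
assert decode(packed) == message
```

On short, non-repetitive text a general-purpose compressor gains little or can even lose ground because of its header overhead, so the benefit depends heavily on the message.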
I made a quick Chrome extension (userscript wasn't going to have enough permissions) for this which is up at <a href="https://chrome.google.com/webstore/detail/idcnolgflhcckjdfpfbcehjocggffdjk" rel="nofollow">https://chrome.google.com/webstore/detail/idcnolgflhcckjdfpf...</a>
Unicode is really fun. In a similar vein, my first Rails app was a URL shortener that also takes advantage of Twitter's Unicode character counting method:<p><a href="http://menosgrande.org" rel="nofollow">http://menosgrande.org</a>