A Tweet is Worth (at Least) 140 Words With this Compression Algorithm

67 points by Umalu over 13 years ago

11 comments

waitwhat over 13 years ago
A hack for the sake of doing a hack.

If you *really* want to compress as many words as you want into a tweet, just include a link: http://www.example.com/really-long-article.txt
waffle_ss over 13 years ago
I think the most realistic way to compress a tweet would be to replace words like "before" with "b4", "too"/"to" with "2", reduce whitespace (e.g. double spaces to single), and maybe start ripping out vowels ("vowels" -> "vwls").

Although not as efficient as demonstrated above, there are no external dependencies needed; the content can be decompressed by the reader's brain in-place at the slight cost of being difficult to immediately parse/understand.
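
A rough sketch of what that kind of lossy, human-readable squeezing might look like in Python. The `squeeze` function, the `SUBSTITUTIONS` table, and the rule of keeping a word's first letter are all assumptions made for illustration; only the "b4", "2", whitespace, and vowel-dropping ideas come from the comment.

```python
import re

# Hypothetical substitution table, limited to the examples named in the comment.
SUBSTITUTIONS = {"before": "b4", "too": "2", "to": "2"}


def squeeze(text: str, strip_vowels: bool = False) -> str:
    """Shorten text with word swaps, whitespace collapsing, and optional vowel removal."""
    # Collapse every run of whitespace into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    words = []
    for word in text.split(" "):
        replacement = SUBSTITUTIONS.get(word.lower())
        if replacement is not None:
            words.append(replacement)
        elif strip_vowels and len(word) > 3:
            # Assumption: keep the first letter so the word stays guessable,
            # then drop every later vowel.
            words.append(word[0] + re.sub(r"[aeiouAEIOU]", "", word[1:]))
        else:
            words.append(word)
    return " ".join(words)


print(squeeze("I  went to  the store before you"))
print(squeeze("vowels", strip_vowels=True))
```

Words with punctuation attached (such as "before,") are left untouched by this sketch, and nothing here can be reversed by a program; as the comment says, the reader does the decompression.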
petercooper over 13 years ago
I wondered how a naive approach would work in comparison. You can reliably represent just over 2^20 codepoints in a UTF-8 character, so about 20 bits per character, or 2800 bits over 140 characters. Standard ASCII is 2^7. 2800/7 gives us a potential 400 ASCII characters using a naive approach alone, or a compression of 2.86x compared to the 5x he mentions.
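
For illustration, the naive approach can be sketched as plain bit packing: treat each character as a 20-bit slot and pour 7-bit ASCII values into it. The `pack` and `unpack` names, the rule for skipping the surrogate block, and the use of NUL padding are assumptions of this sketch. It also ignores whether Twitter would accept every resulting code point (control characters, unassigned code points), which it very likely would not.

```python
TWEET_CHARS = 140
BITS_PER_CHAR = 20  # "just over 2^20 codepoints", rounded down to whole bits
ASCII_BITS = 7


def _to_char(value: int) -> str:
    # Skip the surrogate block U+D800..U+DFFF, which cannot stand alone.
    return chr(value if value < 0xD800 else value + 0x800)


def _from_char(ch: str) -> int:
    cp = ord(ch)
    return cp if cp < 0xD800 else cp - 0x800


def pack(text: str) -> str:
    """Pack 7-bit ASCII into 20-bit code point slots (assumes no NUL in text)."""
    bits, nbits, out = 0, 0, []
    for byte in text.encode("ascii"):
        bits = (bits << ASCII_BITS) | byte
        nbits += ASCII_BITS
        while nbits >= BITS_PER_CHAR:
            nbits -= BITS_PER_CHAR
            out.append(_to_char((bits >> nbits) & 0xFFFFF))
            bits &= (1 << nbits) - 1
    if nbits:
        # Pad the final partial slot with zero bits.
        out.append(_to_char((bits << (BITS_PER_CHAR - nbits)) & 0xFFFFF))
    return "".join(out)


def unpack(packed: str) -> str:
    """Reverse pack(); trailing NULs produced by the padding are stripped."""
    bits, nbits, out = 0, 0, []
    for ch in packed:
        bits = (bits << BITS_PER_CHAR) | _from_char(ch)
        nbits += BITS_PER_CHAR
        while nbits >= ASCII_BITS:
            nbits -= ASCII_BITS
            out.append(chr((bits >> nbits) & 0x7F))
            bits &= (1 << nbits) - 1
    return "".join(out).rstrip("\x00")


capacity = TWEET_CHARS * BITS_PER_CHAR // ASCII_BITS  # 2800 // 7
message = "x" * capacity
assert len(pack(message)) == TWEET_CHARS
assert unpack(pack(message)) == message
```

Under these assumptions, 400 ASCII characters fit exactly into 140 slots, which matches the 2.86x figure in the comment.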
blauwbilgorgel over 13 years ago
Great hacking. If you like this topic, this should be relevant: http://stackoverflow.com/questions/891643/twitter-image-encoding-challenge (Compressing images in tweets)
instakill over 13 years ago
All that's missing is a custom Twitter client that compresses upon tweeting and decompresses in the various timeline columns, à la TweetDeck. It could make for something interesting. Too bad that would be against Twitter's TOS.
bmalicoat over 13 years ago
Good article, but isn't the author describing Huffman codes?
jconnop over 13 years ago
To extend upon this idea: if you really wanted to maximise the data you could transmit in a single Twitter message, you could use the full 31 bits of Unicode (instead of just the Chinese subset) and then apply standard lossless data compression techniques to the generated Unicode for further improvement.
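
One possible reading of this idea, sketched in Python: compress the text with a standard lossless compressor first, then count how many wide characters the compressed bytes would need. This is an assumption about ordering (the comment describes compressing the generated Unicode instead), and `chars_needed` is a hypothetical helper. The sketch also budgets 20 bits per character rather than 31, since the current Unicode code space ends at U+10FFFF.

```python
import math
import zlib

BITS_PER_CHAR = 20  # assumption: usable bits per character, not the 31 in the comment


def chars_needed(text: str) -> int:
    """Estimate how many 20-bit characters the zlib-compressed text would occupy."""
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return math.ceil(len(compressed) * 8 / BITS_PER_CHAR)


sample = "the quick brown fox jumps over the lazy dog " * 10
print(len(sample), "input characters ->", chars_needed(sample), "packed characters")
```

General-purpose compressors carry fixed overhead, so very short or non-repetitive messages may see little or no gain from this step.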
tomotomo over 13 years ago
I made a quick Chrome extension (userscript wasn't going to have enough permissions) for this which is up at https://chrome.google.com/webstore/detail/idcnolgflhcckjdfpfbcehjocggffdjk
waffle_ss over 13 years ago
Unicode is really fun. In a similar vein, my first Rails app was a URL shortener that also takes advantage of Twitter's Unicode character counting method:

http://menosgrande.org
cpeterso over 13 years ago
So Twitter allows 140 (UTF-8?) characters, regardless of the number of bytes? The article wasn't clear about this.
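
The distinction being asked about can be shown with a small Python sketch. It assumes, as other comments in this thread suggest, that the limit is counted in characters (code points) rather than encoded bytes; the sample string is made up for illustration.

```python
# Two CJK characters repeated 70 times: 140 code points in total.
tweet = "汉字" * 70

code_points = len(tweet)                  # what a per-character limit would count
utf8_bytes = len(tweet.encode("utf-8"))   # what a per-byte limit would count

print(code_points, "characters,", utf8_bytes, "bytes in UTF-8")
```

Each of these characters takes three bytes in UTF-8, so a 140-character message can carry three times as many bytes as 140 ASCII characters would.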
heydenberk over 13 years ago
Would it be possible to use the tweet metadata to cram more data into a tweet?