A Tweet is Worth (at Least) 140 Words With this Compression Algorithm

67 points by Umalu, over 13 years ago

11 comments

waitwhat, over 13 years ago
A hack for the sake of doing a hack.

If you really want to compress as many words as you want into a tweet, just include a link: http://www.example.com/really-long-article.txt
waffle_ss, over 13 years ago
I think the most realistic way to compress a tweet would be to replace words like "before" with "b4", "too"/"to" with "2", reduce whitespace (e.g. double spaces to single), and maybe start ripping out vowels ("vowels" -> "vwls").

Although not as efficient as demonstrated above, there are no external dependencies needed; the content can be decompressed by the reader's brain in-place at the slight cost of being difficult to immediately parse/understand.
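
A minimal sketch of the substitutions described in this comment, in Python. The substitution table only holds the pairs the comment names, and the rule "keep the first letter, drop later vowels, skip words of three letters or fewer" is an assumption made so that "vowels" becomes "vwls"; the names `SUBSTITUTIONS`, `strip_inner_vowels` and `textspeak` are hypothetical.

```python
import re

# Only the pairs named in the comment; a real table would be much longer.
SUBSTITUTIONS = {"before": "b4", "too": "2", "to": "2"}


def strip_inner_vowels(word: str) -> str:
    """Drop vowels after the first letter, e.g. "vowels" -> "vwls".

    Words of three letters or fewer are left alone (an assumption, to keep
    short words readable).
    """
    if len(word) <= 3:
        return word
    return word[0] + re.sub(r"[aeiouAEIOU]", "", word[1:])


def textspeak(text: str) -> str:
    """Apply word substitutions, collapse whitespace, then strip vowels."""
    out = []
    # str.split() with no arguments also collapses runs of whitespace.
    for word in text.split():
        replacement = SUBSTITUTIONS.get(word.lower())
        out.append(replacement if replacement is not None else strip_inner_vowels(word))
    return " ".join(out)


if __name__ == "__main__":
    print(textspeak("too  late   before"))
```

The sketch matches whole whitespace-separated tokens only, so a word followed by punctuation (for example "before,") is not substituted. As the comment says, decompression is left to the reader.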
petercooper, over 13 years ago
I wondered how a naive approach would work in comparison: You can reliably represent just over 2^20 codepoints in a UTF-8 character or 2800 bits over 140 characters. Standard ASCII is 2^7. 2800/7 gives us a potential 400 ASCII characters using a naive approach alone or a compression of 2.86x compared to the 5x he mentions.
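
The arithmetic in this comment can be checked with a short Python sketch, which also packs 7-bit ASCII into 20-bit chunks. Mapping each chunk onto the range U+10000..U+10FFFF (exactly 2^20 code points) is an assumption of this sketch, not something the comment specifies, and whether Twitter accepts every code point in that range is not verified here. The names `capacity`, `pack_ascii` and `unpack_ascii` are hypothetical.

```python
TWEET_CHARS = 140
BITS_PER_CHAR = 20   # "just over 2^20 codepoints" per character, per the comment
ASCII_BITS = 7
BASE = 0x10000       # assumption: use U+10000..U+10FFFF, which is 2^20 code points


def capacity():
    """Return (total bits, ASCII characters that fit, ratio vs. 140 chars)."""
    total_bits = TWEET_CHARS * BITS_PER_CHAR      # 2800
    ascii_chars = total_bits // ASCII_BITS        # 400
    return total_bits, ascii_chars, ascii_chars / TWEET_CHARS


def pack_ascii(text: str) -> str:
    """Pack 7-bit ASCII into 20-bit chunks, one code point per chunk."""
    if not text.isascii():
        raise ValueError("only 7-bit ASCII input is supported")
    bits = "".join(format(ord(c), "07b") for c in text)
    bits += "0" * (-len(bits) % BITS_PER_CHAR)    # zero-pad the last chunk
    return "".join(
        chr(BASE + int(bits[i:i + BITS_PER_CHAR], 2))
        for i in range(0, len(bits), BITS_PER_CHAR)
    )


def unpack_ascii(packed: str) -> str:
    """Reverse pack_ascii; trailing NULs produced by padding are dropped."""
    bits = "".join(format(ord(c) - BASE, "020b") for c in packed)
    chunks = (bits[i:i + ASCII_BITS] for i in range(0, len(bits) - ASCII_BITS + 1, ASCII_BITS))
    return "".join(chr(int(chunk, 2)) for chunk in chunks).rstrip("\x00")


if __name__ == "__main__":
    print(capacity())
    message = "a" * 400
    print(len(pack_ascii(message)))  # length in code points
```

Because padding is removed by stripping NUL characters, a message that itself ends in NUL would not round-trip; a length prefix would avoid that at the cost of a few bits.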
blauwbilgorgel, over 13 years ago
Great hacking. If you like this topic, this should be relevant: http://stackoverflow.com/questions/891643/twitter-image-encoding-challenge (Compressing images in tweets)
instakill, over 13 years ago
All that's missing is a custom Twitter client that compresses upon tweeting and decompresses in the various timeline columns, a la TweetDeck. It could make for something interesting. Too bad that would be against Twitter's TOS.
bmalicoat, over 13 years ago
Good article, but isn't the author describing Huffman codes?
jconnop, over 13 years ago
To extend upon this idea: if you really wanted to maximise the data you could transmit in a single Twitter message, you could use the full 31 bits of Unicode (instead of just the Chinese subset) and then apply standard lossless data compression techniques to the generated Unicode for further improvement.
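
A rough sketch of combining lossless compression with dense Unicode packing, under two stated assumptions. First, the text is compressed with `zlib` before packing rather than after, since compressing the already-packed string is unlikely to help. Second, only 20 bits per character are used (U+10000..U+10FFFF) rather than 31, because the Unicode code space ends at U+10FFFF. The names `encode` and `decode` are hypothetical, and whether Twitter accepts all of these code points is not verified here.

```python
import zlib

BASE = 0x10000        # assumption: pack into U+10000..U+10FFFF (2^20 code points)
BITS = 20
MASK = (1 << BITS) - 1


def encode(text: str) -> str:
    """Compress text with zlib, then pack the bytes 20 bits per code point."""
    payload = zlib.compress(text.encode("utf-8"), 9)
    # A leading 0x01 sentinel byte keeps the byte length recoverable.
    n = int.from_bytes(b"\x01" + payload, "big")
    digits = []
    while n:
        digits.append(n & MASK)
        n >>= BITS
    return "".join(chr(BASE + d) for d in reversed(digits))


def decode(packed: str) -> str:
    """Reverse encode(): unpack the 20-bit digits, then decompress."""
    n = 0
    for ch in packed:
        n = (n << BITS) | (ord(ch) - BASE)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return zlib.decompress(raw[1:]).decode("utf-8")  # drop the sentinel byte


if __name__ == "__main__":
    message = "the quick brown fox " * 20
    packed = encode(message)
    print(len(message), len(packed))  # characters before and after
```

The gain depends heavily on the input: repetitive text shrinks a lot, while short or high-entropy text may grow because of the zlib header.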
tomotomo, over 13 years ago
I made a quick Chrome extension (userscript wasn't going to have enough permissions) for this which is up at https://chrome.google.com/webstore/detail/idcnolgflhcckjdfpfbcehjocggffdjk
waffle_ss, over 13 years ago
Unicode is really fun. In a similar vein, my first Rails app was a URL shortener that also takes advantage of Twitter's Unicode character counting method:

http://menosgrande.org
cpeterso, over 13 years ago
So Twitter allows 140 (UTF-8?) characters, regardless of the number of bytes? The article wasn't clear about this.
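
The distinction this question turns on, characters versus encoded bytes, can be illustrated with a small Python sketch. It only shows how the two counts diverge in UTF-8; it does not claim to reproduce how Twitter counts. The helper name `sizes` is hypothetical.

```python
def sizes(text: str):
    """Return (number of code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    samples = {
        "ASCII": "a" * 140,
        "CJK (U+4E2D)": "\u4e2d" * 140,              # 3 bytes each in UTF-8
        "Supplementary (U+10000)": chr(0x10000) * 140,  # 4 bytes each in UTF-8
    }
    for label, text in samples.items():
        print(label, sizes(text))
```

All three samples are 140 code points long, but their UTF-8 sizes differ by up to a factor of four, which is the gap that character-based counting leaves open.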
heydenberk, over 13 years ago
Would it be possible to use the tweet metadata to cram more data into a tweet?