How much space would it take to store every word ever said?

49 points by jonluca about 5 years ago

12 comments

perl4ever about 5 years ago
Assuming 10:1 compression, you have 50 exabytes, and it appears that would be about 500 of the trucks Amazon uses to load large amounts of data. I can't find information on how many they actually have, or whether the capacity has increased from the 100 PB figure mentioned in a lot of places.

Amazon's FAQ is funny:

"Q: Can I export data from AWS with Snowmobile?

Snowmobile does not support data export. It is designed to let you quickly, easily, and more securely migrate exabytes of data to AWS"

...you can check out any time you like, but you can never leave.
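The truck count is easy to sanity-check. A minimal sketch, assuming decimal units (1 EB = 1000 PB) and the 100 PB per-truck figure quoted above; the helper name `trucks_needed` is made up for illustration:

```python
import math

PB_PER_EB = 1000  # assumption: decimal units, 1 exabyte = 1000 petabytes

def trucks_needed(total_eb: float, truck_capacity_pb: float = 100) -> int:
    """Trucks required to carry total_eb exabytes (hypothetical helper)."""
    return math.ceil(total_eb * PB_PER_EB / truck_capacity_pb)

# 50 EB at 100 PB per truck, the figures from the comment
print(trucks_needed(50))
```

With those inputs the result matches the "about 500" estimate; a larger per-truck capacity would shrink the count proportionally.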
franciscop about 5 years ago
> We could also use UTF8, but since we assumed the language is German, we’ll stick to ASCII

German cannot be expressed in ASCII[1]. For that matter, neither can Chinese nor Spanish, the two most spoken languages besides English. Also UTF8 doesn't even encode all the languages ever spoken. So IMHO this is at least an order of magnitude wrong.

[1] https://news.ycombinator.com/item?id=9222071
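The ASCII point can be checked directly. A small sketch using Python's built-in `str.encode`; the sample words are arbitrary picks, not taken from the article, and `fits_in_ascii` is a hypothetical helper name:

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text is encodable as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

# Arbitrary sample words: German, German, Spanish, Chinese, English
samples = ["Straße", "über", "niño", "你好", "hello"]

for word in samples:
    print(word, fits_in_ascii(word))
```

Only the plain English sample passes; the umlaut, the eszett, the tilde, and the Chinese characters all fall outside ASCII's 128 code points.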
DoofusOfDeath about 5 years ago
Sometimes I hear someone utter a sentence which I *guess* has never before been uttered by anybody. I really wish I had a way to verify that, just for fun.
lilyball about 5 years ago
> *10 billion words, times an average word length of 11.66, gets us ~4.8 billion individual characters spoken per person per lifetime.*

Am I missing something or is this math very wrong?
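Taking the quoted figures at face value, the multiplication can be redone in a few lines. This is only a sketch of the check; the variable names are made up, and it says nothing about which of the article's inputs was actually intended:

```python
words_per_lifetime = 10_000_000_000   # "10 billion words", as quoted
avg_word_length = 11.66               # characters per word, as quoted
claimed_chars = 4.8e9                 # "~4.8 billion", as quoted

# What the quoted inputs actually multiply out to
chars_per_lifetime = words_per_lifetime * avg_word_length

# Word count that would be consistent with the claimed total
implied_words = claimed_chars / avg_word_length

print(f"product of quoted inputs: {chars_per_lifetime:.3e} characters")
print(f"words implied by 4.8 billion characters: {implied_words:.3e}")
```

The product comes out near 117 billion characters, not 4.8 billion, so the quoted sentence is not self-consistent. The claimed total would fit a lifetime word count of roughly 412 million instead.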
thansz about 5 years ago
This reminds me of a very fun and interesting read called "A Short Stay in Hell" by Steven Peck, which provides an entertaining perspective on infinity and very, very large finite time periods. It's about a Mormon who goes to hell (because Zoroastrianism happens to be the One True Religion). Hell does not last forever though. For the main character, it's a library that contains every possible communication that could exist. Once he finds the book that contains the story of his life, he gets out. Very fun read that addresses large but finite values, although it focuses more on time rather than space.
sudosushi about 5 years ago
Interestingly, no one has mentioned the Library of Babel[0]

One could assert that if you were to translate Chinese/Russian/non-UTF characters into UTF, you'd be covering every word ever possibly said.

[0] https://libraryofbabel.info/
jmull about 5 years ago
Hm... just the words lose so much -- the tone, the emphasis, the pauses. I think we'd have to do at least audio. Though of course expressions, hand movements and bearing count too, so I'm thinking we need a number for video as well.
slewis about 5 years ago
If we can assume determinism, there’s a much better compression algorithm.
QuadrupleA about 5 years ago
This would be an interesting dataset to explore! A biographer's dream. Insider information on every corporate & governmental decision in history. Intimate daily-life details from early hominids.
KiDD about 5 years ago
But what if it was encoded in a novel storage format such as DNA?
purplezooey about 5 years ago
You can leave out all of Twitter
Avshalom about 5 years ago
It's not taking into account UTF-8 though so maybe double or triple.
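Whether "double or triple" holds depends on the script: UTF-8 spends one byte on each ASCII character, two on each Cyrillic letter, and three on most CJK characters. A rough sketch with arbitrary sample strings; `utf8_bytes_per_char` is a hypothetical helper, not something from the article:

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in text."""
    return len(text.encode("utf-8")) / len(text)

# Arbitrary samples, one per script
samples = {
    "English": "hello",
    "German": "größe",
    "Russian": "привет",
    "Chinese": "你好",
}

for language, text in samples.items():
    print(f"{language}: {utf8_bytes_per_char(text):.2f} bytes per character")
```

Under these samples, German text grows only a little over one byte per character, while Russian doubles and Chinese triples. Chinese also uses far fewer characters per word, so the per-word cost does not scale the same way.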