Ask HN: Does anybody need 1.7M screenshots?

6 points by tomw1808 over 8 years ago

Hi HN,

I just saw that one of my projects [1], which is constantly crawling websites, is still taking a screenshot of each website it crawls. I completely forgot about the screenshots as I do not use them anymore.

In total it is around 1,790,000 full-page screenshots of sites that were posted on reddit, Hacker News, tweets, and financial news since Jan 2014.

Don't ask me to open-source them and make them available for download. I dare not get involved in any licensing issues or whatever.

They are in an S3 bucket. Just got a bill from Amazon...

If you'd like to have them, or you have any idea what I can do with them other than deleting them, contact me at thomas@newscombinator.com

Thomas

[1] http://www.newscombinator.com

5 comments

mchannon over 8 years ago

This sort of data falls under the category of "I might need this someday but I can't figure out why".

For the reasons you can't think of, perhaps you might consider indexing them locally then throwing them onto Glacier. The odds of you needing every single one of them (thus making Glacier cost-prohibitive) are far less than the odds of you needing one at random.

I haven't done the math on how many months of S3 hosting it takes to equal the upload cost once to Glacier, primarily because I don't know how big 1790000 screenshots are.

Alternatively, provided downloading them all to your local desktop doesn't run your S3 bill to Mars, tape drives can still be quite cost-effective ways to store a LOT of data, cheap.
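The break-even math mentioned above could be sketched roughly as follows. This is only an illustration: the function name `breakeven_months`, the average screenshot size, and every price are placeholder assumptions rather than actual AWS rates, and the one-time cost is modeled as a simple per-object request fee.

```python
def breakeven_months(total_gb, n_objects, hot_per_gb_month,
                     cold_per_gb_month, move_fee_per_1000_objects):
    """Months of storage after which the cheaper tier has paid back
    the one-time cost of moving every object into it."""
    # One-time cost, modeled (as an assumption) as a fee per 1,000 objects moved.
    one_time_cost = n_objects / 1000 * move_fee_per_1000_objects
    # Money saved each month by storing the data in the cheaper tier.
    monthly_saving = total_gb * (hot_per_gb_month - cold_per_gb_month)
    if monthly_saving <= 0:
        raise ValueError("the cold tier must be cheaper per GB-month")
    return one_time_cost / monthly_saving


if __name__ == "__main__":
    n_screenshots = 1_790_000      # count given in the post
    avg_size_mb = 1.0              # ASSUMPTION: the real size is unknown
    total_gb = n_screenshots * avg_size_mb / 1024

    # ASSUMPTION: placeholder prices, not real AWS rates.
    months = breakeven_months(
        total_gb=total_gb,
        n_objects=n_screenshots,
        hot_per_gb_month=0.03,
        cold_per_gb_month=0.01,
        move_fee_per_1000_objects=0.05,
    )
    print(f"Break-even after about {months:.1f} months")
```

Because the one-time fee scales with the number of objects while the saving scales with total size, many small files push the break-even point further out than a few large ones would.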
kfrat over 8 years ago

> I dare not to get involved in any licensing issues or whatever

You might be covered under the DMCA: https://en.wikipedia.org/wiki/Dmca

And since they're screenshots some might be considered fair use: https://en.wikipedia.org/wiki/Fair_use

*Some* being blatant copies of logos, etc.
VertexRed over 8 years ago

Sounds cool, and it's more like 1.8M.

Sadly I don't think it'd be much use to anyone, since Archive.org takes care of all archiving.

The only time I'd see screenshots come in handy is for live previews.
dorfuss over 8 years ago

It would be great to see such a collection from the 1990s...
majortennis over 8 years ago

What was the bill? Eeek.