TechEcho

Ask HN: What to do with text from old, unarchived, online forums?

131 points by stevengoodwin, over 1 year ago
I have a stack of printouts of various discussion forums from 1991 to 1994. Mostly chatter, but also several interview transcriptions, news reports, and magazine articles that no longer appear to be online.

They vary from Billboard charts, to Associated Press reports on Michael Jackson's wealth, and Debbie Gibson discussion groups!

Are they useful/interesting as pieces of Internet history?

Are the copyright/privacy issues too onerous to scan/publish?

Do I simply send a box of paper to archive.org to worry about?

19 comments

textfiles, over 1 year ago
Happy to take them.
toomuchtodo, over 1 year ago
Hey there, I do some work with the Archive folks as an ad hoc physical archive volunteer. I'll pay for you to ship them and facilitate getting them ingested into items and a collection if you'd like. Let me know how to get in touch.

Once scanned, derive operations will take care of OCR and generating relevant metadata from the artifacts. If there are any issues (copyright, etc.) preventing these items from being public, they'll be made private by patron services.

(no affiliation with the Archive, just a volunteer)
SushiHippie, over 1 year ago
Maybe this FAQ about physically donating stuff to the Internet Archive answers some questions for you:

https://help.archive.org/help/frequently-asked-questions/
giancarlostoro, over 1 year ago
It's a shame. Google used to host Usenet groups, and the other day I clicked a link to something old that Google was supposed to be hosting and got literally nothing back.

Dear Google, you could have added ads to those old results to justify keeping them up, but you failed because you're more interested in locking things down than actually building up an open web.

Now there's decades' worth of old Usenet content probably gone. My understanding is that a lot of it was donated by old Usenet providers that shut down or just couldn't keep hosting really old but historic texts for whatever reason.
yreg, over 1 year ago
A question from someone a bit younger:

What was the purpose of printing out online discussions? Was it to read them while not at the computer? Was it to physically archive them?
kdklol, over 1 year ago
> Do I simply send a box of paper to archive.org to worry about?

That's what I would do, although I'd probably scan it for them, or at least would send an email first. Also, good on you for trying to preserve early internet history. You have my admiration.
DoubleDerper, over 1 year ago
You can scan and OCR (Adobe) this yourself with not much more effort than boxing and shipping at FedEx.

Then repost on related public-facing online forums or a free WordPress site, or upload to archive.org.
bravura, over 1 year ago
A friend of mine, technical but not a hacker, wanted to turn an old offline forum into a chatbot. I know there are many not so good RAGs to do this. What are the current best practices, as of this month?
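
Not a best-practices answer, but for a sense of what the retrieval half of a RAG setup does, here is a minimal sketch in Python using only the standard library. It assumes the forum has already been split into one string per post. The sample posts, the function names (`tokenize`, `build_index`, `retrieve`), and the choice of TF-IDF with cosine similarity are all illustrative assumptions; real setups usually swap in learned embeddings and then hand the retrieved posts to a language model as context.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(posts):
    """Return (idf, vectors): smoothed IDF weights and one TF-IDF dict per post."""
    docs = [Counter(tokenize(p)) for p in posts]
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, posts, idf, vectors, k=2):
    """Return up to k posts most similar to the query, dropping zero-score posts."""
    counts = Counter(tokenize(query))
    # Query terms never seen in the posts carry no weight and are skipped.
    qvec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = sorted(
        ((cosine(qvec, vec), i) for i, vec in enumerate(vectors)),
        reverse=True,
    )
    return [posts[i] for score, i in scored[:k] if score > 0.0]


# Hypothetical sample posts standing in for OCR'd forum text.
posts = [
    "Billboard chart for this week, posted from the magazine.",
    "Transcript of a radio interview with Debbie Gibson.",
    "Associated Press report on Michael Jackson's wealth.",
]
idf, vectors = build_index(posts)
matches = retrieve("Debbie Gibson interview", posts, idf, vectors)
```

The posts in `matches` are what would get pasted into the chatbot's prompt alongside the user's question. Keeping the index as plain dicts makes it easy to inspect why a post matched, which helps with noisy OCR text.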
NoZebra120vClip, over 1 year ago
I sort of regard these commenters as crazy. I never had the habit of printing stuff off so I could read it later. I read it online, or not at all. And by "online" I mean from computer storage on the screen, not necessarily over a modem connection or something.

The fundamental disconnect you guys have is this: in order to print something out, it needs to be stored somewhere first. So you indeed downloaded a text file in order to print it. Surely you saved it somewhere, at least in RAM if not permanent storage. Otherwise the printout wouldn't happen.

The sorts of things that I used the printer for were school reports and papers, and especially Print Shop style banners. It was really fun to run off a larger-than-life "HAPPY BIRTHDAY" sign that was basically professional DTP style with good fonts, graphics and the whole bit.

At school the chief use of the line printers was for large-format ASCII art. We'd take some GIFs of rather prurient pin-up shots, or anime or just some interesting subject, and run them through an ASCII art generator, then print out something suitable for covering an entire wall. Sometimes we'd even print out the Pascal code we were working on, so we could mark it up and sort of edit it offline. But that was the exception to prove the rule.
sdsd, over 1 year ago
I really enjoy finding old imageboards that have been abandoned for years, sometimes decades.

A lot of them are on the Wayback Machine, but finding out about them is often the hard part.

I created a chan discussion board on my own imageboard (http://13channel.crabdance.com/chan/index.html) to research this kind of stuff, but only for chans specifically.
zxexz, over 1 year ago
It would be really cool if this became a trend. I'm sure tons of people have printouts from back then.
j45, over 1 year ago
It would likely be easiest to have it digitized first.

Either use an all-in-one printer or a scanner with an automatic document feeder, or send that box to a scanning service.

Once digitized, there are a lot more places interested in it.
billpg, over 1 year ago
Did we have web forums back then? I know Usenet was a thing.
chislobog, over 1 year ago
Probably not the right place for this but consider what happened to the truecrypt forums. Two custodian copies online is a good thing.
dobin, over 1 year ago
Related: I have a lot of IRC logs from 1997-2004. Is there an IRC archive project somewhere?
countWSS, over 1 year ago
Just select the most significant stuff and digitize it yourself.
kolinko, over 1 year ago
Definitely, Archive.org is the best bet.
alnwlsn, over 1 year ago
Paging Jason Scott, @textfiles
catchnear4321, over 1 year ago
we got ourselves a stringer for the cic.

well, not quite. soon.

models gotta feed.