It's funny, when I took a tour of the US Geological Survey, the curator of the collection hated Google (which was just a few blocks away). He said Google is great <i>now</i>, with all their maps, which were far more accurate and had better coverage than the USGS.<p>But what happens when they get bored with map data and get rid of it?<p>He had been ordered to turn over all of their historical aerial archives for scanning by Google, and then told the USGS would no longer do aerial scanning since Google was doing it. But there was no agreement for Google to turn over their aerial scans back to the USGS.<p>At the time we all told him not to worry, Google would never remove data it had collected. Looks like he was a lot smarter than us.
Just recently I collected all of the archives of comp.lang.ada I could find and imported them into a public-inbox repository. There's a gap around 1992 that I couldn't find a copy of, but it's otherwise complete. It took a few days to get everything into the right format and get SpamAssassin dialed in, but it would certainly be possible to do this for the other comp.* groups if one had the patience.<p><a href="https://archive.legitdata.co/" rel="nofollow">https://archive.legitdata.co/</a><p><a href="https://archive.legitdata.co/comp.lang.ada/" rel="nofollow">https://archive.legitdata.co/comp.lang.ada/</a><p><a href="https://public-inbox.org/README.html" rel="nofollow">https://public-inbox.org/README.html</a>
The vast majority of the spam content is injected into these newsgroups via Google Groups itself, and is not even seen on other NNTP servers.<p>Blocking posting access to these newsgroups from GG is generally a good thing for those newsgroups.<p>Not being able to search the archive is the unfortunate collateral damage though. Google is not obliged to provide a Usenet archive, I suppose.<p>Formerly obtained deep links to the content also do not work!<p>If you formerly cited a comp.lang.lisp article by giving a direct link into Google Groups, people navigating it now get a permission error.
Google's handling of these critical archives they were given is pretty abhorrent. The usenet archives should really be made public since there is no business value to them and they don't care about usenet.
The fact that nobody had enough fucks to give to archive these groups tells you everything you need to know about decentralized peer-to-peer proof-of-work blockchain nerd hobbies. This content exists on a completely open peer-to-peer content distribution network and here you are whining that one company -- the company that already rescued this archive in a midnight U-Haul run 20 years ago -- failed to archive it.
Google has bought dejanews and has profited immensely from open source and open information.<p>So I do think they have an obligation either a) to make the whole archive available for anyone or b) maintain it properly.<p>Properly means restoring the fast UI from around 2004.
This type of behavior is why I can never consider GCP. How many people have been burned at this point by Google randomly shutting down something they rely on?
One thing that's become extremely clear to me over the last decade or so is that almost all tech companies simply <i>do not care about the past</i>, and I suspect at least part of that is so their narrative of progress can be subjected to fewer challenges from those who look back and compare.<p>Also, and this may be a bit of a tangential point, but the "deny the past because it has something <i>bad</i>" that Google has effectively done here is uncomfortably close to the set of recent and far more political events.
> Usenet predates Google's spam handling tools<p>In fact Usenet predates spam itself, since the first spam (Canter & Siegel) was on Usenet itself in 1994 (I was there).
Anyone looking for a hobby? It is time to become a data hoarder <a href="https://www.reddit.com/r/DataHoarder/" rel="nofollow">https://www.reddit.com/r/DataHoarder/</a>
Either those Usenet groups are not part of the world, or they don't consist of information, or Google just failed at "organizing the world's information."
I read the article and I read the threads here, and maybe I missed it—but why did these groups disappear? Were they banned due to bad words or a mistaken spam filter?
<a href="https://www.lumendatabase.org/notices/search?utf8=%E2%9C%93&term=%22comp.lang.forth%22+%22comp.lang.lisp%22&sort_by=" rel="nofollow">https://www.lumendatabase.org/notices/search?utf8=%E2%9C%93&...</a><p>Looks like there have been mechanical legal complaints (likely automated, nearly all of them are the same Italian phrase), and they probably caused this instance of automated blocking going wild.<p>As an engineer I can understand the desire to automate everything, but please at least have some heuristics to detect this kind of easy-to-detect mechanical behavior before giving the model full authority to block anyone it doesn't like.
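A minimal sketch of the kind of heuristic the comment above asks for, assuming complaints arrive as plain strings. The function names `normalize` and `looks_mechanical` and both thresholds are hypothetical, chosen only for illustration; nothing here reflects how Google or Lumen actually process notices. The idea is simply to hold a batch for human review when almost all of it is the same text repeated.

```python
import re
from collections import Counter
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def looks_mechanical(
    complaints: Sequence[str],
    min_count: int = 5,      # hypothetical threshold: repeats needed before flagging
    min_share: float = 0.8,  # hypothetical threshold: share of the batch that is identical
) -> bool:
    """Return True when one normalized text dominates the batch.

    A True result is meant as a signal to route the batch to a human,
    not as a verdict on the complaints themselves.
    """
    if not complaints:
        return False
    counts = Counter(normalize(c) for c in complaints)
    _, top = counts.most_common(1)[0]
    return top >= min_count and top / len(complaints) >= min_share
```

A batch of nine identical complaints plus one different one would be flagged under these defaults, while five distinct complaints would not.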
> since there is no other comprehensive archive after Google's purchase of Dejanews around 20 years ago<p>Was I naive in thinking that The Internet Archive would have long archived this type of thing?
Too many people and companies don’t appreciate culture enough. Maintaining a cultural record should apparently not be left to just one company.<p>Thanks for posting this, it reminded me to donate again to archive.org, which I just did.<p>I use ‘culture’ to include anything creative, anything that we experience as humans. Everything should be preserved, schools should be well funded, as should the arts.
There is a comp.lang.lisp archive published in 2009.<p>> In 2009, Ron Garret published a 700MB archive file of all of comp.lang.lisp<p><a href="https://www.xach.com/naggum/articles/notes.html" rel="nofollow">https://www.xach.com/naggum/articles/notes.html</a>
Ridiculous. They are blaming missing moderators, but only Google would be able to solve the spam problem. They open now these old forums, and Gmail is mostly spam free. Now you cannot even browse the archives. Where is the internet police when you need them.
For a long time I've wanted to revisit some of the old Usenet stuff. I knew someone who ran a commercial usenet feed service in the early 90s and their whole setup depended heavily on low level backplane configuration, number of spindles, disk rotation speed, etc. - a lot of details that AWS hides from most of us. Using everything I've learned about distributed systems in the last thirty years I bet I could build a really awesome news feed today.<p>Of course the downside of Usenet was most people expected conversations to disappear after a couple weeks or a month but there was always some jerk that kept everything and refused to delete anything.
It's becoming clear to me that Google has become a far, far worse monopoly than Microsoft ever was. Microsoft just controlled our computers; Google controls our access to history.
Why are people even relying on Google to keep any product alive? It's a business, not a charity. They don't do a single thing out of good will. It always has the goal of getting money in the short or long term. Knowing their quarterly obligations to shareholders, that's probably short term.<p>These groups should be putting more effort into federalisation and decentralisation. Make it possible to store all of this data in a distributed fashion and stop relying on a central authority for archiving purposes.
I was learning C, once upon a time, and had a bug that I couldn't figure out. It worked fine on Linux/x86, but was wrong on Solaris/sparc64. Deep Google diving found a newsgroup post from 1992 or so with a very similar problem; it was an endian problem. My search-fu may have been weak, but it was an old newsgroup post that helped me solve my problem, not stackoverflow or any other site.
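The comment above does not show the code involved, so the following is only an illustrative sketch of how an endian problem of that kind arises: the same four bytes decode to different integers depending on whether the reader assumes little-endian order (as on x86) or big-endian order (as on SPARC). The value `0x01020304` and the helper name `decode_u32` are assumptions made for the example.

```python
import struct


def decode_u32(raw: bytes, byteorder: str) -> int:
    """Interpret 4 raw bytes as an unsigned 32-bit integer in the given byte order."""
    return int.from_bytes(raw, byteorder)


value = 0x01020304

# Bytes as a little-endian machine would lay them out in memory or on disk.
raw = struct.pack("<I", value)

# Reading them back with the same byte order recovers the value...
same_order = decode_u32(raw, "little")

# ...while a big-endian reader sees the bytes reversed.
other_order = decode_u32(raw, "big")

# The usual fix is to pick one byte order explicitly for stored or transmitted
# data ("!" means network order, i.e. big-endian) rather than the host's order.
portable = struct.pack("!I", value)
round_trip = struct.unpack("!I", portable)[0]
```

Here `same_order` equals the original value, `other_order` is `0x04030201`, and `round_trip` equals the original value on any host.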
I think everybody should have learned the lesson now - do not trust Google - or any other major megacorp, but especially Google - to preserve any data for longer than they are contractually obliged to. If there needs to be historic preservation, it should be done by an independent organization specifically created for that purpose.
Can anyone tell me how Google got hold of the whole usenet (I know it was like 15-20 years ago) which looks to me like a community service kinda thing.<p>Like when Google decided it's going to host comp.lang.c, can there be only one comp.lang.c on the internet, or can someone else start hosting comp.lang.c as well?
They are really shooting themselves in the foot with such moves.
They confirm, validate and strengthen the already existing trend to avoid vendor lock in at all cost and move to open, possibly self-hosted and export friendly platforms!<p>This is really bad marketing
This kind of thing makes it really easy to get interested, and stay interested, in decentralization tech.<p>Once you see things in this light, the new flavor of the month online service just doesn't hold any allure.
(Repeating one of the comments from the post):<p>> Has anyone (EFF?) considered the aspect of destroying evidence of prior art in the public domain?<p>I think there’s a case to be made for stewardship of these groups for that reason.
I'm hearing a fair bit of chatter in SEO circles about google de-indexing pages so this certainly rings true.<p>I guess there was this unjustified assumption that google only adds & never subtracts.
Maybe it is something that a non-profit dedicated towards preserving knowledge and internet content (such as Internet Archive) should be handling anyways.
This is editorialized (actual title: "Some Usenet groups suspended in Google Groups"), or on LWN[1] "Historical programming-language groups disappearing from Google" (basically the same content)<p>[1]: <a href="https://lwn.net/Articles/827233/" rel="nofollow">https://lwn.net/Articles/827233/</a>