TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

The GitHub 1000 year archive may be the last code dataset uncontaminated by AI

140 points by russianGuy83829 about 2 years ago
I wonder if LLMs played a role in the decision to archive and preserve all that code.

13 comments

busterarm about 2 years ago

I'm imagining some science fiction nightmare scenario where we have to pull the plug on some AI and have to throw out all software because we can't trust that it doesn't contain the building blocks to reproduce the AI.

But then we find out we're fucked anyway because the AI has already conditioned human beings to write the software that will reproduce it... as a self-preservation strategy.

...cue The Outer Limits theme music.
htrp about 2 years ago

This is one of the most relevant short stories I've read on "AI contamination":

https://www.teamten.com/lawrence/writings/coding-machines/
h2odragon about 2 years ago

From a security viewpoint, I wouldn't trust that code not to have embedded AI seeds anyway.

"Reflections on Trusting Trust" (Ken Thompson, 1984): https://dl.acm.org/doi/10.1145/358198.358210
nmaleki about 2 years ago

Good. One great archive is better than none.

AI rips apart conscious intent and reassembles it using what may best be described as piecewise functions. We lose the intricacy and the detail of individual thought of the unbroken line of thinkers that came before us when we interpret such piecewise functions as conscious intent.
amai about 2 years ago
That is a statement as useful as saying that science publications until 1955 may be last ones not contaminated by calculators.
iamdamian about 2 years ago

What is a code dataset? And what is AI contamination? Are you saying it's impossible to create a collection of hand-written code from here on out and know that none of it was generated by an LLM?
thisiswrongggg about 2 years ago

This. And I'm wondering whether this marks the end of human forums on the net as well. I mean, who can tell whether the comments they read are coming from a human or a tuned AI? And then there are the implications of this for politics...
sublinear about 2 years ago

Nah, but it's a great story to tell around the post-apocalyptic trash can fires.
ElijahLynn about 2 years ago

LLM:

Large language models (LLMs) are a subset of artificial intelligence that has been trained on vast quantities of text data to produce human-like responses to dialogue or other natural language inputs. LLMs are used to make AI “smarter” and can recognize, summarize, translate, predict and generate text and other content based on knowledge gained from massive datasets. LLMs have the promise of transforming domains through learned knowledge and their sizes have been increasing 10X every year for the last few years.

source: NeevaAI (What is an LLM in AI?)
Froedlich about 2 years ago

John Barnes wrote a novel called "Kaleidoscope Century" back in 1995.

AIs had been created, some went rogue, and then they were fighting each other for computing resources. Then humans started shutting down computers and fragmenting the network, so the AIs wrote new software that would run in human brains. Once someone was running the new software, there wasn't much room left for "human."

Pretty much the worst-case scenario, at least of the ones I've seen so far.
jasfi about 2 years ago

I've previously suggested the use of META tags on all pages where AI was used to help generate the content. But it seems this isn't going to happen.
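To make the META tag suggestion concrete, here is a minimal sketch of how someone assembling a dataset might check a page for such a disclosure. It assumes a hypothetical `<meta name="ai-generated" content="true">` convention: no such tag is standardized, and both the name and the value are invented here purely for illustration. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Hypothetical tag name: not defined by any standard, invented for this sketch.
AI_META_NAME = "ai-generated"


class AIMetaDetector(HTMLParser):
    """Collects the content values of any <meta name="ai-generated"> tags."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value is None for valueless attributes.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == AI_META_NAME:
            self.values.append(attr_map.get("content") or "")


def page_declares_ai_use(html: str) -> bool:
    """Return True if the page carries the hypothetical disclosure tag set to "true"."""
    detector = AIMetaDetector()
    detector.feed(html)
    detector.close()
    return any(v.strip().lower() == "true" for v in detector.values)


if __name__ == "__main__":
    page = '<html><head><meta name="ai-generated" content="true"></head><body>...</body></html>'
    print(page_declares_ai_use(page))  # True
```

Self-closing forms such as `<meta ... />` are also caught, because `HTMLParser` routes them through `handle_starttag` by default. A check like this is only as reliable as the authors who set the tag; it cannot detect AI use that was never disclosed.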
bobbbbbbbbb about 2 years ago

Excuse my ignorance.

What does the acronym LLM mean?
seydor about 2 years ago
an elegant weapon for a more civilized age