Yeah, it's a three-pronged threat, isn't it?

I think this article outlines those threats well. You've got the question of LLM-generated content being added to the project, the question of the project's text being used to train LLMs, and you've got sources now using LLM content to write news articles and other material. The article also mentions all the good uses of ML already incorporated into bots, the MediaWiki software, and the cloud platform.

I've already witnessed one fairly widespread case of an editor who began adding large swaths of LLM content to articles. They were caught, blocked, and reverted. We already deal with similar problems in the copyright area. Editors often violate copyright "under the radar" for years, racking up many, many edits, until they are caught, blocked, and a cleanup process is initiated. There's a whole process for that, and it's perpetually understaffed and backlogged, but violators do tend to get caught at a fairly good clip.

I think that if we can reliably catch copyright violators, then we should also be able to find LLM-only editors. Even though you can't simply diff against an existing source, there are hallmarks, and eventually these editors reveal themselves.

I'm not sure that keeping LLM content off Wikipedia will improve it, per se. A lot of editors are bad writers already, and plenty write outlandish, unsourced stuff. So it may be a wash. But we have to try.