"sync, corrected by elderman" issue in ML translation datasets spread across the internet

3 points by mvolfik about 2 years ago

2 comments

mvolfik about 2 years ago
I can't find the true origin of this, but (unless I'm missing some old internet joke) it seems that some language models were trained on corrupt data that frequently includes a string like "== sync, corrected by elderman ==". Searching for this phrase now yields a ton of random results in places where you would expect automatically translated spam. Some interesting mentions I found:

- It historically appeared in auto-translated game chats in Arena of Valor: https://www.reddit.com/r/arenaofvalor/comments/btykru/comment/ep4o48c/
- A mention in the GitHub repo of a translation model: https://github.com/Helsinki-NLP/Opus-MT/issues/62

I'm curious to see if anyone else has had interesting encounters with this.
h2odragon about 2 years ago
I think that might've come from the rtfm.mit.edu FAQ archives. Several documents there had multiple language versions and were great bait for anything needing translated text as input.