
Ask HN: Would denormalizing a string prevent AI/LLM consumption?

1 point, by MattyRad, over 1 year ago
Hi. With burgeoning AI, I don't particularly like the idea of my persona being unwittingly scraped into an AI corpus.

Would denormalizing a string to Unicode help prevent AI from matching text in a prompt? For example, changing "The quick brown fox" to "𝓣𝓱𝓮 𝓺𝓾𝓲𝓬𝓴 𝓫𝓻𝓸𝔀𝓷 𝓯𝓸𝔁", or "apple" to "ÁÞÞlé". Since the obfuscated strings use different tokens, they wouldn't match in a prompt, correct? And although normalization of strings is possible, would it be (im)possible to scale it in LLMs?

Note that I'm not suggesting that an AI couldn't *produce* obfuscated Unicode; it can. This question is only about preventing one's text from aiding a corpus.
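
To make the tokenization point concrete, here is a minimal sketch (plain Python and its standard unicodedata module, nothing from the thread itself) showing that this particular obfuscation is cheap to undo before any tokenizer runs:

import unicodedata

# NFKC compatibility normalization folds the "mathematical script" letters
# back onto plain ASCII, so the fancy string becomes ordinary text again.
fancy = "𝓣𝓱𝓮 𝓺𝓾𝓲𝓬𝓴 𝓫𝓻𝓸𝔀𝓷 𝓯𝓸𝔁"
print(unicodedata.normalize("NFKC", fancy))  # -> The quick brown fox

# Accented look-alikes take one more step: decompose (NFKD) and drop the
# combining marks. Substitutions with no decomposition (e.g. Þ standing in
# for p) would still need a hand-built confusables map.
accented = "ÁÞÞlé"
stripped = "".join(
    ch for ch in unicodedata.normalize("NFKD", accented)
    if not unicodedata.combining(ch)
)
print(stripped)  # -> AÞÞle

Whether a scraper actually bothers to run such a pass is a separate question, but the normalization itself is a single linear scan over the text.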

1 comment

PaulHoule, over 1 year ago
I was working on foundation models for business, and back in 2017 we had done some work on character embeddings that would counteract that.

Pro tip: ten years ago, the people whose ideas were worth stealing were worried about Google's web scraping and about how the whole economy around it was unfair and exploitative. Suddenly the people whose ideas aren't worth stealing are up in arms about it.

Think more about having ideas that are worth stealing (e.g. *leading* the herd, not *following* the herd) than about getting your ideas stolen.
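
A toy sketch (Python, and emphatically not the commenter's actual 2017 system) of how a character-level embedding can be made indifferent to this kind of obfuscation by folding homoglyphs before lookup:

import unicodedata
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 16
_table = {}  # lazily grown character -> vector table

def char_embeddings(text):
    # Fold compatibility variants (e.g. mathematical script letters) onto
    # their base characters, then look up one vector per character.
    folded = unicodedata.normalize("NFKC", text)
    vectors = []
    for ch in folded:
        if ch not in _table:
            _table[ch] = rng.normal(size=EMBED_DIM)
        vectors.append(_table[ch])
    return np.stack(vectors)

plain = char_embeddings("The quick brown fox")
fancy = char_embeddings("𝓣𝓱𝓮 𝓺𝓾𝓲𝓬𝓴 𝓫𝓻𝓸𝔀𝓷 𝓯𝓸𝔁")
print(np.allclose(plain, fancy))  # True: identical sequences after folding

Any downstream model built on these per-character vectors sees the obfuscated and plain strings as the same input.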