Can you LLM a custom language?

1 point by campervans, about 1 year ago

If token limit and accuracy are important, it seems English (or any other spoken language) is not optimal.

Spoken languages are a butchered product of history and easy verbal noises.

A new custom language seems inevitable: one that is concise, unambiguous, and rooted in relations, with custom words. It would replace common sentences with simple strings, such as "Once upon a time..." becoming "a1".

It would most likely be alphanumeric, to minimise tokens, and could give an order-of-magnitude increase in effective context window.

The output would then be translated back to {language}.

Is this possible? Anyone working on it?

(here to be educated)

1 comment

yorwba, about 1 year ago

> Replacing common sentences with simple strings

This is what byte-pair encoding does. It doesn't go quite so far as to allocate only a single token to "Once upon a time", because that string isn't actually *that* common, but in principle it could.

Trying to get humans to produce content directly in such a concise representation is a waste of time, since LLMs heavily rely on the ability to take whatever content is already available on the internet, which drastically reduces the labor cost of acquiring training data.
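To make the byte-pair encoding point concrete, here is a minimal sketch of the merge loop, assuming a character-level start and a toy corpus. The function names (`most_frequent_pair`, `merge_pair`, `train_bpe`) are made up for illustration and do not come from any real tokenizer library; production tokenizers work on bytes, pre-split the text into words, and train on far larger corpora.

```python
from collections import Counter


def most_frequent_pair(seq):
    """Return the most common adjacent token pair, or None if no pair repeats."""
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return None
    pair, count = pairs.most_common(1)[0]
    return pair if count > 1 else None


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` with one joined token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train_bpe(text, num_merges):
    """Start from single characters and greedily merge the most frequent pair."""
    seq = list(text)
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(seq)
        if pair is None:  # nothing left that occurs more than once
            break
        seq = merge_pair(seq, pair)
        merges.append(pair)
    return seq, merges


if __name__ == "__main__":
    # Toy corpus (an assumption): the phrase is artificially frequent here.
    corpus = "once upon a time " * 20
    tokens, merges = train_bpe(corpus, num_merges=50)
    print(len(corpus), "characters ->", len(tokens), "tokens")
    print("distinct tokens:", sorted(set(tokens), key=len))
```

In this toy corpus the phrase repeats so often that the merges quickly build tokens spanning large chunks of it, which is the "in principle it could" case. On real text the phrase is too rare relative to other character sequences to earn a single token.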