TechEcho

Can you LLM a custom language?

1 point by campervans about 1 year ago
If token limit and accuracy are important, it seems English (and other spoken languages) are not optimal.

They're a butchered product of history and easy verbal noises.

A new custom language seems inevitable: one that is concise, unambiguous, and rooted in relations, with custom words. It would replace common sentences with simple strings, such as "Once upon a time..." with "a1".

It would most likely be alphanumeric, to minimise tokens and give an order-of-magnitude increase in the effective context window.

The output would then be translated back to {language}.

Is this possible? Anyone working on it?

(here to be educated)
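To make the proposal concrete, here is a minimal sketch of the phrase-to-code substitution and the translation back. The codebook entries, the `MARK` escape character, and the function names are all assumptions made up for illustration; nothing here is an existing tool. Note that it only shortens the character count, which is not the same as the token count a given model would see.

```python
# Hypothetical codebook: the phrases and codes are invented for illustration.
CODEBOOK = {
    "Once upon a time": "a1",
    "happily ever after": "a2",
}

# Assumed escape marker so a code is not confused with ordinary text.
# The round trip only works if the input never contains this character.
MARK = "\u00a7"


def compress(text):
    """Replace each known phrase with its marked short code."""
    for phrase, code in CODEBOOK.items():
        text = text.replace(phrase, MARK + code)
    return text


def expand(text):
    """Translate marked short codes back into the original phrases."""
    for phrase, code in CODEBOOK.items():
        text = text.replace(MARK + code, phrase)
    return text


story = "Once upon a time there was a fox who lived happily ever after."
short = compress(story)
print(short)
print(expand(short) == story)
```

A real scheme would also need codes that are not prefixes of one another and a way to handle text that already contains the marker.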

1 comment

yorwba about 1 year ago
> Replacing common sentences with simple strings

This is what byte-pair encoding does. It doesn't go quite so far as to allocate only a single token to "Once upon a time", because that string isn't actually *that* common, but in principle it could.

Trying to get humans to produce content directly in such a concise representation is a waste of time, since LLMs heavily rely on the ability to take whatever content is already available on the internet, which drastically reduces the labor cost of acquiring training data.
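For readers unfamiliar with byte-pair encoding, the idea can be sketched in a few lines: start from single characters and repeatedly merge the most frequent adjacent pair into one new token. This is a toy, character-level version written for illustration, with made-up function names; production tokenizers work on bytes, pre-split the text, and train on far larger corpora.

```python
from collections import Counter


def merge_pair(tokens, a, b):
    """Replace every adjacent occurrence of (a, b) with the single token a + b."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def learn_bpe(text, num_merges):
    """Learn up to num_merges merge rules, starting from single characters."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges would not help
        merges.append((a, b))
        tokens = merge_pair(tokens, a, b)
    return merges, tokens


def encode(text, merges):
    """Tokenize new text by replaying the learned merges in order."""
    tokens = list(text)
    for a, b in merges:
        tokens = merge_pair(tokens, a, b)
    return tokens


# A corpus where the phrase is artificially common, so it gets merged heavily.
corpus = "once upon a time " * 20
merges, _ = learn_bpe(corpus, 50)
print(encode("once upon a time ", merges))
```

Because the phrase dominates this toy corpus, the learned merges collapse it into very few tokens. In real training data the phrase is much rarer relative to everything else, which is why it does not end up as a single token.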