Practical Llama 3 inference implemented in a single Java file

54 points by dgiagio 12 months ago

3 comments

mike_hearn 12 months ago
Only 2000 lines! There are a few reasons this is really, really nice.

1. CPU inferencing can be surprisingly practical for a lot of long-tail use cases where there isn't a lot of load. As you can see from the animation, you get a decent rate of tok/sec, and that's not really optimized beyond using the Vector API. Think about things where you don't need to generate a lot of tokens to be useful; just comprehension or a yes/no-type answer is sufficient. For example, using it as a fallback to faster but less general NLP libraries for things like date parsing.

2. Most LLM inferencing systems are set up as servers and require custom APIs or API clients. Being able to just copy/paste a file into a Java/Kotlin/Scala/Clojure project and go means no additional deployment complexity beyond ensuring the model weights can be found on disk (see the sketch after this comment).

3. It's a lot easier to read and well commented compared to quite a few LLM implementations, which are unfortunately often "hacker output" not really written for comprehensibility.

4. Because it's such a small and comprehensible code base, it's much easier to experiment with tweaks to the inferencing algorithm that are very use-case specific, like forced decoding strategies. Doing that on the regular LLM stacks often requires a lot of rummaging around in barely typed Python codebases littered with the remnants of many prior experiments.
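As a hedged sketch of point 2: embedding might look something like the snippet below. Llama3.loadModel and Llama3.generate are hypothetical names standing in for whatever entry points the copied file actually exposes, and the GGUF filename is just an example.

    import java.nio.file.Path;

    public class YesNoFallback {
        public static void main(String[] args) throws Exception {
            // Hypothetical entry points; stand-ins for whatever the copied file exposes.
            var model = Llama3.loadModel(Path.of("Meta-Llama-3-8B-Instruct-Q4_0.gguf"));
            // Comprehension-style queries need only a few output tokens,
            // which keeps CPU-only latency practical.
            String out = Llama3.generate(model,
                    "Is 'next Tuesday' a date expression? Answer yes or no.",
                    3 /* max tokens */);
            System.out.println(out.trim().toLowerCase().startsWith("yes"));
        }
    }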
blendo 12 months ago
Notable to me is that it uses Java's new Vector API: https://openjdk.org/jeps/469
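To make JEP 469 concrete, here is a minimal sketch (not code from the project) of the kind of hot loop an inference engine vectorizes, a float dot product. It needs --add-modules jdk.incubator.vector to compile and run.

    import jdk.incubator.vector.FloatVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    public class DotProduct {
        private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

        // Dot product: the core of the matmul-heavy inner loops in LLM inference.
        static float dot(float[] a, float[] b) {
            float sum = 0f;
            int i = 0;
            int bound = SPECIES.loopBound(a.length);
            for (; i < bound; i += SPECIES.length()) {
                FloatVector va = FloatVector.fromArray(SPECIES, a, i);
                FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
                sum += va.mul(vb).reduceLanes(VectorOperators.ADD);
            }
            for (; i < a.length; i++) { // scalar tail for the leftover elements
                sum += a[i] * b[i];
            }
            return sum;
        }

        public static void main(String[] args) {
            float[] a = {1f, 2f, 3f, 4f, 5f};
            float[] b = {5f, 4f, 3f, 2f, 1f};
            System.out.println(dot(a, b)); // 35.0
        }
    }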
theanonymousone 12 months ago
JBang is a gem in the Java ecosystem that far more people should know about.
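For context: JBang runs a single .java file as a script, with the JDK version, JVM flags, and Maven dependencies declared in comment directives (//JAVA, //JAVA_OPTIONS, //DEPS). A minimal sketch, assuming a file named Hello.java; the vector-module flag mirrors what a single-file Llama 3 runner would need.

    ///usr/bin/env jbang "$0" "$@" ; exit $?
    //JAVA 21+
    //JAVA_OPTIONS --add-modules jdk.incubator.vector

    public class Hello {
        public static void main(String[] args) {
            System.out.println("Hello from a single-file JBang script");
        }
    }

Run it with "jbang Hello.java", or mark it executable and invoke it directly thanks to the shebang-style first line.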