LLaMa running at 5 tokens/second on a Pixel 6

221 points by pr337h4m about 2 years ago

11 comments

naillo about 2 years ago
This is really cool, but the output is such garbage at that weight size that you might as well be running a Markov chain.
superkuh about 2 years ago
Until it thermally throttles 40 seconds later. But yeah, it's really cool how many platforms the vanilla code in llama.cpp can be easily compiled on. And somehow I doubt they did the quantization step on the Pixel itself. My favorite was the person who did it on the rpi4. I know a guy working on getting it going on rpi3, but the ARM7/8 mixing, NEON support, and 64-bit ARM intrinsics are apparently non-trivial to convert.
beiller about 2 years ago
Here is a thread on tweaking the parameters, which the model seems very sensitive to:
https://github.com/ggerganov/llama.cpp/issues/129
OscarCunningham about 2 years ago
This would be useful for predictive text. That's exactly what LLMs are actually built for.
__mharrison__ about 2 years ago
I'm waiting until it runs on my C64...
tosh about 2 years ago
Did anyone get this to run on an iPhone or in a browser yet?
syntaxing about 2 years ago
Does this in theory mean it should be relatively easy to port to a Coral TPU?
nshm about 2 years ago
It is not really llama, it is llama quantized to 4bit. Not even the quality of original 7B. I could also quantize it to 1 bit and claim it runs on my RPI3.
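To make the quality trade-off in the comment above concrete, here is a minimal sketch of what quantizing weights to 4 bits involves. It assumes a simple symmetric scheme with one floating-point scale per block; this is a generic illustration, not the format that llama.cpp or ggml actually uses. The names `quantize_block`, `dequantize_block`, and the sample `weights` are hypothetical.

```python
# Generic symmetric 4-bit quantization of one block of weights.
# Assumption: one float scale per block; NOT ggml's actual on-disk format.

QMIN, QMAX = -8, 7  # range of a signed 4-bit integer


def quantize_block(values):
    """Map floats to 4-bit integer codes plus a single shared scale."""
    peak = max(abs(v) for v in values)
    scale = peak / QMAX if peak > 0 else 1.0
    codes = [min(QMAX, max(QMIN, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the codes and the scale."""
    return [scale * c for c in codes]


# Hypothetical sample weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.49, -0.02, 0.0]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)

# Rounding to the nearest code means each weight is off by at most scale / 2.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Each weight now takes 4 bits plus a share of the block's scale, and the price is a rounding error of up to half a quantization step per weight. With fewer bits there are fewer available codes, so the step size and the error both grow.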
Havoc about 2 years ago
Any more details? I'm guessing they're leveraging the NPU in the Pixel?
snapplebobapple about 2 years ago
So is this finally peak hipster coder and from this point on rust will diminish because all the cool kids start switching to zag?
a-dub about 2 years ago
would be even cooler if it employed the accelerator!
(unless this ggml library is doing that under the hood)
i assume it has unified memory, but maybe not little numbers...