
Llama.rs – Rust port of llama.cpp for fast LLaMA inference on CPU

202 points by rrampage about 2 years ago

9 comments

adeon about 2 years ago
I've counted three different Rust LLaMA implementations on the r/rust subreddit this week:

https://github.com/Noeda/rllama/ (pure Rust+OpenCL)

https://github.com/setzer22/llama-rs/ (ggml based)

https://github.com/philpax/ggllama (also ggml based)

There's also a discussion in a GitHub issue on setzer's repo about collaborating a bit on these separate efforts: https://github.com/setzer22/llama-rs/issues/4
unshavedyak about 2 years ago
Anyone know if these LLaMA models can have a large pile of context fed in? E.g. to have the "AI" act like ChatGPT with a specific knowledge base you feed in?

I.e. imagine you feed in the last year of your chatlogs, and then ask the assistant questions about those chatlogs. Compound that with your wiki, itinerary, etc. Is this possible with LLaMA? Where might it fail in doing this?

(And yes, I know this is basically autocomplete on steroids. I'm still curious, hah.)
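A rough sketch of what I'm picturing, in Rust to match the thread. The chunk scoring and token budgeting here are made-up stand-ins, not any real llama-rs API, and the word-count "tokens" are only an approximation of a real tokenizer; LLaMA's window is roughly 2048 tokens:

    // Pack the best-matching knowledge-base chunks into the prompt until
    // the context-window budget runs out.
    const CONTEXT_TOKENS: usize = 2048; // LLaMA's window: everything must fit here
    const RESERVED: usize = 512; // leave room for the model's answer

    // Naive relevance: how many query words appear in the chunk.
    fn score(chunk: &str, query: &str) -> usize {
        let chunk = chunk.to_lowercase();
        query
            .split_whitespace()
            .filter(|w| chunk.contains(&w.to_lowercase()))
            .count()
    }

    fn build_prompt(chunks: &[&str], question: &str) -> String {
        let mut ranked: Vec<&str> = chunks.to_vec();
        ranked.sort_by_key(|c| std::cmp::Reverse(score(c, question)));

        // Crude token estimate (whitespace-separated words); a real
        // version would use the model's tokenizer.
        let budget = CONTEXT_TOKENS - RESERVED;
        let mut used = question.split_whitespace().count();
        let mut prompt = String::new();
        for chunk in ranked {
            let cost = chunk.split_whitespace().count();
            if used + cost > budget {
                break; // this is where it fails: the window is hard-capped
            }
            prompt.push_str(chunk);
            prompt.push_str("\n\n");
            used += cost;
        }
        prompt.push_str("Question: ");
        prompt.push_str(question);
        prompt
    }

    fn main() {
        let chatlogs = ["2023-01-02: we agreed to migrate the wiki to markdown"];
        let prompt = build_prompt(&chatlogs, "When did we discuss the wiki migration?");
        println!("{prompt}"); // hand this to whatever inference entry point exists
    }

Anything beyond the window has to be dropped or summarized, which is exactly where the "whole year of chatlogs" idea breaks down.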
cozzyd about 2 years ago
I feel like https://github.com/ggerganov/llama.cpp/issues/171 is a better approach here?

With how fast llama.cpp is changing, this seems like a lot of churn for no reason.
ilovefood about 2 years ago
Great job porting the C++ code! It seems the reasoning was to provide the code as a library to embed in an HTTP server; I can't wait to see that happen and try it out.

Looking at how the inference runs, this shouldn't be a big problem, right? https://github.com/setzer22/llama-rs/blob/main/llama-rs/src/...
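Something like this hand-rolled, std-only server would already be enough to play with. run_inference is a hypothetical stand-in for whatever entry point the crate ends up exposing, and the request parsing naively assumes the whole request arrives in one read:

    use std::io::{Read, Write};
    use std::net::TcpListener;

    // Hypothetical stand-in for the library's inference entry point.
    fn run_inference(prompt: &str) -> String {
        format!("echo: {prompt}")
    }

    fn main() -> std::io::Result<()> {
        let listener = TcpListener::bind("127.0.0.1:8080")?;
        for stream in listener.incoming() {
            let mut stream = stream?;
            let mut buf = [0u8; 4096];
            let n = stream.read(&mut buf)?;
            // Treat everything after the header/body blank line as the prompt.
            let req = String::from_utf8_lossy(&buf[..n]);
            let prompt = req.split("\r\n\r\n").nth(1).unwrap_or("").to_string();
            let body = run_inference(&prompt);
            let resp = format!(
                "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
                body.len(),
                body
            );
            stream.write_all(resp.as_bytes())?;
        }
        Ok(())
    }

Since token generation is sequential anyway, handling one request at a time like this isn't even a bad fit for a first cut.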
petercooper about 2 years ago
Can someone a lot smarter than me give a basic explanation as to why something like this can run at a respectable speed on a CPU, whereas Stable Diffusion is next to useless there? (That is to say, 10-100x slower, whereas I have not seen GPU-based LLaMA go 10-100x faster than the demo here.) I had assumed there were similar algorithms at play.
xiphias2 about 2 years ago
ggml should be ported as well to make it really count; use Rust's multithreading for fun.
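For a taste of what that could look like, here's a toy scoped-threads version of a matrix-vector product (the hot op in transformer inference). This is just the shape of the idea, not ggml's actual kernels:

    // Split the rows of a matrix-vector product across threads.
    fn matvec_parallel(m: &[Vec<f32>], v: &[f32], threads: usize) -> Vec<f32> {
        let mut out = vec![0.0f32; m.len()];
        let chunk = (m.len() + threads - 1) / threads;
        std::thread::scope(|s| {
            for (rows, outs) in m.chunks(chunk).zip(out.chunks_mut(chunk)) {
                s.spawn(move || {
                    for (row, o) in rows.iter().zip(outs.iter_mut()) {
                        // Each output element is an independent dot product.
                        *o = row.iter().zip(v).map(|(a, b)| a * b).sum();
                    }
                });
            }
        });
        out
    }

    fn main() {
        let m = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
        let v = [1.0, 1.0];
        println!("{:?}", matvec_parallel(&m, &v, 2)); // [3.0, 7.0]
    }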
taf2 about 2 years ago
Funny that he had a hard time converting llama.cpp to expose a web server… I was just asking GPT-4 to write one for me… will hopefully have a PR ready soon.
mattfrommars about 2 years ago
Can anyone more knowledgeable in this space explain what is meant by inference?

From what I know, LLaMA is built in Python, presumably with PyTorch. Does this Rust port make use of a Python process, or is the LLaMA algorithm fully written in Rust?
recuter about 2 years ago
From the readme, to preempt the moaning: "I just like collecting imaginary internet points, in the form of little stars, that people seem to give to me whenever I embark on pointless quests for rewriting X thing, but in Rust."

OK? Just don't. Let us have this. :)