I've counted three different Rust LLaMA implementations on the r/rust subreddit this week:<p><a href="https://github.com/Noeda/rllama/">https://github.com/Noeda/rllama/</a> (pure Rust+OpenCL)<p><a href="https://github.com/setzer22/llama-rs/">https://github.com/setzer22/llama-rs/</a> (ggml based)<p><a href="https://github.com/philpax/ggllama">https://github.com/philpax/ggllama</a> (also ggml based)<p>There's also a discussion in a GitHub issue on setzer's repo about collaborating a bit on these separate efforts: <a href="https://github.com/setzer22/llama-rs/issues/4">https://github.com/setzer22/llama-rs/issues/4</a>
Anyone know if these LLaMA models can have a large pile of context fed in? E.g. to have the "AI" act like ChatGPT with a specific knowledge base you feed in?<p>I.e., imagine you feed in the last year of your chatlogs, and then ask the assistant queries about them. Compound that with your wiki, itinerary, etc. Is this possible with LLaMA? Where might it fail in doing this?<p><i>(and yes, I know this is basically autocomplete on steroids. I'm still curious hah)</i>
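For what it's worth, a rough sketch of what "feeding in a knowledge base" usually looks like in practice: you just prepend the documents to the prompt, and the hard limit is the model's context window (2048 tokens for the original LLaMA weights). Everything below is hypothetical illustration, not any real crate's API; `count_tokens` is a crude stand-in for a tokenizer.

    /// Very crude token estimate: ~4 characters per token on average (assumption).
    fn count_tokens(text: &str) -> usize {
        text.len() / 4 + 1
    }

    /// Build a prompt by prepending documents until the (assumed) 2048-token
    /// LLaMA context window would overflow -- which is exactly where "a year of
    /// chatlogs" fails unless you add retrieval or summarization on top.
    fn build_prompt(docs: &[String], question: &str) -> String {
        const CONTEXT_WINDOW: usize = 2048; // original LLaMA context length, in tokens
        const ANSWER_BUDGET: usize = 256;   // leave room for the generated answer

        let mut prompt = String::new();
        for doc in docs {
            let candidate = format!("{prompt}{doc}\n");
            if count_tokens(&candidate) + count_tokens(question) + ANSWER_BUDGET > CONTEXT_WINDOW {
                break; // out of room: the remaining documents simply don't fit
            }
            prompt = candidate;
        }
        prompt.push_str(&format!("Question: {question}\nAnswer:"));
        prompt
    }

    fn main() {
        let docs = vec!["2023-01-01: discussed the deploy outage...".to_string()];
        println!("{}", build_prompt(&docs, "What caused the outage?"));
    }

So it "works" as long as the relevant slice of your data fits in a couple of thousand tokens; beyond that you need some way of selecting which chunks to stuff in.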
I feel like <a href="https://github.com/ggerganov/llama.cpp/issues/171">https://github.com/ggerganov/llama.cpp/issues/171</a> is a better approach here?<p>With how fast llama.cpp is changing, this seems like a lot of churn for no reason.
Great job porting the C++ code! Seems like the reasoning was to provide the code as a library to embed in an HTTP server; can't wait to see that happen and try it out.<p>Looking at how the inference runs, this shouldn't be a big problem, right? <a href="https://github.com/setzer22/llama-rs/blob/main/llama-rs/src/main.rs#L42">https://github.com/setzer22/llama-rs/blob/main/llama-rs/src/...</a>
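To gesture at what "library + HTTP server" could look like, here's a minimal sketch using only std::net, with the actual llama-rs API replaced by a hypothetical `infer` function, since the real crate interface is still in flux:

    use std::io::{Read, Write};
    use std::net::TcpListener;

    // Placeholder for whatever inference entry point llama-rs ends up exposing;
    // this is NOT the real API, just a stand-in for the sketch.
    fn infer(prompt: &str) -> String {
        format!("(model output for: {prompt})")
    }

    fn main() -> std::io::Result<()> {
        let listener = TcpListener::bind("127.0.0.1:8080")?;
        for stream in listener.incoming() {
            let mut stream = stream?;
            // Read the request naively and treat everything after the headers as
            // the prompt (no real HTTP parsing; fine for a sketch).
            let mut buf = [0u8; 4096];
            let n = stream.read(&mut buf)?;
            let request = String::from_utf8_lossy(&buf[..n]);
            let prompt = request.split("\r\n\r\n").nth(1).unwrap_or("").to_string();

            let completion = infer(&prompt);
            let response = format!(
                "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
                completion.len(),
                completion
            );
            stream.write_all(response.as_bytes())?;
        }
        Ok(())
    }

The interesting part is everything the sketch skips: keeping the multi-gigabyte weights loaded once across requests and streaming tokens back, which is exactly why a library interface matters more than the CLI.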
Can someone a lot smarter than me give a basic explanation as to why something like this can run at a respectable speed on a CPU, whereas Stable Diffusion is next to useless on one? (That is to say, 10-100x slower, whereas I have not seen GPU-based LLaMA go 10-100x faster than the demo here.) I had assumed there were similar algorithms at play.
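Not an authoritative answer, but the usual back-of-the-envelope: generating one token touches essentially every weight once, so single-user LLM decoding is bound by memory bandwidth rather than raw compute, while Stable Diffusion's big batched convolutions are compute-bound and benefit enormously from GPU parallelism. All the figures below are rough assumptions, not benchmarks:

    fn main() {
        // Rough assumptions: ~4 GB for a 4-bit quantized 7B model, ~40 GB/s for
        // dual-channel desktop RAM, ~900 GB/s for high-end GPU memory.
        let model_bytes = 4.0e9;
        let cpu_bandwidth = 40.0e9;
        let gpu_bandwidth = 900.0e9;

        // tokens/s ~= bandwidth / model size, since each token re-reads the weights.
        println!("CPU: ~{:.0} tokens/s", cpu_bandwidth / model_bytes);
        println!("GPU: ~{:.0} tokens/s", gpu_bandwidth / model_bytes);
        // The GPU advantage here is "only" the bandwidth ratio (~20x), whereas
        // Stable Diffusion's compute-bound convolutions map onto thousands of GPU
        // cores, so the CPU/GPU gap there is far larger.
    }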
Funny that he had a hard time converting llama.cpp to expose a web server… I was just asking GPT-4 to write one for me… will hopefully have a PR ready soon
Can anyone more knowledgeable in this space explain what is meant by inference?<p>From what I know, LLaMA is built in Python, presumably with PyTorch. Does this Rust port make use of a Python process, or is the LLaMA algorithm fully written in Rust?
From the readme, to preempt the moaning: "I just like collecting imaginary internet points, in the form of little stars, that people seem to give to me whenever I embark on pointless quests for rewriting X thing, but in Rust."<p>OK? Just don't. Let us have this. :)