WOW! They are using BFGS! Haven't heard of that in decades! Had to think a little: Yup, the full name is Broyden–Fletcher–Goldfarb–Shanno, for iterative unconstrained non-linear optimization!<p>Some of the earlier descriptions of the optimization used in AI <i>learning</i> were about steepest descent, that is, just find the gradient of the function you are trying to minimize and move some distance in that direction. Using only the gradient was concerning since that method tends to <i>zig zag</i>: after, say, 100 iterations, the total distance moved over those 100 iterations might be several times farther than the straight-line distance from the starting point to the final one. Can visualize this <i>zig zag</i> already in just two dimensions, say, following a river that curves down a valley it cut over a million years or so, that is, a valley with steep sides. Then gradient descent may keep crossing the river and go maybe 10 feet for each foot downstream!<p>Right, if just trying to go downhill on a tilted flat plane, then the gradient will point in the direction of steepest descent on the plane, and gradient descent will go all the way downhill in just one iteration.<p>In even moderately challenging problems, BFGS can be a big improvement.
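<p>To make that concrete, here is a minimal sketch, my own illustration and not anything from the article, comparing fixed-step steepest descent against BFGS on the Rosenbrock function, a classic curved, steep-sided valley; it uses SciPy's actual rosen, rosen_der, and minimize:<p>

    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der

    x0 = np.array([-1.2, 1.0])  # standard Rosenbrock start, up on the valley wall

    # Fixed-step steepest descent: tends to bounce across the curved valley.
    x = x0.copy()
    for _ in range(10_000):
        x = x - 1e-3 * rosen_der(x)
    print("steepest descent:", x, "f =", rosen(x))

    # BFGS builds an approximate inverse Hessian from the same gradient
    # information, so it follows the valley's curvature instead of crossing it.
    res = minimize(rosen, x0, jac=rosen_der, method="BFGS")
    print("BFGS:", res.x, "f =", res.fun, "in", res.nit, "iterations")

<p>The minimum is at (1, 1); typically the fixed-step descent is still creeping along the valley floor after all 10,000 steps, while BFGS gets there in a few dozen iterations.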