Running a 180B parameter LLM on a single Apple M2 Ultra

255 points by tbruckner, over 1 year ago

15 comments

adam_arthur, over 1 year ago
Even a linear growth rate of average RAM capacity would obviate the need to run current SOTA LLMs remotely in short order.

Historically, average RAM has grown far faster than linearly, and there really hasn't been anything pressing manufacturers to push the envelope here in the past few years... until now.

It could be that LLM model sizes keep increasing such that we continue to require cloud consumption, but I suspect the sizes will not increase as quickly as hardware for inference.

Given how useful GPT-4 already is, maybe one more iteration would unlock the vast majority of practical use cases.

I think people will be surprised that consumers ultimately end up benefiting far more from LLMs than the providers. There's not going to be much moat or differentiation to defend margins... more of a race to the bottom on pricing.
logicchains, over 1 year ago
Pretty amazing that in such a short span of time we went from people being amazed at how powerful GPT-3.5 was on its release to people being able to run something equivalently powerful locally.
regularfry, over 1 year ago
4-bit quantised model, to be precise.

When does this guy *sleep*?
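A back-of-the-envelope check of why 4-bit quantisation is what makes a 180B-parameter model fit in consumer memory at all. This is a rough sketch assuming ~4.5 effective bits per weight (4-bit values plus per-block scales, roughly what llama.cpp-style Q4 formats use), not the exact size of any particular GGUF file:

    # Rough memory estimate for 180B parameters at 4-bit quantisation.
    # Assumption: ~4.5 effective bits/weight (4-bit values + per-block scales).
    params = 180e9
    bits_per_weight = 4.5

    quantised_gib = params * bits_per_weight / 8 / 2**30
    fp16_gib = params * 16 / 8 / 2**30

    print(f"~4-bit weights: ~{quantised_gib:.0f} GiB")  # ~94 GiB, fits in 192 GB
    print(f"fp16 weights:   ~{fp16_gib:.0f} GiB")       # ~335 GiB, does not

The KV cache and runtime buffers come on top of the weights, which is roughly consistent with the 147,456 MB working set mentioned further down the thread.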
sbierwagen, over 1 year ago
The screenshot shows a working set size of 147,456 MB, so he's using the Mac Studio with 192 GB of RAM?
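The unit conversion behind that guess, as a quick check (assuming the screenshot's "mb" really means MiB):

    # 147,456 MiB expressed in GiB.
    working_set_mib = 147_456
    print(working_set_mib / 1024)  # 144.0 -- fits the 192 GB configuration, not the 64 GB one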
m3kw9, over 1 year ago
OpenAI's moat will soon largely be UX. Anyone can do plugins, code, etc., but once LLMs become commodified, the best UX for everyday users wins. Just look at standalone digital cameras vs. the phone cameras from Apple.
homarp, over 1 year ago
https://www.reddit.com/r/LocalLLaMA/comments/16bynin/falcon_180b_initial_cpu_performance_numbers/ has some more data, like sample answers at various levels of quantization.

And https://huggingface.co/TheBloke/Falcon-180B-Chat-GGUF if you want to try it.
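If you want to pull those weights down programmatically, here is a minimal download sketch using the huggingface_hub client. The shard filename below is a guess (the large quantizations in that repo are split into multiple parts), so check the repo's file listing for the real names:

    # Sketch: download one shard of a Falcon-180B-Chat GGUF quantization.
    # The filename is hypothetical -- look up the actual shard names in the repo.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="TheBloke/Falcon-180B-Chat-GGUF",
        filename="falcon-180b-chat.Q4_K_M.gguf-split-a",  # hypothetical shard name
    )
    print(path)  # local path in the Hugging Face cache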
doctoboggan, over 1 year ago
Georgi is doing so much to democratize LLM access; I am very thankful he is doing it all on Apple silicon!
pella, over 1 year ago
Is this an M2 Ultra with 192 GB of unified memory, or the standard version with 64 GB of unified memory?
Havoc, over 1 year ago
Great progress, but I also can't help but feel a sense of apprehension on the access front.

An M2 Ultra, while consumer tech, is affordable to only a fairly small percentage of the world's population.
ViktorBash, over 1 year ago
It's refreshing to see how fast open LLMs are advancing in terms of the models available. A year ago I thought that, beyond the novelty of it, running LLMs locally would be nowhere close to OpenAI's closed models in terms of utility.

As more and more models become open and can be run locally, the precedent gets stronger (which is good for the end consumer, in my opinion).
randomopining, over 1 year ago
Are there any actual use cases for running this stuff on a local computer? Or are most of these models better suited to running on remote clusters?
two_in_one, over 1 year ago
Just wondering, what are local LLMs used for today? So far they seem more promising than practical.
tiffanyh, over 1 year ago

    system_info: n_threads = 4 / 24

Am I seeing correctly in the video that this ran on only 4 threads?
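The thread count is a knob the caller sets, not a hardware limit. A minimal sketch with the llama-cpp-python bindings, where the model path is a placeholder and n_threads=4 simply mirrors the value in the video:

    # Sketch: pinning generation to 4 CPU threads, as in the video.
    # "model.gguf" is a placeholder path, not a file from the post.
    from llama_cpp import Llama

    llm = Llama(
        model_path="model.gguf",  # placeholder
        n_ctx=2048,
        n_threads=4,              # generation threads; the machine has 24 logical cores
    )
    out = llm("Q: Name one planet.\nA:", max_tokens=16)
    print(out["choices"][0]["text"])

On a model this large, generation is likely memory-bandwidth bound rather than compute bound, which would explain why a modest thread count is enough.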
growt, over 1 year ago
So how much RAM did the machine have?
rvz, over 1 year ago
Totally makes sense to use C++- or Rust-based AI models for inference instead of over-bloated networks run on Python, with their sub-optimal inference and fine-tuning costs.

Minimal overhead or zero-cost abstractions around deep learning libraries implemented in those languages give some hope that people like ggerganov are not afraid of the "don't roll your own deep learning library" dogma, and now we can see the results: DL on the edge and local AI are the future of efficiency in deep learning.

We'll see, but Python just can't compete on speed at all; hence Modular's Mojo compiler is another one that solves the problem properly, with almost 1:1 familiarity with Python.