Could you train a ChatGPT-beating model for $85k and run it in a browser?

430 点作者 sirteno大约 2 年前

34 条评论

whalesalad大约 2 年前

Are there any training/ownership models like Folding@Home? People could donate idle GPU resources in exchange for access to the data, and perhaps ownership. Then instead of someone needing to pony up $85k to train a model, a thousand people can train a fraction of the model on their consumer GPU and pool the results, reap the collective rewards.

评论 #35392285 未加载

评论 #35392121 未加载

评论 #35394023 未加载

评论 #35391999 未加载

评论 #35392110 未加载

评论 #35396086 未加载

评论 #35393351 未加载

评论 #35392139 未加载

评论 #35395089 未加载

评论 #35397308 未加载

ftxbro大约 2 年前

His estimate is that you could train a LLaMA-7B scale model for around $82,432 and then fine-tune it for a total of less than $85K. But when I saw the fine tuned LLaMA-like models they were worse in my opinion even than GPT-3. They were like GPT-2.5 or like that. Not nearly as good as ChatGPT 3.5 and certainly not ChatGPT-beating. Of course, far enough in the future you could certainly run one in the browser for $85K or much less, like even $1 if you go far enough into the future.

评论 #35391717 未加载

评论 #35391598 未加载

评论 #35396779 未加载

captainmuon大约 2 年前

I guess companies like OpenAI and Google have no incentives to make models use less resources. The compute required, and of course also their training data, is their moat.If you accept that your model knows less about the world - it doesn't have to know about every restaurant in mexico city or the biography of every soccer player around the world - then you can get away with much fewer parameters and much less training data. Then you can't query it like an oracle about random things anymore, but you shouldn't do that anyway. But it should still be able to do tasks like reformulating texts, judging simularity (by embedding distance), and so on.And TFA mentions it also, you could hook up your simple language model with something like ReAct to get really good results. I don't see it running in the browser, but if you had a license-wise clean model that you can run on premises on one or two GPUs, that would be huge for a lot of people!

评论 #35401663 未加载

评论 #35399356 未加载

lxe大约 2 年前

Keep in mind that image transformer models like stable diffusion are generally smaller than language models, so they are easier to fit in wasm space.Also. you can finetune llama-7b on a 3090 for about $3 using LoRA.

评论 #35391417 未加载

评论 #35393465 未加载

JasonZ2大约 2 年前

Does anyone know how the results from a 7B parameter model with bloomz.cpp (<a href="https://github.com/NouamaneTazi/bloomz.cpp">https://github.com/NouamaneTazi/bloomz.cpp</a>) compares to the 7B parameter Alpaca model with llama.cpp (<a href="https://github.com/ggerganov/llama.cpp">https://github.com/ggerganov/llama.cpp</a>)?I have the latter working on a M1 Macbook Air with very good results for what it is. Curious if bloomz.cpp is significantly better or just about the same.

captaincrowbar大约 2 年前

The big problem with AI R&D is that nobody can keep up with the big bux companies. It makes this kind of project a bit pointless. Even if you can run a GPT3-equivalent on a web browser, how many people are going to bother (except as a stunt) when GPT4 is available?

评论 #35393008 未加载

评论 #35392981 未加载

评论 #35397056 未加载

version_five大约 2 年前

If you have ~100k to spend, aren't there options to buy a gpu rather than just blow it all on cloud? How much is an 8xA100 machine?4xA100 is 75k, 8 is 140k <a href="https://shop.lambdalabs.com/deep-learning/servers/hyperplane/customize" rel="nofollow">https://shop.lambdalabs.com/deep-learning/servers/hyperplane...</a>

评论 #35391976 未加载

评论 #35391545 未加载

评论 #35391578 未加载

评论 #35392159 未加载

munk-a大约 2 年前

A wonderful thing about software development is that there is so much reserved space for creativity that we have huge gaps between costs and value. Whether the average person could do this for 85k I'm uncertain of - but there is a very significant slice of people that could do it for well under 85k now that the ground work has been done. This leads to the hilarious paradox where a software based business worth millions could be built on top of code valued around 60k to write.

评论 #35393398 未加载

评论 #35391613 未加载

thih9大约 2 年前

> as opposed to OpenAI’s continuing practice of not revealing the sources of their training data.Looks like that choice makes it more difficult to adopt, trust, or collaborate on the new tech.What are the benefits? Is there more to that than competitive advantage? If not, ClosedAI sounds more accurate.

Tryk大约 2 年前

Why doesn't someone just start a gofundme/kickstarter with the goal of funding the training of an open-source ChatGPT-capable model?

评论 #35395649 未加载

GartzenDeHaes大约 2 年前

It's interesting to me that LLaMA-nB's still produce reasonable results after 4-bit quantization of the 32-bit weights. Does this indicate some possibility of reducing the compute required for training?

lmeyerov大约 2 年前

It seems the quality goes up & cost goes down significantly with Colossal AI's recent push: <a href="https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b" rel="nofollow">https://medium.com/@yangyou_berkeley/colossalchat-an-open-so...</a>Their writeup makes it sounds like, net, 2X+ over Alpaca, and that's an early runThe browser side is interesting too. Browser JS VMs have a memory cap of 1GB, so that may ultimately be the bottleneck here...

评论 #35393884 未加载

评论 #35393424 未加载

评论 #35392279 未加载

make3大约 2 年前

Alpaca uses knowledge distillation (it's trained on outputs from OpenAI models). It's something to keep in mind. You're teaching your model to copy an other model's outputs.

评论 #35391689 未加载

评论 #35392221 未加载

brrrrrm大约 2 年前

The WebGPU demo mentioned in this post is insane. Blows any WASM approach out of the water. Unfortunately that performance is not supported anywhere but chrome canary (behind a flag)

评论 #35391724 未加载

fzliu大约 2 年前

I was a bit skeptical about loading a _4GB_ model at first. Then I double-checked: Firefox is using about 5GB of memory for me. My current open tabs are mail, calendar, a couple Google Docs, two Arxiv papers, two blog posts, two Youtube videos, milvus.io documentation, and chat.openai.com.A lot of applications and developers these days take memory management for granted, so embedding a 4GB model to significantly enhance coding and writing capabilities doesn't seem too far-fetched.

astlouis44大约 2 年前

WebGPU is going to be a major component in this. Modern GPU's prevalent in mobile devices, desktops and laptops, is more than enough to do all of this client side.

agnokapathetic大约 2 年前

> My friends at Replicate told me that a simple rule of thumb for A100 cloud costs is $1/hour.AWS charges $32/hr for an 8xA100s (p4d.24xlarge) which comes out to $4/hour/gpu. Yes you can get lower pricing with a 3 year reservation but thats not what this question is asking.You also need 256 nodes to be colocated on the same fabric -- which AWS will do for you but only if you reserve for years.

评论 #35391377 未加载

评论 #35391335 未加载

评论 #35391419 未加载

评论 #35393635 未加载

评论 #35391382 未加载

d4rkp4ttern大约 2 年前

Everyone seems to assume that all the “tricks” behind training ChatGPT are known. The only clues are in papers from ClosedAI like the InstructGPT paper. So we assume there is Supervised Fine Tuning, then Reward Modeling and finally RLHF.But there are most likely other tricks that ClosedAI has not published. These probably took years of R&D to come up with, others trying to replicate ChatGPT would need to come up with these tricks on their own.Also curiously the app was released in late 2022 while the knowledge cutoff is 2021 — I was curious why that might be, and one hypothesis I had was that it may have been because they wanted to keep the training data fixed while they iterated on numerous methods, hyperparameter tuning etc. All of these are unfortunately a defensive moat that ClosedAI has.

pavelstoev大约 2 年前

Training a ChatGPT-beating model for much less than $85,000is entirely feasible. At CentML, we're actively working on model training and inference optimization without affecting accuracy, which can help reduce costs and make such ambitious projects realistic. By maximizing (>90%) GPU and platform hardware utilization, we aim to bring down the expenses associated with large-scale models, making them more accessible for various applications. Additionally, our solutions also have a positive environmental impact, addressing the excess CO2 concerns. If you're interested in learning more about how we are doing it, please reach out via our website: <a href="https://centml.ai" rel="nofollow">https://centml.ai</a>

nwoli大约 2 年前

What we need is a RETRO style model where basically after the input you go through a small net that just fetches a desired set of weights from a server (serving data without compute is dirt cheap) and is then executed locally. We’ll get there eventually

评论 #35394621 未加载

breck大约 2 年前

Just want to say SimonW has become one of my favorite writers covering the AI revolution. Always fun thought experiments with linked code and very constructive for people thinking about how to make this stuff more accessible to the masses.

skybrian大约 2 年前

I wonder why anyone would want to run it in a browser, other than to show it could be done? It's not like the extra latency would matter, since these things are slow.Running it on a server you control makes more sense. You can pick appropriate hardware for running the AI. Then access it from any browser you like, including from your phone, and switch devices whenever you like. It won't use up all the CPU/GPU on a portable device and run down your battery.If you want to run the server at home, maybe use something like Tailscale?

评论 #35394656 未加载

jedberg大约 2 年前

With the explosion of LLMs and people figuring out ways to train/use them relatively cheaply, unique data sets will become that much more valuable, and will be the key differentiator between LLMs.Interestingly, it seems like companies that run chat programs where they can read the chats are best suited to building "human conversation" LLMs, but someone who manages large text datasets for others are in the perfect place to "win" the LLM battle.

fswd大约 2 年前

There is somebody finetunin 160m rwkv4 on alpaca on the rwkv discord, I am out of the office and can't link but the person posted in prompt showcase channel

评论 #35392520 未加载

nope96大约 2 年前

I remember watching one of the final episodes of Connections 3: With James Burke, and he casually said we'd have personal assistants that we could talk to (in our PDAs). That was 1997 and I knew enough about computers to think he was being overly optimistic about the speed of progress. Not in our lifetimes. Guess I was wrong!

alecco大约 2 年前

Interesting blog but the extrapolations are way overblown. I tried one of the 30bn models and it's not even remotely close to GPT-3.Don't get me wrong, this is very interesting and I hope more is done in the open models. But let's not over-hype by 10x.

gessha大约 2 年前

We need a DAWNBench* benchmark for training ChatGPT the fastest and cheapest.* <a href="https://dawn.cs.stanford.edu/benchmark/" rel="nofollow">https://dawn.cs.stanford.edu/benchmark/</a>

ushakov大约 2 年前

Now imagine loading 3.9 GB each time you want to interact with a webpage

评论 #35391555 未加载

评论 #35391728 未加载

cavisne大约 2 年前

There is a minimum cluster size to get good utilization of the GPU’s. $1 an hour per chip might get you one A100 but it won’t get you hundreds clustered together.

ChumpGPT大约 2 年前

I'm not so smart and I don't understand a lot about ChatGPT, etc, but could there be a client side app like Folding@home that would allow millions of people to give processing power to train a LLM?

v4dok大约 2 年前

Can someone at the EU, the only player in this thing with no strategy yet just pool together enough resources so the open-source people can train models. We don't ask much, just give compute power

评论 #35394392 未加载

评论 #35395829 未加载

TMWNN大约 2 年前

Hey, that means it can be turned into an Electron app!

ultrablack大约 2 年前

If you could, you should have done it 6 months ago.

评论 #35391536 未加载

rspoerri大约 2 年前

So cool it runs on a browser /sarcasm/ i might not even need a computer. Or internet when we are at it.It either runs locally or it runs on the cloud. Data could come from both locations as well. So it's mostly technically irrelevant if it's displaying in a browser or not.Except when it comes to usability. I don't get it why people love software running in a browser. I often close important tools i have not saved when it's in a browser. I cant have offline tools which work if i am in a tunnel (living in Switzerland this is an issue) . Or it's incompatible because i am running LibreWolf./sorry to be nitpicking on this topic ;-)

评论 #35391515 未加载

评论 #35391700 未加载

评论 #35391626 未加载

评论 #35392229 未加载