<a href="https://github.com/oobabooga/text-generation-webui/">https://github.com/oobabooga/text-generation-webui/</a><p>Works on all platforms, but runs much better on Linux.<p>Running this in Docker on my 2080 Ti, I can barely fit 13B 4-bit models into its 11 GB of VRAM, but it works fine and produces around 10-15 tokens/second most of the time. It also has an API that you can use with something like LangChain (rough sketch at the bottom of this comment).<p>It supports multiple ways to run the models: purely with CUDA (I think AMD support is coming too) or on CPU with llama.cpp (which can also offload part of the model to GPU VRAM, though performance is still nowhere near CUDA; see the second sketch below).<p>Don't expect open-source models to perform as well as ChatGPT though; they're still pretty limited in comparison. A good place to get the models is TheBloke's page - <a href="https://huggingface.co/TheBloke" rel="nofollow">https://huggingface.co/TheBloke</a>. Tom converts popular LLM builds into multiple formats that you can use with textgen, and he's a pillar of the local LLM community.<p>I'm still learning how to fine-tune/train LoRAs; it's pretty finicky but promising. I'd like to be able to feed personal data into the model and have it reliably answer questions about it (the last sketch below shows the rough shape).<p>In my opinion, these developments are way more exciting than whatever OpenAI is doing. There's no way I'm pushing my chat logs into some corporate datacenter, but running locally and storing checkpoints safely would achieve my end goal of having a model "impersonate" me on the web.
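<p>To make the API point concrete, here's roughly what a call against the webui's blocking API looks like. This is a minimal sketch: the endpoint, port, and payload shape are what the api extension exposed on the version I'm running (started with the --api flag), so check the project wiki if yours differs.<p><pre><code># Sketch of hitting text-generation-webui's blocking API
# (enable it with --api; the default port was 5000 on my install).
import requests

def generate(prompt, max_new_tokens=200):
    payload = {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": 0.7,
    }
    resp = requests.post("http://localhost:5000/api/v1/generate",
                         json=payload, timeout=120)
    resp.raise_for_status()
    # Completions come back wrapped in a "results" list.
    return resp.json()["results"][0]["text"]

print(generate("### Instruction:\nSummarize LoRA in one sentence.\n\n### Response:\n"))
</code></pre><p>Last I checked, LangChain also ships a TextGen wrapper (langchain.llms.TextGen, pointed at the same server via model_url), so you don't have to hand-roll requests if you're already in that ecosystem.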
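<p>And the partial-offload idea, sketched with llama-cpp-python (the binding the webui uses for llama.cpp models): n_gpu_layers is the knob that moves transformer layers into VRAM while the rest stays on the CPU. The model path here is a placeholder.<p><pre><code># Sketch of partial GPU offload via llama-cpp-python.
# model_path is a placeholder; raise n_gpu_layers until VRAM runs out.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-13b.q4_0.bin",  # placeholder quantized model
    n_gpu_layers=32,  # 0 = pure CPU; each offloaded layer shifts work to the GPU
    n_ctx=2048,       # context window
)

out = llm("Q: Why offload layers to the GPU? A:", max_tokens=96, stop=["Q:"])
print(out["choices"][0]["text"])
</code></pre>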
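<p>Finally, the rough shape of the LoRA training I've been fighting with, written against Hugging Face peft/transformers. Everything here is illustrative, not a recipe: the base model, hyperparameters, and the toy dataset are placeholders, and in practice you'll spend far more time on VRAM limits and prompt formatting than this suggests.<p><pre><code># Rough LoRA fine-tuning sketch with Hugging Face peft + transformers.
# Base model, hyperparameters, and the toy dataset are placeholders.
from datasets import Dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "huggyllama/llama-7b"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token

# 8-bit load (bitsandbytes) so it has a chance of fitting in consumer VRAM.
model = AutoModelForCausalLM.from_pretrained(base, load_in_8bit=True,
                                             device_map="auto")
model = prepare_model_for_int8_training(model)

# Inject low-rank adapters into the attention projections; only these
# small matrices get trained, the frozen base weights stay untouched.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05, bias="none",
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))
model.print_trainable_parameters()  # typically well under 1% of the weights

# Placeholder personal data; real use would be your chat logs / notes.
texts = ["Q: What city do I live in?\nA: ...", "Q: ...\nA: ..."]
ds = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", num_train_epochs=3,
                           per_device_train_batch_size=1, learning_rate=2e-4,
                           fp16=True, logging_steps=10),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()

model.save_pretrained("lora-out")  # writes just the small adapter weights
</code></pre><p>The webui's Training tab does roughly this under the hood, and the saved adapter can be dropped into its loras/ directory and applied on top of the base model with the --lora flag.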