For those primarily interested in open weight models, this Mixtral 8x22B release is really intriguing. The Mistral models have tended to outperform other models with similar parameter counts.<p>Still, 281GB is huge. That's at the higher end of what we see from other open weight models, and it's not going to fit on anybody's homelab franken-GPU rig. Assuming that 281GB is fp16, it should quantize down to roughly 70GB at 4 bits. Still too big for any consumer grade GPU, but accessible on a workstation with enough system RAM. Mixtral 8x7B runs surprisingly fast, even on CPUs. Hopefully this 8x22B model will perform similarly.<p>EDIT:
Available here in GGUF format: <a href="https://huggingface.co/MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF" rel="nofollow">https://huggingface.co/MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF</a><p>The 2-bit quantization comes to 52GB, which is worse than my napkin math suggested. Looking forward to giving it a try on my desktop though.
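For anyone checking the napkin math: quantized size scales roughly linearly with bit width relative to the fp16 checkpoint, though real GGUF files carry overhead (per-block scales and metadata, plus some tensors kept at higher precision), which is presumably why the actual 2-bit file is much larger than the naive estimate. A minimal sketch of the estimate:

```python
def quantized_size_gb(fp16_size_gb, bits):
    """Naive estimate: fp16 is 16 bits per weight, so scale the
    checkpoint size linearly by the target bit width. Ignores
    quantization overhead (block scales, mixed-precision tensors)."""
    return fp16_size_gb * bits / 16

# 281 GB fp16 checkpoint:
print(quantized_size_gb(281, 4))  # 70.25  -> matches the ~70GB estimate
print(quantized_size_gb(281, 2))  # 35.125 -> yet the real Q2 GGUF is 52GB
```

The gap between 35GB and 52GB at 2 bits shows how much the naive formula undershoots once quantization overhead is included.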
Are any of these stable? I mean, when using temperature=0, do you get the same reply for the same prompt?<p>I am using gpt-4-1106-preview quite a lot, but it is hard to optimize prompts when you cannot build a test suite of questions and correct replies against which to test and improve the instruction prompt. Even with temperature=0, gpt-4-1106-preview outputs different answers for the same prompt.
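A simple way to check this for any model is to fire the same prompt several times and compare the replies. A minimal sketch, where <code>complete</code> is a placeholder for whatever client call you use (e.g. a wrapper around gpt-4-1106-preview with temperature=0):

```python
def is_deterministic(complete, prompt, n=5):
    """Call a completion function n times with the same prompt and
    report whether every reply is byte-for-byte identical."""
    replies = [complete(prompt) for _ in range(n)]
    return all(r == replies[0] for r in replies)

# Stub standing in for a real API call, just to show usage:
assert is_deterministic(lambda p: p.upper(), "hello")
```

If this returns False for your model even at temperature=0, a test suite has to compare replies with something fuzzier than string equality (e.g. a scoring rubric) rather than exact-match answers.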
Does Gemini have a prepaid mode?<p>I like that both OpenAI and Anthropic default to the prepaid mode; I can safely experiment without worrying about selecting a large file by mistake (or worse, a runaway automated process).
Cohere’s Command R+ is an unimpressive model: it agrees with me every time I push back with something like "But are you sure? ...", and it reports "last update in January 2023".<p>Mixtral 8x22B is interesting because 8x7B was one of the best (among all the others) for me a few months ago, in particular for common knowledge, engineering and high-level math, and multi-lingual skills like translation and grammatically nicer rewritings.
One of the most attractive features of Mistral's open models is that you can build a product on top of their API and switch to a self-hosted version if the need arises, such as a customer requesting to run on-prem due to privacy requirements, or the API service being taken down.
Is the point of system prompts just to avoid prompt injection, or are they supposed to get better outputs too?<p>I have never found a need for them, e.g. for the example in the article
Just prompting like:<p>Write hello 3 different ways in spanish
works fine for me