If you want to run Mixtral 8x7B locally you can use llama.cpp (directly, or via any of the libraries/interfaces built on it, such as text-generation-webui) with <a href="https://huggingface.co/TheBloke/Nous-Hermes-2-Mixtral-8x7B-SFT-GGUF" rel="nofollow">https://huggingface.co/TheBloke/Nous-Hermes-2-Mixtral-8x7B-S...</a>.<p>The smallest quantized version (2-bit) needs 20GB of RAM (which can be offloaded onto the VRAM of a decent 4090 GPU). The 4-bit quantized versions are the largest models that can just about fit on a 32GB system (29GB-31GB). The 6-bit (41GB) and 8-bit (52GB) models need a 64GB system. You would need multiple GPUs with shared memory if you wanted to offload the higher-precision models to VRAM.<p>I've experimented with the 7B and 13B models, but haven't tried these or other larger models yet.
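Those sizes follow roughly from parameter count times effective bits per weight. A back-of-the-envelope sketch (the effective-bpw figures are assumptions on my part — the k-quant formats keep some tensors at higher precision, so effective bpw runs above the nominal bit width):

```python
PARAMS_B = 46.7  # Mixtral 8x7B total parameter count, in billions

def gguf_size_gb(params_b: float, effective_bpw: float) -> float:
    """Rough GGUF file size: parameters * effective bits per weight / 8."""
    return params_b * effective_bpw / 8

# Effective bpw values below are approximations, not official numbers.
for name, bpw in [("Q2_K", 3.4), ("Q4_K_M", 4.9), ("Q6_K", 6.6), ("Q8_0", 8.5)]:
    print(f"{name}: ~{gguf_size_gb(PARAMS_B, bpw):.0f} GB")
```

Add a couple of GB on top for the KV cache and runtime overhead, which is why the "RAM needed" figures above run a bit higher than the raw file sizes.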
Kudos to Brave (for this and other privacy features):<p><i>Unlinkable subscription: If you sign up for Leo Premium, you’re issued unlinkable tokens that validate your subscription when using Leo. This means that Brave can never connect your purchase details with your usage of the product, an extra step that ensures your activity is private to you and only you. The email you used to create your account is unlinkable to your day-to-day use of Leo, making this a uniquely private credentialing experience.</i>
Interesting, I must have missed the first Leo announcement. I really like how privacy-conscious it is. They don't store any chat records, which is exactly what I want.
It's interesting that they made it so you can ask LLM queries right from the omnibar. I wonder if they'll eventually come up with some heuristic to determine whether a query should be sent directly to the LLM or to the default search provider.
If you have used GPT-4 and then use Mistral, it's like looking at a Retina display and then having to go back to a low-res screen. You're always thinking "but GPT-4 could do this though".
Does anyone know of a good Chrome extension for AI page summarization? I tried a bunch of the top Google search hits; they work fine but are really bloated with superfluous features.
Quick question: I have 24GB of VRAM and need to close everything to run Mixtral at 4-bit quant with bitsandbytes. Is there no way to run it at 3.5 bpw on Windows?
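As far as I know, bitsandbytes only offers 8-bit and 4-bit quantization, so there's no 3.5-bpw option there; fractional bit widths like that come from GGUF (llama.cpp) or EXL2 quants instead. If you stay on the bitsandbytes path, a config fragment along these lines (model name and flags assumed from the standard transformers API) squeezes out a bit of extra headroom:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NF4 generally beats plain FP4 at the same size
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,       # quantizes the quantization constants, ~0.4 bits/param saved
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    quantization_config=bnb,
    device_map="auto",  # spills layers to CPU RAM when VRAM runs out
)
```

With `device_map="auto"` the experts that don't fit in 24GB land in system RAM, which is slower but means you don't have to close everything first.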
It's nice using Brave because you have Chromium's better performance, without having to worry about Manifest V2 dying and taking adblocking down with it. I have uBlock Origin enabled, but it has barely caught anything that slipped past the browser filters.