The general go-to is a pair of Nvidia cards with 24GB of VRAM apiece. That should be enough to run something like Mixtral 8x7B in 8-bit precision, which is good enough. That said, a single 24GB card is fine for 4-bit precision models if you're using it for basic coding assistance.<p>If you're interested in inference only, not training, it's not really worth investing in cards; use the online inference tools. And for training, even a pair of 4090s won't do much good without a good CPU and lots of RAM to keep the cards fed as much as possible.
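For a rough sense of why those numbers line up, here's a back-of-the-envelope sketch (weight memory only; it ignores KV cache and runtime overhead, and the ~47B total parameter count for Mixtral 8x7B is approximate):

    # Rough estimate of weight memory only (no KV cache, no runtime overhead).
    def weight_vram_gb(params_billion, bits_per_weight):
        return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

    print(weight_vram_gb(47, 8))  # ~43.8 GB -> wants two 24GB cards
    print(weight_vram_gb(47, 4))  # ~21.9 GB -> squeezes onto one 24GB card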
You typically want as much VRAM as possible for this type of application.<p>For example, Llama has versions that take 32GB of VRAM even after quantization (compression):<p><a href="https://old.reddit.com/r/LocalLLaMA/comments/1806ksz/information_on_vram_usage_of_llm_model/ka72kgc/" rel="nofollow">https://old.reddit.com/r/LocalLLaMA/comments/1806ksz/informa...</a><p>There are smaller versions too, though, if you're VRAM-constrained.
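If you do run a quantized model locally, here's a minimal sketch of one common route (Hugging Face transformers with bitsandbytes; the model name and 4-bit setting are just examples, not a recommendation):

    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Llama-2-13b-chat-hf"  # example only; pick a size that fits your VRAM
    quant = BitsAndBytesConfig(load_in_4bit=True)  # 4-bit quantization to cut weight memory

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant, device_map="auto"
    )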
A 4090 or 3090 is a reasonable choice. You want GPUs with a lot of VRAM. I've seen a lot of people running 2-3x used 3090s for reasonable-ish prices (i.e. <$2k USD). If you want higher speed, go for 4090s if you can afford them and your computer can handle the wattage (though you can always limit the power draw on your GPUs for a fairly minimal speed hit).
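On the wattage point, a minimal sketch of capping the power limit (assumes Linux with the Nvidia driver; the 300 W value is a placeholder, check your card's allowed range with nvidia-smi -q -d POWER, and setting it needs root):

    import subprocess

    # Cap GPU 0 at 300 W; the valid range depends on the specific card.
    subprocess.run(["nvidia-smi", "-i", "0", "-pl", "300"], check=True)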