Hey HN!

We've had a lot of success with quantized LLMs for inference speed and cost, since they fit on smaller GPUs (NVIDIA T4, NVIDIA K80, RTX 4070, etc.). There's no need for everyone to run the quantization themselves, so we quantized Llama 3 8B Instruct to 8 bits with GPTQ and figured we'd share it with the community. Excited to see what everyone does with it!
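
For anyone curious what the quantization step roughly looks like, here's a minimal sketch using the Hugging Face transformers GPTQ integration (needs optimum and auto-gptq installed). The model ID, calibration dataset, and output path are placeholders for illustration, not necessarily exactly what we ran:

    # Rough sketch of 8-bit GPTQ quantization via Hugging Face transformers.
    # Assumes `pip install transformers optimum auto-gptq` and enough GPU
    # memory to hold the fp16 model during the calibration pass.
    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated repo; request access first

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # GPTQ needs a small calibration set; "c4" is a common default,
    # not necessarily the one we used.
    quant_config = GPTQConfig(bits=8, dataset="c4", tokenizer=tokenizer)

    # Passing a quantization_config triggers the GPTQ calibration pass at load.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )

    # Save the quantized weights so others can load them directly.
    model.save_pretrained("llama-3-8b-instruct-gptq-8bit")
    tokenizer.save_pretrained("llama-3-8b-instruct-gptq-8bit")

Once the quantized checkpoint is published, using it should just be a plain AutoModelForCausalLM.from_pretrained(...) on the quantized repo, with no calibration step needed.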