The new Llama 3 8B is on par with 22B models, better yet it could be 10x cheaper than GPT-3.5.<p>AI builders: if you are using Llama 3 in your backend, where do you host it, or what API do you use? (For production use cases needing good speed and rate limits close to ChatGPT or Claude.)<p>- AWS SageMaker<p>- Self-host on cloud GPUs<p>- Replicate API (just found them, $0.05/1M tokens, legit?)<p>- AWS Bedrock (seems pricey)<p>- Others, please comment<p>Any feedback is welcome!
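<p>For context, here is a minimal sketch of what the backend call looks like with several of these options. This assumes the host exposes an OpenAI-compatible chat-completions endpoint (self-hosted vLLM does; some managed providers do too), so the same request body works across hosts. The base URL and model id below are placeholders for whatever deployment you pick, not a specific provider's values.

```python
import json

# Placeholders: point these at your actual deployment.
BASE_URL = "http://localhost:8000/v1"  # e.g. a self-hosted vLLM server
MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed model id

def build_chat_request(user_message: str, max_tokens: int = 256) -> dict:
    """Build the JSON body for an OpenAI-compatible /chat/completions call."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

# You would POST this to f"{BASE_URL}/chat/completions" with your API key.
payload = build_chat_request("Hello, Llama 3!")
print(json.dumps(payload, indent=2))
```

The appeal of sticking to an OpenAI-compatible shape is that switching between self-hosting and a managed API is just a config change, not a code change.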