I have been thinking about this recently. Many projects focus on running LLMs and SLMs locally. But is that just for tinkering, or do you actually want to run inference locally at your companies?

I see two potential major advantages: cost at scale and privacy.

1. Cost. GPT-4o-mini is inexpensive, and if pricing keeps falling along this trajectory, the cost of inference may soon become negligible. Unless your company uses the model very heavily (or with very large contexts), like those running thousands of autonomous agents, investing in your own hardware does not seem like the best option. (A rough break-even sketch follows at the end of this post.)

2. Privacy. I would say this matters most in industries that handle highly sensitive data. That said, I can see big companies simply signing private-cloud contracts with Azure or other cloud providers, which offer peace of mind, scalability, and, depending on the contract, certain guarantees.

So my big question is: do you know of use cases or companies deploying LLMs in their own data centers, or planning to, or is this just for hobbyists?
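For reference, here is the back-of-envelope math behind point 1, as a minimal sketch. Every number in it is an assumption I picked for illustration (server price, power draw, sustained throughput, a blended GPT-4o-mini-class token price), so treat the output as directional only:

```python
# Back-of-envelope break-even: API inference vs. self-hosted hardware.
# All constants are illustrative assumptions, not vendor quotes.

API_COST_PER_M_TOKENS = 0.60   # assumed blended $/1M tokens for a small hosted model
SERVER_COST = 250_000.0        # assumed upfront cost of a multi-GPU server
SERVER_POWER_KW = 6.0          # assumed average power draw under load
ELECTRICITY_PER_KWH = 0.12     # assumed $/kWh
TOKENS_PER_SECOND = 5_000      # assumed sustained throughput of a small model on that box
AMORTIZATION_YEARS = 3         # write the hardware off over this period

hours_per_year = 24 * 365
hardware_per_year = SERVER_COST / AMORTIZATION_YEARS
power_per_year = SERVER_POWER_KW * ELECTRICITY_PER_KWH * hours_per_year
self_hosted_per_year = hardware_per_year + power_per_year

# Tokens the box could serve in a year at 100% utilization
tokens_per_year = TOKENS_PER_SECOND * 3600 * hours_per_year

self_hosted_per_m = self_hosted_per_year / (tokens_per_year / 1e6)
print(f"Self-hosted: ~${self_hosted_per_m:.3f} per 1M tokens at full utilization")
print(f"API:          ${API_COST_PER_M_TOKENS:.2f} per 1M tokens")

# Effective self-hosted cost per token scales inversely with utilization,
# so this is the utilization at which self-hosting matches the API price.
breakeven_util = self_hosted_per_m / API_COST_PER_M_TOKENS
print(f"Break-even utilization: {breakeven_util:.0%}")
```

With these made-up numbers the break-even lands near full utilization, which is exactly why I suspect only very heavy, sustained workloads (agents, huge contexts) justify owning the hardware. Plug in your own figures and the conclusion can flip either way.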