Where do I begin studying if my goal is to run LLMs locally or in a private cloud with solid software engineering practices? I'm not sure where to start.

Let's say I want to:

0.) Survey the LLMs that can run locally and identify which are applicable to my use case (sketch below).
1.) Containerize the LLMs (sketch below).
2.) Use source control to capture changes to the model and its configuration, and version its outputs (sketch below).
3.) Develop repeatable, API-driven pipelines for sending data to the model (sketch below).
4.) Learn prompt engineering.
5.) Figure out the best ways to use LangChain (or similar frameworks) to make the system data-aware and agentic (sketch below).

Any thoughts are appreciated.
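For (0), the kind of survey I'm imagining, assuming that models tagged "gguf" on the Hugging Face Hub are a reasonable proxy for "runs locally" (the tag choice is my guess):

    # Sketch: list popular Hub models tagged "gguf" (tag is an assumption).
    from huggingface_hub import HfApi

    for m in HfApi().list_models(filter="gguf", sort="downloads", limit=10):
        print(m.id)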
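For (1), a minimal containerization sketch using the Docker SDK for Python; the image tag, host model directory, and port are assumptions, not a tested setup:

    # Sketch: run a llama.cpp server container with the model mounted read-only.
    import docker

    client = docker.from_env()
    container = client.containers.run(
        "ghcr.io/ggerganov/llama.cpp:server",                # assumed image tag
        command=["-m", "/models/model.gguf",
                 "--host", "0.0.0.0", "--port", "8080"],
        volumes={"/srv/models": {"bind": "/models", "mode": "ro"}},
        ports={"8080/tcp": 8080},
        detach=True,
    )
    print(container.id)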
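For (2), what I mean by versioning output: pin the exact weights by checksum and write each run as a JSON record that can live in git and be diffed (all names here are made up):

    # Sketch: record which weights, prompt, and params produced each output.
    import hashlib, json, pathlib, time

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def log_run(model_path, prompt, params, output, log_dir="runs"):
        record = {
            "ts": time.time(),
            "model_sha256": sha256(model_path),  # exact weights used
            "prompt": prompt,
            "params": params,                    # temperature, seed, etc.
            "output": output,
        }
        out = pathlib.Path(log_dir)
        out.mkdir(exist_ok=True)
        (out / f"{int(record['ts'])}.json").write_text(json.dumps(record, indent=2))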
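For (3), a repeatable pipeline entry point: a FastAPI route that forwards to a local model server (the URL and route are assumptions, llama.cpp-style):

    # Sketch: one versioned HTTP entry point in front of a local model server.
    import requests
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    LLM_URL = "http://localhost:8080/completion"  # assumed llama.cpp endpoint

    class Job(BaseModel):
        prompt: str
        temperature: float = 0.2

    @app.post("/v1/generate")
    def generate(job: Job):
        resp = requests.post(LLM_URL, json=job.dict())
        resp.raise_for_status()
        return resp.json()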
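For (5), a data-aware sketch with LangChain: a simple retrieval-QA chain over one local file. LangChain's interfaces change quickly, so this is written against the classic 0.x API, and every path and model name is a placeholder:

    # Sketch: embed a document, index it, answer questions with a local model.
    from langchain.chains import RetrievalQA
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.llms import LlamaCpp
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.vectorstores import FAISS

    docs = CharacterTextSplitter(chunk_size=500, chunk_overlap=50).create_documents(
        [open("notes.txt").read()])
    store = FAISS.from_documents(docs, HuggingFaceEmbeddings())
    qa = RetrievalQA.from_chain_type(
        llm=LlamaCpp(model_path="/models/model.gguf"),
        retriever=store.as_retriever())
    print(qa.run("What do my notes say about deployment?"))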
A lot of this will revolve around Nvidia hardware that you either own or rent. I've built CPU-only AI bots on free VPSes before, but it's slow and reflects neither best practice nor state-of-the-art inference (a sketch of what I mean is below). Right now, a lot of the meaningful "private cloud" AI stuff is built on highly proprietary runtimes.
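For reference, the CPU-only setups I mean look roughly like this (llama-cpp-python with a quantized GGUF model; the path and thread count are placeholders). It runs on a cheap VPS, but throughput is nowhere near what GPU inference gives you:

    # Sketch: pure-CPU inference with a quantized model via llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(model_path="/models/model.q4.gguf", n_ctx=2048, n_threads=4)
    out = llm("Q: What is a GGUF file?\nA:", max_tokens=128, stop=["Q:"])
    print(out["choices"][0]["text"])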