We put together our own setup for training. It was a pain in many ways, but I would probably do it again.<p>We had six boxes with three gpus each, 24 cores, 64G ram. Was all consumer grade stuff so we got to be our own amateur ops people since the professionals wouldn't touch it, understandably.<p>But it was cheap, and there was something really nice to knowing that the hardware was all there waiting for us to leverage -- rather than thinking we had to get our models and hyperparameters perfect before using up precious per-hour resources in the cloud.<p>Once we were happy with each next model, we'd ship to production where we had a small active/passive setup on real hardware in a real data center and that was enough to easily handle the query load.