Too many people focus on "properly" putting ML into production...

I'd like to propose an alternative... Build a model (once) on your dev machine. Copy it to S3. Do CPU inference in some microservice. Get the production system to query your microservice, and if it doesn't reply within some (very short) timeout, fall back to whatever behaviour your company was using before ML came along.

If the results of your ML can be saved (e.g. a per-customer score), save the output values for each customer and don't even run the ML in real time at all!

Don't handle retraining the model. Don't bother with high reliability or failover. Don't page anyone if it breaks.

By doing this, you get rid of 80% of the effort required to deploy an ML system, yet still get 80% of the gains. Sure, retraining the model hourly might be optimal, but for most businesses the gains simply don't pay for the complexity and ongoing maintenance.

Insider knowledge says some very big companies deploy the above strategy very successfully...
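
To make the "short timeout, then fall back" idea concrete, here's a minimal sketch of what the calling code might look like. The endpoint URL, timeout value, and `legacy_score()` fallback are all hypothetical stand-ins, not anything from a specific system:

```python
# Sketch: query the ML microservice, fall back to pre-ML behaviour on any failure.
import requests

ML_ENDPOINT = "http://ml-scoring-service.internal/score"  # hypothetical internal service
TIMEOUT_SECONDS = 0.05  # very short: if ML is slow or down, we simply don't use it


def legacy_score(customer_id: str) -> float:
    """Whatever heuristic the business used before ML came along."""
    return 0.5


def get_score(customer_id: str) -> float:
    try:
        resp = requests.get(
            ML_ENDPOINT,
            params={"customer_id": customer_id},
            timeout=TIMEOUT_SECONDS,
        )
        resp.raise_for_status()
        return resp.json()["score"]
    except requests.RequestException:
        # No retries, no failover, no paging -- just quietly fall back.
        return legacy_score(customer_id)
```

The point is that the fallback path carries all the reliability burden, so the ML service itself can stay simple and even go down without anyone noticing.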