IMO "MLOps" is a "DevOps" problem. If you break it down, MLOps's fundamental requirements are<p>* Computing resources (CPUs, memory, storage, GPUs)<p>* Distributed computing, in most cases w/ a spark + hadoop stack<p>* Keeping state, which may need to be mutable<p>* Rapid iteration<p>The ML tooling part of it is an implementation detail, i.e., the software and dependencies required. These are hard problems even with trad deterministic computing. I don't understand why the author thinks ML engineers or scientists need to know this Ops tooling.<p>For example in this tweet <a href="https://twitter.com/mihail_eric/status/1486750600343822343" rel="nofollow">https://twitter.com/mihail_eric/status/1486750600343822343</a> the author complains that data scientists need to learn kubeflow (they don't), and that it's complicated. Thing is, as scalable architecture diagrams go, with all the security side-requirements included, it's about as complicated as one would expect, maybe a little too abstract for those that do this for a living. I mean your typical k8s-based SaaS tech stack can reach that complexity, but it's managed complexity, about as complex as needed for the stakes at play.<p>I don't know if ML folk are at peak hype cycle arrogance where they think global ops problems can be solved for their use case, or if there's some misunderstanding of how much of an iceberg managing infra is.<p>I do agree it is messy. I did some ML Ops (w/ a big data stack) as a "DevOps engineer", but I stuck with k8s and infra primitives, filtering out most of the list. The ML aspect was the easy part, mainly managing the installed deps, jupyter notebook state etc. The hard part was scaling to manage costs, managing a big data stack in general, and making the entire flow UX friendly to ML engineers and data scientists, since you can't expect them to learn new cli tools and trad software dev tooling (they're paid too much to waste time not working on ML problems).
I think a lot of these problems are solved if your company has a lot of money to burn on SaaS solutions, doesn't care about scaling down, or can afford its own datacenter.<p>My counterpoint to the article is that the industry has bent over backwards to cater to the ML space, integrating all these tools into existing tech (spark on k8s, kubeflow), making entire pipelines jupyter-driven (<a href="https://netflixtechblog.com/notebook-innovation-591ee3221233" rel="nofollow">https://netflixtechblog.com/notebook-innovation-591ee3221233</a>), and generally using massive amounts of resources for ML. The ROI on this massive push to burn resources and time on tooling seems to work out for big tech more than anyone.