TechEcho

MLOps is a mess but that's to be expected

149 points · by nutellalover · about 3 years ago

15 comments

a_bonobo · about 3 years ago
As a researcher applying ML in my work, the OP reads like me looking from the outside at web frameworks: there's a LOT of noise, but in practice people run just the same 3 things.

For example, when I first got into ML the ADAM optimizer was the new big thing; since then, hundreds of 'better' optimizers have been published. This paper from August '21 shows that most of that is overblown and no optimizer consistently outperforms ADAM: https://arxiv.org/pdf/2007.01547.pdf

I'd even go so far as to say that OP's big image of MLOps is misleading. 'Data Science Notebooks' shows Jupyter, Binder, Colab (I can't make out the other logos), but Binder and Colab both run Jupyter notebooks. Under ML platforms there are a ton of companies in this picture which effectively do the same thing. Some of these logos are tools (Jupyter, R), some are companies using ML in some way or other (John Deere, Siemens), and once you go down this path you might as well put any mid-sized company in the world onto this.
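For readers who haven't seen it, the Adam update the parent refers to is only a few lines. A minimal pure-Python sketch of the single-parameter case (hyperparameter defaults are the commonly used ones; the toy loop at the end is purely illustrative):

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    m and v are the running first/second moment estimates;
    t is the 1-based step count used for bias correction.
    """
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)  # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy demo: minimize f(x) = x^2 starting from x = 5.0
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 5001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
# x ends up near the minimum at 0
```

In practice you would of course use a library implementation; the point is only that the "new big thing" fits in a dozen lines.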
Fiahil · about 3 years ago
I work as a Senior Software Engineer and DevOps for an AI consultancy, and I've been dealing with MLOps for the past 2 years.

I'm mostly aligned with what the article says; MLOps today is definitely in a frazzled state. However, I disagree on the following points:

- Google et al. are not good examples to follow for ML deployment best practices. Sure, their sophistication is higher, but they also have a lot more staff to handle "the other side" of modelling: data, infrastructure, tooling. They build tools to suit their needs as big SaaS products holding *bytes of already-organized data ready to be used for data science, and that's a really different perspective from a large retailer trying to get better sales forecasts.

- Vendors from all sides have really, really badly fitting products for data science and ML in general. "Platforms" are trying to profit off MLOps with commercial products claiming to be the silver bullet for every pain your team has. Three months later, it's just another life-sucking lock-in with a list of tickets to be addressed. We really miss a new "Docker" here. A few examples: Databricks? It's a Spark-as-a-Service platform with the worst APIs you could imagine. Git-for-data vendors? They understand neither Git nor data: is a model data or code? Both?

Finally, ML at Reasonable Scale builds on top of regular software engineering best practices. If you don't usually store a shell script's output in a repository, then you should do the same for a notebook. The same goes for idempotency, reproducibility (of model training), composability (of pipeline steps), (data) versioning, etc.
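The "don't store a notebook's output" rule is easy to enforce mechanically, since .ipynb files are plain JSON. A minimal sketch of what tools like nbstripout do (this is a simplification, not their actual implementation):

```python
import json

def strip_outputs(notebook_json: str) -> str:
    """Remove cell outputs and execution counts from a Jupyter
    notebook string, so only source code goes into version control."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(nb, indent=1)
```

Wired into a pre-commit hook, a filter like this makes notebook diffs behave like ordinary source diffs.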
beckingz · about 3 years ago
MLOps is radically overhyped.

I've seen two companies in the last year start twisting themselves into circles worrying about 'productionizing' a data science project.

The intent (focusing on the end use case and business value) is right, but it feels like the communication is exclusively in terms of overbuilding systems.
discordance · about 3 years ago
It’s not that big of a deal.

1. Collect new data
2. Clean data
3. Annotate
4. Train models and store versions
5. Analyze errors/model metrics (and re-train as need be)
6. Deploy model(s)
7. Monitor
8. Repeat steps 1-7

Yes, there are many tools that can help with each of the above. Use whatever suits you to automate it and make your job easier.
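The eight steps above map fairly directly onto code. A toy sketch of one pass of the loop; every name here (the stage callables, the accuracy-threshold retrain rule, the list-based registry) is illustrative, not a real framework:

```python
def run_iteration(collect, clean, annotate, train, evaluate, deploy, monitor,
                  registry, metric_floor=0.9):
    """One pass of the collect -> deploy loop described above.

    Each stage is passed in as a callable; re-trains until the
    evaluation metric clears `metric_floor` (step 5), then stores
    the version (step 4), deploys (step 6) and monitors (step 7).
    """
    data = annotate(clean(collect()))        # steps 1-3
    model = train(data)                      # step 4
    metrics = evaluate(model, data)          # step 5
    while metrics["accuracy"] < metric_floor:
        model = train(data)                  # re-train as need be
        metrics = evaluate(model, data)
    registry.append((model, metrics))        # store versions
    deploy(model)                            # step 6
    monitor(model)                           # step 7
    return metrics                           # caller repeats (step 8)
```

The tooling arguments in the thread are mostly about which vendor owns each of these callables; the control flow itself stays this simple.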
sfvisser · about 3 years ago
Maybe I’m a bit naive, but I’m convinced any great traditional software engineer or devops engineer, in combination with a data scientist/ML person, should be able to set up the ops pipeline for an ML project. The details and algorithms themselves may look new and exciting, but operationalizing algorithms isn’t a totally new thing.
thenoblesunfish · about 3 years ago
"New fundamental science advances come out every week"? While there's certainly a lot being published, I think the word "fundamental" is being abused in that sentence.
yandie · about 3 years ago
WhyLabs cofounder here, so my opinions are probably biased.

When it comes to MLOps, data makes it much more complex to handle. Think of it as the curse of dimensionality. Nobody wants to deal with metrics across tens, if not hundreds or thousands, of features. In addition, data is often not stored in a nice SQL-based system with strong schema enforcement, so we see data bugs creeping up all the time. An example is when an upstream API service returns the 9-digit zip code instead of the 5-digit one. This sort of data issue can creep in at many parts of the ML system, especially when you use JSON to pass data around.

You can defend against some of these problems with basic devops monitoring, but when you deal with tons of features this becomes a tedious task. DevOps tools focus on solving problems around code, deployment, and system health. They are not designed to address the curse of dimensionality above, and you sacrifice a lot by trying to reduce data problems to DevOps signals.

To be fair, I don't think we need fancy algorithms, but I think we need tools that are optimized around the user experience (i.e., removing friction) and workflows for these data-specific problems. There's a lot to learn and apply from the DevOps world when thinking about data health, such as logging and collecting telemetry signals.
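The zip-code example is trivial to check for one feature and tedious across hundreds, which is the parent's point. A minimal per-feature profiling sketch (pure stdlib; the function name, field name, and bucket labels are all made up for illustration):

```python
import re
from collections import Counter

ZIP5 = re.compile(r"^\d{5}$")
ZIP9 = re.compile(r"^\d{5}-?\d{4}$")

def profile_zip_column(records, field="zip"):
    """Count value shapes in one feature, so a drift such as an
    upstream API switching to 9-digit ZIPs shows up as a growing
    'zip9' bucket instead of a silent downstream bug."""
    shapes = Counter()
    for record in records:
        value = str(record.get(field, ""))
        if ZIP5.match(value):
            shapes["zip5"] += 1
        elif ZIP9.match(value):
            shapes["zip9"] += 1   # the upstream-API regression case
        else:
            shapes["other"] += 1
    return shapes

batch = [{"zip": "94105"}, {"zip": "94105-1234"}, {"zip": ""}]
print(profile_zip_column(batch))
```

Multiply this by every feature and every batch and the appeal of dedicated telemetry tooling over hand-rolled checks becomes clear.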
lysecret · about 3 years ago
Operationalizing ML is hard. But it has nothing to do with the models at all. It is hard because the main use case (besides image and text processing) is feature fusion: you generate a bunch of distinct features about, say, people and their history and the products they like, etc. (thinking of a recommender system now). However, these are things that usually live in really distinct parts of your DB / your backend. So as an MLOps engineer you are now tasked with getting info from all of these places, in a big org, often with different responsible people, security protocols, etc.
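The fusion step itself is just a join once the access problems are solved. A toy sketch, assuming three hypothetical in-memory stores keyed by user id (in reality these would be separate services or databases, which is exactly the parent's point):

```python
def fuse_features(profiles, history, catalog_prefs):
    """Join per-user features from three separate stores into one
    flat feature dict per user: the 'feature fusion' step of a
    recommender pipeline."""
    feature_rows = {}
    for user_id, profile in profiles.items():
        row = dict(profile)                               # demographic features
        row["n_orders"] = len(history.get(user_id, []))   # behavioral features
        row["liked_categories"] = sorted(catalog_prefs.get(user_id, set()))
        feature_rows[user_id] = row
    return feature_rows

rows = fuse_features(
    profiles={"u1": {"age": 34, "country": "DE"}},
    history={"u1": ["order-1", "order-2"]},
    catalog_prefs={"u1": {"books", "audio"}},
)
```

The ten lines of join logic are the easy part; getting read access to all three stores, with their different owners and security protocols, is the actual job.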
herdcall · about 3 years ago
I'm not sure there is anything extra messy about MLOps: there are lots of vendors in pretty much any area that has profit potential. If you put all vendors on a chart it will look messy, but you aren't going to be working with ALL of them (e.g., if you pick Snowflake you likely won't also be working with Redshift, Databricks...). The messy part, I guess, is the evaluation/selection, but not the integration or learning per se, as this article seems to imply. The article looks like a good reference for what's out there, though.
jerpint · about 3 years ago
My frustration so far with MLOps is how unnecessarily large containers need to be to serve even the smallest models. Want to serve an MNIST PyTorch model? The Docker image will likely be huge compared to the model size, exceeding the capacity of most free-tier hosts.
fn1 · about 3 years ago
There's a nice article on Martin Fowler's site about this topic: https://martinfowler.com/articles/cd4ml.html
Havoc · about 3 years ago
I don’t think a bigger ecosystem equates to more complexity as directly as the author implies. For instance, suddenly having the option of two different databases doesn’t mean using one becomes twice as hard.
bigbillheck · about 3 years ago
> You remember that one VC [..] always does some annual review of what's hot in AI today. [...] So you check out his 2021 review

It takes all kinds, I guess.
VMtest · about 3 years ago
Since the ML hype is here to stay, it's really good in my opinion, because investors are constantly pouring more money into the industry.
theden · about 3 years ago
IMO "MLOps" is a "DevOps" problem. If you break it down, fundamentally MLOps's requirements are:

* Computing resources (CPUs, memory, storage, GPUs)
* Distributed computing in most cases, w/ a Spark + Hadoop stack
* Keeping state, which may be required to mutate
* Rapid iteration

The ML tooling part of it is an implementation detail, i.e., the software and dependencies required. These are hard problems even with traditional deterministic computing. I don't understand why the author thinks ML engineers or scientists need to know this ops tooling.

For example, in this tweet https://twitter.com/mihail_eric/status/1486750600343822343 the author complains that data scientists need to learn Kubeflow (they don't), and that it's complicated. Thing is, as scalable architecture diagrams with all the other security side-requirements go, it's about as complicated as one would expect, maybe a little too abstract for those that do this for a living. I mean, your typical k8s-based SaaS tech stack can reach that complexity, but it's managed complexity, about as complex as needed for the stakes at play.

I don't know if ML folk are at the peak-hype-cycle arrogance where they think global ops problems can be solved for their use case, or if there's some misunderstanding of what an iceberg of a problem managing infra is.

I do agree it is messy. I did some MLOps (w/ a big data stack) as a "DevOps engineer", but I stuck with k8s and infra primitives, filtering out most of the list. The ML aspect was the easy part, mainly managing the install deps, Jupyter notebook state, etc.; the hard part was scaling to manage costs, managing a big data stack in general, and making the entire flow UX-friendly to ML engineers and data scientists, since you can't expect them to learn new CLI tools and traditional software dev tooling (they're paid too much to waste time not working on ML problems). I think a lot of these problems are solved if your company has a lot of money to burn on SaaS solutions, or doesn't care about scaling down, or can afford its own datacenter.

My counterpoint to the article is that the industry has bent over backwards to cater to the ML space: integrating all these tools into existing tech (Spark on k8s, Kubeflow), making entire pipelines Jupyter-driven (https://netflixtechblog.com/notebook-innovation-591ee3221233), and generally using massive amounts of resources for ML. The ROI and the massive push to burn resources and time on the tooling seem to work out for big tech more than anyone.