> Machine learning models which can be deployed effortlessly and operate unattended are far more likely to achieve commercial objectives.<p>The likelihood of achieving commercial objectives is tied to the commercial usefulness and accuracy of your analysis and predictions, not to the ease of deployment or, even more curiously, to the ability to be left unattended.
I really like how they implemented the data catalog [0]: it’s YAML-based and uses a path-style cascading set of files that can be shared across or within teams, or kept personal for individual projects. I think this makes it easy to build tooling for meta-analysis (how many datasets are used, etc.) and even visualization with a variety of tools, rather than having the metadata management tied to a single system or product.<p>Are there other techniques for data catalogs that are file-based, or at least based on open standards, that scale all the way up from a single developer?<p>[0] <a href="https://kedro.readthedocs.io/en/latest/04_user_guide/04_data_catalog.html" rel="nofollow">https://kedro.readthedocs.io/en/latest/04_user_guide/04_data...</a>
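To make the "cascading files" idea concrete, here is a minimal sketch of how layered catalog definitions (company, then team, then personal) could be merged, plus the kind of meta-analysis a plain-file catalog makes trivial. The layer names, dataset names, and the per-entry override policy are all hypothetical; Kedro's own config loader has its own override rules, which may differ. Plain dicts stand in for the parsed YAML files so the sketch runs with the standard library only.

```python
from typing import Any

# A catalog maps dataset names to their config (type, filepath, ...).
Catalog = dict[str, dict[str, Any]]


def merge_catalogs(*layers: Catalog) -> Catalog:
    """Merge catalog layers; later (more specific) layers win.

    Assumed policy: entries are merged field by field, so a personal
    layer can override just the filepath without redefining the entry.
    """
    merged: Catalog = {}
    for layer in layers:
        for name, entry in layer.items():
            # Build a new dict so the input layers are never mutated.
            merged[name] = {**merged.get(name, {}), **entry}
    return merged


def datasets_by_type(catalog: Catalog) -> dict[str, int]:
    """Simple meta-analysis: count catalog entries per dataset type."""
    counts: dict[str, int] = {}
    for entry in catalog.values():
        kind = entry.get("type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts


# Hypothetical layers, ordered from most general to most specific.
company = {"customers": {"type": "csv", "filepath": "s3://shared/customers.csv"}}
team = {"orders": {"type": "parquet", "filepath": "s3://team/orders.parquet"}}
personal = {"customers": {"filepath": "data/local_customers.csv"}}

catalog = merge_catalogs(company, team, personal)
print(catalog["customers"])
# {'type': 'csv', 'filepath': 'data/local_customers.csv'}
print(datasets_by_type(catalog))
# {'csv': 1, 'parquet': 1}
```

Because the catalog is just data, the same files can feed a lineage graph or a dashboard without going through any particular product.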
Conjecture: the production quality of ML code has mostly to do with how its heuristics are designed and battle-tested, and almost nothing to do with how the training/inference pipeline is constructed.
TL;DR, if you really dig past the marketing (from the FAQ (1)):<p>> We see Airflow and Luigi as complementary frameworks: Airflow and Luigi are tools that handle deployment, scheduling, monitoring and alerting. Kedro is the worker that should execute a series of tasks, and report to the Airflow and Luigi managers.<p>> Create the data transformation steps as pure Python functions<p>Personally, I'm mystified as to why you would use something like this rather than a more mature product like, say, Spark, which natively supports clustering, etc. That comparison is what I would really like to see in the FAQ.<p>Is it a processing solution? Not really, since it suggests you can offload the heavy lifting to an engine, e.g. Spark. An orchestrator? Apparently not, because that's a complementary product. So... it's like a configuration management tool?<p>The use case is pretty hard for me to see.<p>1. <a href="https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html#how-does-kedro-compare-to-other-projects" rel="nofollow">https://kedro.readthedocs.io/en/latest/06_resources/01_faq.h...</a>
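For what it's worth, the "worker" role the FAQ describes roughly amounts to something like the sketch below: pure functions wired together by named inputs and outputs, executed in dependency order against a catalog. This is a hypothetical, stdlib-only illustration of that idea, not Kedro's actual classes or runner; the `Node` tuple, `run` function, and dataset names are all made up for the example. Scheduling, retries, and alerting are deliberately absent, since per the FAQ those belong to Airflow or Luigi, and any function body could hand its work off to Spark.

```python
from typing import Any, Callable

# Hypothetical node: (pure function, names of inputs it reads, name of output).
Node = tuple[Callable[..., Any], list[str], str]


def run(nodes: list[Node], catalog: dict[str, Any]) -> dict[str, Any]:
    """Execute nodes in dependency order, reading and writing a dict catalog."""
    data = dict(catalog)  # copy so the caller's catalog is not mutated
    pending = list(nodes)
    while pending:
        # A node is ready once every input it names already exists.
        ready = [n for n in pending if all(name in data for name in n[1])]
        if not ready:
            raise ValueError("missing inputs or a cycle in the pipeline")
        for func, inputs, output in ready:
            data[output] = func(*(data[name] for name in inputs))
        pending = [n for n in pending if n not in ready]
    return data


def clean(raw: list[int]) -> list[int]:
    """Drop negative readings."""
    return [x for x in raw if x >= 0]


def total(values: list[int]) -> int:
    """Sum the cleaned readings."""
    return sum(values)


pipeline: list[Node] = [
    (total, ["clean_values"], "total"),  # declaration order doesn't matter
    (clean, ["raw_values"], "clean_values"),
]
result = run(pipeline, {"raw_values": [3, -1, 4]})
print(result["total"])  # 7
```

Seen this way, the value proposition is mostly in the wiring and the catalog, which is arguably why it reads more like configuration management than a processing engine or an orchestrator.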
I’m starting to see a lot of these frameworks pop up to simplify deployment of machine learning models. I’m really hoping one or two start to stand out... but this doesn’t feel like it will be one of them.