Hi HN - We're Ping-Lin and Xiaofei from Instill AI (<a href="https://www.instill.tech" rel="nofollow">https://www.instill.tech</a>). We're building VDP (<a href="https://github.com/instill-ai/vdp" rel="nofollow">https://github.com/instill-ai/vdp</a>), an open-source ETL tool for unstructured visual data.<p>When people say they are data-driven, most of the time they mean they are driven by structured data. We'll skip the part where we cite reports claiming that 80% of data are unstructured. The reality is that unstructured data are more difficult to analyse, and not many companies know how to deal with them or have the resources to do so.<p>Before starting Instill AI, we worked at a smart video startup that dealt with large volumes of visual data every day. Back then (2014), MLOps was a pretty new concept, and every ML company was exploring and building its own stack. We built a battle-proven Vision AI system in-house and ran it in production for years.<p>What we learnt from the journey: 1) buy vs. build: unaffordably high inference costs were the main barrier keeping us from adopting an off-the-shelf solution like Google Vertex AI or Amazon SageMaker, so we went down the "build" route. In truth, the resources we spent building and maintaining the system were unexpectedly huge, in both time and money; 2) the Vision AI system we built could actually be modularised and generalised to other industry sectors.<p>We reckon what we experienced is a common phenomenon in the industry, and that we can help solve the problem. That's why we decided to build VDP, an <i>open-source, general and modularised ETL infrastructure for unstructured visual data</i>, for a broader community.<p>Many brilliant MLOps platforms/tools providing computer vision solutions have emerged in the last few years. 
Most of these tools are built from a model-centric perspective and fall into the following categories:<p>- general ML platforms for model training, experiment tracking, model deployment, etc.<p>- platforms that serve a specific vertical, such as e-commerce or manufacturing.<p>- platforms that focus on a single component of MLOps, such as data labelling, dataset preparation or model serving.<p>VDP is built from a <i>data-driven</i> perspective. Although the computer vision model is the most critical component of a visual data ETL pipeline, the ultimate goal of VDP is to streamline the end-to-end visual data flow, with a transform component that can flexibly import computer vision models from different sources.<p>Today, the early version of VDP supports 2 source connectors and all Airbyte destination connectors, and it can import computer vision models from various sources, including Local, GitHub, DVC, ArtiVC and Hugging Face. Setting up a VDP pipeline is fairly easy via its low-code API and no-code Console. Please take a look at the tutorial: <a href="https://www.instill.tech/docs/tutorials/build-an-async-det-pipeline" rel="nofollow">https://www.instill.tech/docs/tutorials/build-an-async-det-p...</a>.<p>VDP can run locally with Docker Compose. We're working on Kubernetes integration and a fully managed version on Instill Cloud.<p>We aim to build VDP as the single point of visual data integration, so users can sync visual data from anywhere into centralised warehouses or applications and focus on gaining insights across all data sources, just as the modern data stack does for structured data.<p>Operation-wise, VDP resources will be managed declaratively so that they fit better into the modern cloud-native context. The API-first, microservice design opens up all sorts of possibilities for VDP.<p>Thanks for reading, HN! We are first-time open-source project maintainers, so there is definitely a lot to learn! 
Let us know what you think in the comments.<p>VDP links:<p>[1] GitHub: <a href="https://github.com/instill-ai/vdp" rel="nofollow">https://github.com/instill-ai/vdp</a><p>[2] Documentation: <a href="https://www.instill.tech/docs" rel="nofollow">https://www.instill.tech/docs</a><p>[3] Demo: <a href="https://demo.instill.tech" rel="nofollow">https://demo.instill.tech</a>