In the past, I've worked on tons of personal projects where I used Google's Video Intelligence API (https://cloud.google.com/video-intelligence) to track people, objects, and more: I'd record footage live on a Raspberry Pi and send it up to Google for analysis.

It works great for personal projects (I get 1,000 free minutes, etc.), but do people actually use these APIs in production? $0.15/min for a single model feels expensive at scale, and each one only covers a narrow set of use cases. I know AWS SageMaker (https://aws.amazon.com/sagemaker/) lets you build your own models, but I don't want to deal with training data or hands-on ML work.

I've been wondering whether others have run into this and/or worked in environments where they feel the same way. Is it expected that I'll have to combine OpenCV with something like SageMaker or PyTorch to build image/video understanding systems?
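To put the $0.15/min figure in perspective, here's a rough back-of-the-envelope sketch. It assumes a flat $0.15 per minute for one model/feature, a 1,000-minute free allowance that resets monthly, and a 30-day month. Those are simplifying assumptions on my part, not a quote of Google's pricing page, and `monthly_cost` is just a hypothetical helper for illustration.

```python
# Rough monthly cost estimate for sending continuous camera footage
# to a per-minute video analysis API.
# Assumptions (illustrative, not taken from Google's docs):
#   - flat $0.15 per minute for a single model/feature
#   - 1,000 free minutes per month
#   - 30-day month

PRICE_PER_MINUTE = 0.15
FREE_MINUTES_PER_MONTH = 1_000
DAYS_PER_MONTH = 30


def monthly_cost(cameras: int, hours_per_day: float) -> float:
    """Return the estimated monthly bill in USD for one model/feature."""
    minutes = cameras * hours_per_day * 60 * DAYS_PER_MONTH
    # Only minutes beyond the free allowance are billed.
    billable = max(0.0, minutes - FREE_MINUTES_PER_MONTH)
    return round(billable * PRICE_PER_MINUTE, 2)


if __name__ == "__main__":
    for cams, hours in [(1, 1), (1, 24), (10, 24)]:
        print(f"{cams} camera(s), {hours} h/day: ${monthly_cost(cams, hours):,.2f}/month")
```

Under these assumptions, a single camera recording around the clock is 43,200 minutes a month, or about $6,330/month for just one feature, and ten cameras is about $64,650/month. That's the "at scale" part that worries me.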