I have worked in industrial machine vision for the last 18 years, and this summary (I didn't read the paper) reflects my comments to management every time they bring up the topic of AI or ML. Our inspection systems operate at anywhere from 300 to 3000 parts per minute. AI/ML has far too high a false reject rate (or, even worse, a false accept rate!). The worst scenario I explain to my managers is this: if we implement an AI/ML system and for some reason it starts rejecting 50% of the customer's product at 3am on a Sunday, there is no practical way to analyze the results to determine a definitive cause for the rejects and what the corrective action needs to be. The final gut punch is that we would then have to tell the customer it could be several hours to retrain the model (and that's only after we figure out what needs to be represented in the good and bad image sets to account for the new failure mode).
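To put those line speeds in perspective, here is a back-of-the-envelope sketch. Only the 3000 parts-per-minute rate and the hypothetical 50% reject rate come from the comment above; the four-hour outage window is my own illustrative assumption.

```python
# Back-of-the-envelope: cost of a misbehaving classifier on a high-speed line.
# The 3000 ppm rate and 50% false-reject scenario come from the comment above;
# the 4-hour window (fault at 3am, diagnosed by 7am) is an assumption.

def rejected_parts(line_speed_ppm: float, false_reject_rate: float, hours: float) -> int:
    """Parts wrongly rejected over a run, assuming every reject is false."""
    return int(line_speed_ppm * 60 * hours * false_reject_rate)

if __name__ == "__main__":
    print(rejected_parts(line_speed_ppm=3000, false_reject_rate=0.5, hours=4))
    # -> 360000 parts scrapped or re-inspected before retraining even starts
```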
This paper shows measured results of using "popular image recognition services" that include Azure, AWS, Google, IBM, and other current commercial offerings (the immediate implication being that the tested services run some deep-learning system on the server side). The paper speaks specifically "from the point of view of a software developer" and spends quite a bit of effort questioning the assumptions a user of these services might make, identifying the pitfalls when those assumptions don't hold: consistency over time, consistency between the services themselves, and expecting a machine that produces deterministic outcomes rather than probabilistic ones. The paper looks at the behavior of vision-as-a-service from a Software Quality Assurance (SQA) point of view: is the result of a commercial service on the web reliable over time? Liability within safety-critical environments is also questioned.

The comments here (so far) address "does deep-learning image analysis work", which is a broader question than what the paper addresses. Importantly, other kinds of image analysis methods, including other ML approaches, are not compared.

The authors seem to be raising a bit of an alarm about services like these, reflected (weakly) in the paper title:

[RH1] Computer vision services do not respond with consistent outputs between services, given the same input image.

[RH2] The responses from computer vision services are non-deterministic and evolving, and the same service can change its top-most response over time given the same input image.

[RH3] Computer vision services do not effectively communicate this evolution and instability, introducing risk into engineering these systems.

To a non-specialist, this reads like a detailed description of a useful real-world investigation, like a lab exercise. The authors' skepticism is healthy, and the paper overall looks good. On the negative side, the discussion of labels in computer vision does not sufficiently distinguish between fundamental problems of taxonomy and classification, problems with data grouping in general, and problems specific to this kind of deep-learning image identification.
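As a concrete illustration of what RH1 and RH2 mean for a developer, here is a minimal sketch of a consistency probe. The `classify_aws` and `classify_azure` wrappers are hypothetical stand-ins for whatever SDK calls you actually use; nothing below comes from the paper itself.

```python
import datetime
import json

# Hypothetical wrappers around real vision SDKs: each takes raw image bytes
# and returns (label, confidence) pairs, best first. Replace the stubs with
# actual AWS Rekognition / Azure Computer Vision calls in a real probe.
def classify_aws(image_bytes: bytes) -> list[tuple[str, float]]:
    raise NotImplementedError("wire up your AWS client here")

def classify_azure(image_bytes: bytes) -> list[tuple[str, float]]:
    raise NotImplementedError("wire up your Azure client here")

SERVICES = {"aws": classify_aws, "azure": classify_azure}

def probe(image_path: str, log_path: str = "probe_log.jsonl") -> None:
    """Send the same image to every service and append each top label to a
    log. Run on a schedule: RH1 shows up as disagreement across services on
    a single run; RH2 as one service changing its top label across runs."""
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    record = {"time": datetime.datetime.utcnow().isoformat(),
              "image": image_path}
    for name, classify in SERVICES.items():
        labels = classify(image_bytes)
        record[name] = labels[0] if labels else None
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
```

Logged over weeks, this is the kind of longitudinal evidence behind the paper's claims about instability over time.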
I feel like you can infer that these services are stochastic rather than deterministic from the documentation.

https://docs.microsoft.com/en-us/azure/cognitive-services/custom-vision-service/getting-started-improving-your-classifier

"The quality of your classifier depends on the amount, quality, and variety of the labeled data you provide it and how balanced the overall dataset is."
Good analysis of a real-world problem that software developers face. Customer expectations definitely do not align with the realistic capabilities of the technology.