Note that if you use open source models, it’s orders of magnitude cheaper. In my own tests, an off-the-shelf DNN-based face detector [1] ran at 20 FPS on a 16 vCPU machine (Google Cloud). With our hand-rolled distributed execution engine [2], we processed around 40 million images for about $2000, which is 14x cheaper than the cheapest figure cited in OP. I don’t know the accuracy difference, but it sounds like OP will cover this in the next post.<p>[1] <a href="https://github.com/davidsandberg/facenet/tree/master/src/align" rel="nofollow">https://github.com/davidsandberg/facenet/tree/master/src/ali...</a><p>[2] <a href="https://github.com/scanner-research/scanner" rel="nofollow">https://github.com/scanner-research/scanner</a>
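A back-of-the-envelope sketch of the arithmetic behind these figures. The 20 FPS, 40 million images, $2000 and 14x values come from the comment above; applying the single-machine 20 FPS rate to the whole run is an assumption, and the function names are made up for illustration.

```python
# Figures quoted in the comment above.
NUM_IMAGES = 40_000_000      # images processed
TOTAL_COST_USD = 2000.0      # approximate total spend
FPS_PER_MACHINE = 20.0       # detector throughput on one 16 vCPU machine
CHEAPER_FACTOR = 14.0        # "14x cheaper than the cheapest figure cited in OP"


def cost_per_image(total_cost_usd: float, num_images: int) -> float:
    """Average cost of processing a single image, in USD."""
    return total_cost_usd / num_images


def machine_hours(num_images: int, fps: float) -> float:
    """Hours one machine needs at a steady throughput (hypothetical helper)."""
    return num_images / fps / 3600.0


per_image = cost_per_image(TOTAL_COST_USD, NUM_IMAGES)
hours = machine_hours(NUM_IMAGES, FPS_PER_MACHINE)

# ASSUMPTION: the whole run used machines with the same 20 FPS throughput.
implied_hourly_rate = TOTAL_COST_USD / hours

# Price per 1000 images implied for the cheapest hosted API, derived only
# from the 14x ratio stated above (not taken from any price list).
implied_api_price_per_1000 = per_image * 1000 * CHEAPER_FACTOR

print(f"cost per image:            ${per_image:.6f}")
print(f"machine-hours at 20 FPS:   {hours:.1f}")
print(f"implied $/machine-hour:    {implied_hourly_rate:.2f}")
print(f"implied API $/1000 images: {implied_api_price_per_1000:.2f}")
```

Swapping in your own image count, throughput, and hourly rate gives a first-order estimate for a different dataset; it ignores storage, egress, and preemption overhead.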
This is marketing misdirection; comparing these companies as if they were leaders in facial recognition is suspicious, nearly fraudulent. They are far from it. These companies are merely brand names laypeople recognize. If one wants to know the leading contenders in facial recognition, formal tests are performed and published by NIST every year: <a href="https://www.nist.gov/programs-projects/face-recognition-grand-challenge-frgc" rel="nofollow">https://www.nist.gov/programs-projects/face-recognition-gran...</a>.<p>Not to mention that the article does not address any of the key real-world points of using FR. Their usage rates are a joke; real-world scenarios hit 100K much sooner than a month, and volumes like that are typically measured per minute or per hour. Real-world usage of FR at the rates of these services breaks the bank.
Nice overview. Too bad the test set was rather tiny (33 images). Detection rates in this test:<p><pre><code>  Amazon     52.66 %
  Google     40.43 %
  IBM        39.36 %
  Microsoft  17.55 %</code></pre>
I've been tinkering with face recognition over the past few days, and am now waiting for a response to my GCE quota request for a V100 GPU. The $8/h rate for a preemptible instance seems really cheap, yet I am not sure if I will ever be able to process my dataset.<p>Does anyone have a clue how much it will cost to detect faces and extract 128d encodings from ~100M 200x200 photos?
I have found Rekognition to be quite good. Maybe the most interesting machine learning service result for me from the past year was IBM being much better than Google at speech-to-text. Specifically, they offered some key features, like speaker attribution, that made their offering stand out. I also found their accuracy rates to be very good.<p>When I first started exploring these APIs I just assumed Google would be amazing, but that is _not the case at all_. I suspect they save their best stuff for their own products and the API solutions always lag a little behind. IBM may be better because their APIs are such a core offering. Microsoft's stuff is a clear afterthought, and the only Amazon service I've had success with is Rekognition. That said, Transcribe just launched, so it may well improve with time.
<i>Before we compare the different face detection API’s, let's scan the images first by ourselves! How many faces would a human be able to detect?</i><p>This is great! Honestly, most researchers need to start doing this step. Baselining around accessible human capabilities (even if rough) is a super great way to show the benefits or drawbacks of using ML, especially in image processing applications, where it's more directly comparable.
Interesting, but to me the most appealing aspect of face APIs is detecting identity or emotions. I often don’t care whether a face is present; I want to know things ABOUT the face, which this test didn’t even go into.