I'm confused. The demo doesn't seem to use a CNN, and for the 3D case the article just mentions using solvePnP instead, which isn't ML -- it's a geometric solver that fits a camera pose to known 2D-3D point correspondences (typically an overdetermined least-squares problem). Wouldn't it be possible to map the detected points on the hand into a scale-invariant reference frame defined by a prototypical hand in 3D space? Newer mobile devices also have stereoscopic cameras.<p>Now I'm also more curious about running CNNs on mobile devices (it seems like something that could just be done in a shader).
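To make the reference-frame idea a bit more concrete, here is a rough sketch of what I mean, assuming the 3D hand points are already available (e.g. from stereo). It only removes translation and scale; rotation is left alone. The function name `normalize_hand` and the landmark indices (`wrist`, `knuckle`) are my own placeholders, not anything from the article or the demo.

```python
import math

def normalize_hand(points, wrist=0, knuckle=1):
    """Map 3D hand points into a translation- and scale-invariant frame.

    points  -- list of (x, y, z) tuples
    wrist   -- index of the point used as origin (hypothetical layout)
    knuckle -- index of the point defining unit length (hypothetical layout)
    """
    ox, oy, oz = points[wrist]
    # Translate so the wrist sits at the origin.
    centered = [(x - ox, y - oy, z - oz) for x, y, z in points]
    # Use the wrist-to-knuckle distance as the unit of length.
    ref = math.sqrt(sum(c * c for c in centered[knuckle]))
    if ref == 0:
        raise ValueError("wrist and reference knuckle coincide")
    return [(x / ref, y / ref, z / ref) for x, y, z in centered]

# The same hand at a different position and size should normalize
# to the same coordinates.
small = [(1.0, 2.0, 3.0), (1.0, 2.0, 5.0), (2.0, 2.0, 3.0)]
large = [(3.0 * x, 3.0 * y, 3.0 * z) for x, y, z in small]
print(normalize_hand(small))
print(normalize_hand(large))
```

A fuller version would also align orientation to the prototypical hand (e.g. a Procrustes-style fit), but this is enough to show the scale-invariance part.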