Interesting to review this post 7 years later in light of Tesla's update yesterday.<p>> I’ve seen some arguments that all we need is lots more data from images, video, maybe text and run some clever learning algorithm: maybe a better objective function, run SGD, maybe anneal the step size, use adagrad, or slap an L1 here and there and everything will just pop out. If we only had a few more tricks up our sleeves! But to me, examples like this illustrate that <i>we are missing many crucial pieces of the puzzle and that a central problem will be as much about obtaining the right training data in the right form</i> to support these inferences as it will be about making them.<p>Really echoes his answer when asked how much data they gather: it's not about <i>how much data</i> they collect, it's about <i>which data</i>.