"a mouth without a brain" analogy is good one. Current NLP is impressive but there are limits.<p>People have spatiotemporal model of the world, different physical models, social and behavioral models of the world, organizational model of the society, economic model, etc. Humans parse the language and transform it into multiple models of the world where many indented meanings and semantics are self-evident and it becomes "a common sense". They have crude understanding of how fabrics, paper, gas, liquid, rubber, iron, rock, etc. behave and they understand written text based on this more complete model zoo.<p>There is similar limit in computer vision. Humans reason about 2d images using
internal 3D model. Even if they see a completely new object shape, they can usually infer what the other side of the object looks like using basic symmetries and physical models.<p>Image understanding must eventually map onto a spatiotemporal + physical model, and several approaches are underway. NLP has a much harder problem, because the problem is more abstract and complex.