Using external memory instead of encoding all of the knowledge in the model will take over all branches of applied ML.

A recognition model should use a similar mechanism: a memory buffer that stores short-term context from previous frames, plus a large external database of long-term key-value pairs that retains relevant semantic information for given embeddings.

Doing so would make it possible to update and expand models without retraining, and would enable much better zero-/few-shot learning.

We already have a hacky version of this in our production app for food recognition. For new users we use a standard CNN to predict the items present in the image; once a user has logged a few meals, we use nearest-neighbor search to match new images against previously submitted entries, which works extremely well.
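To make that last part concrete, here's a minimal sketch of the fallback logic, not our actual code: the names (`cnn_model`, `embed_model`) and the thresholds are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical sketch: embed the image, then either fall back to the generic
# CNN classifier or do a nearest-neighbor lookup against the user's
# previously logged meals, depending on how much history exists.

MIN_LOGGED_MEALS = 5          # assumed threshold, not from the post above
SIMILARITY_THRESHOLD = 0.8    # assumed cosine-similarity cutoff

def predict_items(image, user_history, cnn_model, embed_model):
    """user_history: list of (embedding, items) pairs from past meals."""
    if len(user_history) < MIN_LOGGED_MEALS:
        # New user: no personal memory yet, use the generic classifier.
        return cnn_model.predict(image)

    query = embed_model.embed(image)               # image -> vector
    query = query / np.linalg.norm(query)

    keys = np.stack([e / np.linalg.norm(e) for e, _ in user_history])
    sims = keys @ query                            # cosine similarities
    best = int(np.argmax(sims))

    if sims[best] >= SIMILARITY_THRESHOLD:
        # Close match to a previously logged meal: reuse its labels.
        return user_history[best][1]
    return cnn_model.predict(image)                # otherwise, fall back
```

The nice property is that the "memory" (the per-user key-value store) grows and updates on every logged meal without touching the model weights.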