Some really interesting work lately on "contrastive" learning, where accuracy is getting on par with supervised learning, e.g. <a href="https://arxiv.org/abs/2002.05709" rel="nofollow">https://arxiv.org/abs/2002.05709</a>
Followup post on SimCLR: <a href="https://amitness.com/2020/03/illustrated-simclr/" rel="nofollow">https://amitness.com/2020/03/illustrated-simclr/</a>
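The core trick is a contrastive loss over two augmented views of each image: pull the two views of the same image together, push everything else in the batch apart. A minimal PyTorch sketch of that NT-Xent loss (function name and temperature value are illustrative, not the paper's code):

    import torch
    import torch.nn.functional as F

    def nt_xent_loss(z1, z2, temperature=0.5):
        """z1, z2: [N, D] projections of two augmented views of the same N images."""
        # Normalize so the dot products below are cosine similarities.
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2N, D]
        sim = z @ z.T / temperature
        n = z1.size(0)
        # An example must not match itself, so mask the diagonal out.
        sim.fill_diagonal_(float('-inf'))
        # The positive for index i is the other view of the same image.
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
        return F.cross_entropy(sim, targets)

    # Toy usage with random "projections" for a batch of 8 images.
    loss = nt_xent_loss(torch.randn(8, 128), torch.randn(8, 128))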
So instead of relying on image annotations, self-supervised learning trains a model on image manipulations. Then what? Is that network then fed into the original task at hand, the one that would have required human annotations, or is it only useful for these made-up tasks?
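To make the question concrete: is the flow something like this hypothetical sketch, where the contrastively pretrained backbone is frozen and only a small labeled head is trained for the real task? (Fresh ResNet-50 weights stand in for the pretrained encoder here.)

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    # Stand-in for a backbone pretrained with the contrastive pretext task;
    # initialized from scratch here purely for illustration.
    encoder = resnet50(weights=None)
    encoder.fc = nn.Identity()        # expose raw 2048-d features

    # Freeze the encoder; train only a linear classifier on labeled data
    # (the "linear evaluation" setup used in the SimCLR paper).
    for p in encoder.parameters():
        p.requires_grad = False
    classifier = nn.Linear(2048, 10)  # 10 downstream classes, e.g. CIFAR-10

    images = torch.randn(4, 3, 224, 224)   # stand-in labeled batch
    labels = torch.randint(0, 10, (4,))
    logits = classifier(encoder(images))
    loss = nn.functional.cross_entropy(logits, labels)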