Learning Representations by Maximizing Mutual Information Across Views
We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context. For example, a context could be an image from ImageNet, and multiple views of the context could be generated by repeatedly applying data augmentation to the image. Following this approach, we develop a new model which maximizes mutual information between features extracted at multiple scales from independently-augmented copies of each input. Our model significantly outperforms prior work on the tasks we consider. Most notably, it achieves over 60% accuracy on ImageNet under the standard linear evaluation protocol. This improves on prior results by over 4%. When evaluated on a transfer task using the representations learned on ImageNet, our model achieves 50% accuracy. This improves on prior results by 2%. When we extend our model to use mixture-based representations, segmentation behaviour emerges as a natural side-effect.
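As a rough illustration of the core idea, the sketch below shows how mutual information between features from two independently-augmented views of the same image can be maximized with an InfoNCE-style contrastive bound. This is a minimal, hedged example and not the paper's exact multi-scale objective; the `encoder`, `augment`, feature dimensions, and temperature are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): maximize agreement between features of
# two independently-augmented views of each image via an InfoNCE-style bound.
import torch
import torch.nn.functional as F

def infonce_loss(feats_a: torch.Tensor, feats_b: torch.Tensor, temperature: float = 0.1):
    """InfoNCE lower bound on mutual information between paired view features.

    feats_a, feats_b: (batch, dim) features from two augmented views of the same
    batch of images; row i of each tensor comes from the same underlying image.
    """
    a = F.normalize(feats_a, dim=1)
    b = F.normalize(feats_b, dim=1)
    logits = a @ b.t() / temperature                      # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device)    # positive pairs on the diagonal
    # Score each view-a feature against all view-b features (and vice versa);
    # pushing probability mass onto the diagonal tightens the MI lower bound.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Hypothetical usage: encode two independently-augmented copies of a batch,
# then minimize the loss with any optimizer.
# encoder = ...  # any image encoder producing (batch, dim) features
# loss = infonce_loss(encoder(augment(images)), encoder(augment(images)))
```

In the paper's model this kind of objective is applied across features at multiple scales, rather than only to a single global feature per view as in this sketch.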