Self-EMD: Self-Supervised Object Detection without ImageNet
In this paper, we propose Self-EMD, a novel self-supervised representation learning method for object detection. Our method is trained directly on unlabeled non-iconic image datasets such as COCO, instead of the commonly used iconic-object image datasets such as ImageNet. We keep the convolutional feature maps as the image embedding to preserve spatial structures, and adopt the Earth Mover's Distance (EMD) to compute the similarity between two embeddings. Our Faster R-CNN (ResNet50-FPN) baseline achieves 39.8% mAP on COCO, which is on par with state-of-the-art self-supervised methods pre-trained on ImageNet. More importantly, it can be further improved to 40.4% mAP with more unlabeled images, showing its great potential for leveraging more easily obtained unlabeled data. Code will be made available.
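To make the idea of comparing two spatial embeddings with EMD concrete, here is a minimal sketch under stated assumptions. It is not the authors' code. Each H x W x C feature map is treated as a set of H*W local vectors with uniform weights. The ground cost is assumed to be 1 minus the cosine similarity, and the similarity score is the cosine similarity summed under the optimal transport plan. The exact EMD solver is swapped for entropy-regularized Sinkhorn iterations. The names `emd_similarity`, `epsilon`, and `n_iters` are hypothetical.

```python
import math
from typing import List, Sequence

# An H x W x C feature map given as nested lists (assumption: no framework tensors).
FeatureMap = Sequence[Sequence[Sequence[float]]]


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two local feature vectors (guarded against zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def _flatten(fmap: FeatureMap) -> List[Sequence[float]]:
    """Turn an H x W x C map into a list of H*W local C-dimensional vectors."""
    return [vec for row in fmap for vec in row]


def emd_similarity(fmap_a: FeatureMap, fmap_b: FeatureMap,
                   epsilon: float = 0.05, n_iters: int = 200) -> float:
    """Hypothetical EMD-style similarity between two feature maps.

    Uses Sinkhorn iterations (entropy-regularized optimal transport) as a
    stand-in for an exact EMD solver; epsilon and n_iters are illustrative.
    """
    xs, ys = _flatten(fmap_a), _flatten(fmap_b)
    n, m = len(xs), len(ys)
    # Pairwise cosine similarity between every location of A and every location of B.
    sim = [[_cosine(x, y) for y in ys] for x in xs]
    # Gibbs kernel of the assumed ground cost (1 - cosine similarity).
    kernel = [[math.exp(-(1.0 - s) / epsilon) for s in row] for row in sim]
    a = [1.0 / n] * n  # uniform mass on each location of map A
    b = [1.0 / m] * m  # uniform mass on each location of map B
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v); the score is sum_ij P_ij * sim_ij.
    return sum(u[i] * kernel[i][j] * v[j] * sim[i][j]
               for i in range(n) for j in range(m))
```

Because the transport plan may match any location of one map to any location of the other, two maps that hold the same local features at different positions still score close to 1. This is one plausible reason an EMD-based score suits spatial embeddings taken from different crops of a non-iconic image. Maps with mutually orthogonal features score close to 0.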