Self-Supervised One-Shot Learning for Automatic Segmentation of StyleGAN Images
We propose in this paper a framework for automatic one-shot segmentation of synthetic images generated by StyleGANs. By `one-shot segmentation' we mean that the network carries out semantic segmentation of the images on the fly, that is, as they are produced at inference time. Our framework builds on the observation that the multi-scale hidden features produced by a GAN during image synthesis hold useful semantic information that can be exploited for automatic segmentation. Using these features, the framework learns to segment synthetic images with a novel self-supervised, contrastive clustering algorithm that projects the generator's hidden features onto a compact feature space for per-pixel classification. The contrastive learner uses a swapped prediction loss for image segmentation, computed from pixel-wise cluster assignments for an image and its transformed variants. Because clustering operates on the hidden features of an already pre-trained GAN, the pixel-wise feature vectors needed for one-shot segmentation are learned much faster. We have tested our implementation on several standard benchmarks (CelebA, LSUN, PASCAL-Part) for object and part segmentation. Our experiments show that the method not only outperforms the semi-supervised baseline methods by an average wIoU margin of 1.02, but also improves inference speeds by a peak factor of 4.5. Finally, we present results of using the proposed framework in BagGAN, a GAN-based framework for the production of annotated synthetic baggage X-ray scans for threat detection. The one-shot learning framework was trained and tested on the PIDRay baggage-screening benchmark for five different threat categories, yielding segmentation performance close to that of its baseline segmenter.
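To make the swapped prediction idea concrete, the sketch below shows a SwAV-style pixel-wise swapped prediction loss in PyTorch: balanced soft cluster assignments are computed for the per-pixel features of an image and its transformed variant, and each view's assignment is predicted from the other view's cluster scores. All function names, tensor shapes, and hyperparameters here are illustrative assumptions and are not taken from the paper's implementation.

```python
# Minimal sketch of a pixel-wise swapped-prediction objective (SwAV-style).
# Shapes, names, and hyperparameters are assumptions for illustration only.
import torch
import torch.nn.functional as F


def sinkhorn(scores, n_iters=3, eps=0.05):
    """Balanced soft cluster assignments from pixel-to-prototype scores
    via Sinkhorn-Knopp normalization. scores: (N_pixels, K_clusters)."""
    q = torch.exp(scores / eps).t()              # (K, N)
    q /= q.sum()
    K, N = q.shape
    for _ in range(n_iters):
        q /= q.sum(dim=1, keepdim=True); q /= K  # normalize cluster marginals
        q /= q.sum(dim=0, keepdim=True); q /= N  # normalize pixel marginals
    return (q * N).t()                           # (N, K) soft assignments


def swapped_prediction_loss(feat_a, feat_b, prototypes, temp=0.1):
    """feat_a, feat_b: (N, D) L2-normalized per-pixel features of an image and
    its spatially aligned transformed variant. prototypes: (K, D) cluster centers."""
    scores_a = feat_a @ prototypes.t()           # (N, K) similarities to prototypes
    scores_b = feat_b @ prototypes.t()
    with torch.no_grad():
        q_a = sinkhorn(scores_a)                 # assignments for view A
        q_b = sinkhorn(scores_b)                 # assignments for view B
    # Predict each view's cluster assignment from the other view's scores.
    loss_a = -(q_b * F.log_softmax(scores_a / temp, dim=1)).sum(dim=1).mean()
    loss_b = -(q_a * F.log_softmax(scores_b / temp, dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_a + loss_b)
```

In this reading, the per-pixel features would come from projecting the generator's multi-scale hidden activations into a compact space before the loss is applied; the prototypes play the role of the pixel-level cluster centers used for per-pixel classification.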