ReCo: Retrieve and Co-segment for Zero-shot Transfer

06/14/2022
by Gyungin Shin, et al.

Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs necessary to enable deployment. Segmentation methods that forgo supervision can side-step these costs, but they still require labelled examples from the target distribution in order to assign concept names to predictions. An alternative line of work in language-image pre-training has recently demonstrated the potential to produce models that can both assign names across large vocabularies of concepts and enable zero-shot transfer for classification, but these models do not demonstrate commensurate segmentation abilities. In this work, we strive to achieve a synthesis of these two approaches that combines their strengths. We leverage the retrieval abilities of one such language-image pre-trained model, CLIP, to dynamically curate training sets from unlabelled images for arbitrary collections of concept names, and leverage the robust correspondences offered by modern image representations to co-segment entities among the resulting collections. The synthetic segment collections are then employed to construct a segmentation model (without requiring pixel labels) whose knowledge of concepts is inherited from the scalable pre-training process of CLIP. We demonstrate that our approach, termed Retrieve and Co-segment (ReCo), compares favourably with unsupervised segmentation approaches while inheriting the convenience of nameable predictions and zero-shot transfer. We also demonstrate ReCo's ability to generate specialist segmenters for extremely rare objects.
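
The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that image and text embeddings have already been computed by a language-image model such as CLIP; here they are replaced by small hand-written vectors. The function name `retrieve_top_k`, the ranking by cosine similarity, and the fixed cut-off `k` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so that dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def retrieve_top_k(image_embeddings, text_embedding, k):
    """Return indices and scores of the k images most similar to a concept embedding.

    image_embeddings: array of shape (num_images, dim), stand-ins for image features.
    text_embedding:   array of shape (dim,), stand-in for the concept-name feature.
    """
    img = l2_normalize(image_embeddings)
    txt = l2_normalize(text_embedding)
    sims = img @ txt                      # cosine similarity per image
    k = min(k, sims.shape[0])
    order = np.argsort(-sims)[:k]         # highest similarity first
    return order, sims[order]


# Toy stand-ins for embeddings (hypothetical values, not real CLIP outputs).
concept = np.array([1.0, 0.0, 0.0, 0.0])
images = np.array([
    [0.0, 1.0, 0.0, 0.0],    # orthogonal to the concept
    [2.0, 0.0, 0.0, 0.0],    # same direction as the concept
    [1.0, 1.0, 0.0, 0.0],    # partially aligned
    [-1.0, 0.0, 0.0, 0.0],   # opposite direction
])

indices, scores = retrieve_top_k(images, concept, k=2)
print(indices, scores)
```

In this sketch the retrieved indices would define the image collection for one concept name; the co-segmentation stage, which the abstract says relies on correspondences between image representations, is not modelled here.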

research
06/07/2023

UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks

Large-scale joint training of multimodal models, e.g., CLIP, have demons...
research
12/15/2021

Decoupling Zero-Shot Semantic Segmentation

Zero-shot semantic segmentation (ZS3) aims to segment the novel categori...
research
12/29/2021

A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model

Recently, zero-shot image classification by vision-language pre-training...
research
09/22/2022

NamedMask: Distilling Segmenters from Complementary Foundation Models

The goal of this work is to segment and name regions of images without a...
research
08/23/2023

Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion

Producing quality segmentation masks for images is a fundamental problem...
research
04/27/2023

Zero-shot Unsupervised Transfer Instance Segmentation

Segmentation is a core computer vision competency, with applications spa...
research
01/17/2023

Learning Customized Visual Models with Retrieval-Augmented Knowledge

Image-text contrastive learning models such as CLIP have demonstrated st...