WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation

06/19/2023
by Zesen Cheng, et al.

Top-down and bottom-up methods are the two mainstream approaches to referring image segmentation, and each has its own intrinsic weakness. Top-down methods suffer chiefly from Polar Negative (PN) errors owing to a lack of fine-grained cross-modal alignment. Bottom-up methods suffer mainly from Inferior Positive (IP) errors owing to a lack of prior object information. We find that the two types of methods are highly complementary, each able to restrain the other's weakness, but that directly averaging their outputs leads to harmful interference. We therefore build Win-win Cooperation (WiCo), which exploits the complementary nature of the two types of methods in both interaction and integration to achieve a win-win improvement. For interaction, Complementary Feature Interaction (CFI) provides fine-grained information to the top-down branch and introduces prior object information to the bottom-up branch for complementary feature enhancement. For integration, Gaussian Scoring Integration (GSI) models the Gaussian performance distributions of the two branches and integrates their results with weights given by confidence scores sampled from those distributions. With WiCo, several prominent top-down and bottom-up combinations achieve remarkable improvements on three common datasets at reasonable extra cost, which supports the effectiveness and generality of our method.
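
To make the integration idea concrete, the following is a minimal sketch of confidence-weighted fusion of two branch outputs, in contrast to a plain average. It is not the authors' implementation. The abstract does not say how the Gaussian distributions are parameterised or how sampled scores become weights, so the `(mean, std)` pairs, the clipping to [0, 1], the normalisation of the two scores, and the function names `sample_confidence` and `integrate` are all assumptions made for illustration.

```python
import random


def sample_confidence(mean, std, rng):
    """Draw one score from a branch's Gaussian and clip it to [0, 1].

    Assumption: each branch's performance is summarised by a (mean, std)
    pair, for example statistics of a quality score for that branch.
    """
    return min(1.0, max(0.0, rng.gauss(mean, std)))


def integrate(top_down, bottom_up, td_stats, bu_stats, rng=None):
    """Blend two per-pixel probability maps using sampled confidence weights.

    top_down, bottom_up: equally sized lists of rows of foreground
    probabilities. td_stats, bu_stats: hypothetical (mean, std) pairs.
    A direct average corresponds to fixing both weights at 0.5.
    """
    rng = rng or random.Random(0)
    td_mean, td_std = td_stats
    bu_mean, bu_std = bu_stats
    s_td = sample_confidence(td_mean, td_std, rng)
    s_bu = sample_confidence(bu_mean, bu_std, rng)

    # Normalise the two sampled scores into weights that sum to 1.
    # Fall back to a plain average if both scores were clipped to 0.
    total = s_td + s_bu
    w_td = 0.5 if total == 0 else s_td / total
    w_bu = 1.0 - w_td

    return [
        [w_td * p + w_bu * q for p, q in zip(row_td, row_bu)]
        for row_td, row_bu in zip(top_down, bottom_up)
    ]


if __name__ == "__main__":
    td_map = [[0.9, 0.2], [0.8, 0.1]]
    bu_map = [[0.6, 0.4], [0.3, 0.2]]
    fused = integrate(td_map, bu_map, td_stats=(0.7, 0.1), bu_stats=(0.5, 0.2))
    for row in fused:
        print(row)
```

Under these assumptions, a branch whose distribution has a higher mean tends to receive the larger weight, while a wide distribution makes that weight vary more from sample to sample.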


