Contrastive Video-Language Segmentation

09/29/2021
by Chen Liang, et al.

We focus on the problem of segmenting an object referred to by a natural-language sentence in video content, a task that hinges on formulating a precise vision-language relation. Existing attempts mainly construct this relation implicitly, i.e., through grid-level multi-modal feature fusion, and this paradigm has proven problematic for distinguishing semantically similar objects. In this work, we propose to intertwine the visual and linguistic modalities explicitly via a contrastive learning objective, which directly aligns the referred object with the language description and pushes unreferred content apart across frames. Moreover, to remedy the degradation problem, we present two complementary hard instance mining strategies, i.e., Language-relevant Channel Filter and Relative Hard Instance Construction. They encourage the network to exclude visually distinguishable features and to focus on easily confused objects during contrastive training. Extensive experiments on two benchmarks, i.e., A2D Sentences and J-HMDB Sentences, quantitatively demonstrate the state-of-the-art performance of our method and qualitatively show more accurate discrimination between semantically similar objects than the baselines.
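The abstract does not give the exact form of the objective, so the following is only a minimal sketch of a generic InfoNCE-style contrastive loss under stated assumptions: the sentence embedding acts as the anchor, the pooled feature of the referred object is the single positive, and pooled features of unreferred regions from any frame are the negatives. The function names `cosine` and `info_nce`, the temperature `tau`, and the optional `top_k` hard-negative selection are all hypothetical. `top_k` is a plain similarity-based hard-negative filter, not the paper's Language-relevant Channel Filter or Relative Hard Instance Construction.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def info_nce(
    sentence: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    tau: float = 0.07,
    top_k: Optional[int] = None,
) -> float:
    """Generic InfoNCE loss with the sentence embedding as anchor (hypothetical sketch).

    positive:  feature of the referred object (assumed pooled per object).
    negatives: features of unreferred regions, possibly from other frames.
    top_k:     if set, keep only the k negatives most similar to the sentence,
               a simple hard-negative filter (assumption, not the paper's method).
    """
    pos_logit = cosine(sentence, positive) / tau
    neg_logits = [cosine(sentence, n) / tau for n in negatives]
    if top_k is not None:
        # Most similar (hardest) negatives have the largest logits.
        neg_logits = sorted(neg_logits, reverse=True)[:top_k]
    logits = [pos_logit] + neg_logits
    # Numerically stable log-sum-exp over positive and negatives.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log softmax probability of the positive.
    return log_denom - pos_logit


if __name__ == "__main__":
    sentence = [1.0, 0.0]
    negatives = [[0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
    print(info_nce(sentence, [0.9, 0.1], negatives))           # aligned positive
    print(info_nce(sentence, [0.9, 0.1], negatives, top_k=1))  # hardest negative only
```

Minimizing this loss raises the anchor's similarity to the referred object relative to every unreferred region. Restricting the denominator to the most similar negatives concentrates the gradient on the distractors that are easiest to confuse with the target, which matches the general motivation the abstract gives for hard instance mining.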

research
12/27/2022

Position-Aware Contrastive Alignment for Referring Image Segmentation

Referring image segmentation aims to segment the target object described...
research
09/20/2022

Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings

Semantic representation learning for sentences is an important and well-...
research
03/19/2021

ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation

Text-based video segmentation is a challenging task that segments out th...
research
05/04/2022

HiURE: Hierarchical Exemplar Contrastive Learning for Unsupervised Relation Extraction

Unsupervised relation extraction aims to extract the relationship betwee...
research
06/06/2023

BatchSampler: Sampling Mini-Batches for Contrastive Learning in Vision, Language, and Graphs

In-Batch contrastive learning is a state-of-the-art self-supervised meth...
research
06/26/2022

VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning

In this paper, we leverage the human perceiving process, that involves v...
research
09/30/2021

Focused Contrastive Training for Test-based Constituency Analysis

We propose a scheme for self-training of grammaticality models for const...
