R^3Net: Relation-embedded Representation Reconstruction Network for Change Captioning

10/20/2021
by   Yunbin Tu, et al.

Change captioning aims to describe, in a natural language sentence, the fine-grained difference between two similar images. Viewpoint change is the most typical distractor in this task, because it alters the scale and location of objects and overwhelms the representation of the real change. In this paper, we propose a Relation-embedded Representation Reconstruction Network (R^3Net) to explicitly distinguish the real change from the large amount of clutter and irrelevant changes. Specifically, a relation-embedded module is first devised to explore potentially changed objects amid the clutter. Then, based on the semantic similarities of corresponding locations in the two images, a representation reconstruction module (RRM) is designed to learn the reconstruction representation and further model the difference representation. In addition, we introduce a syntactic skeleton predictor (SSP) to enhance the semantic interaction between change localization and caption generation. Extensive experiments show that the proposed method achieves state-of-the-art results on two public datasets.
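The idea of comparing "corresponding locations in the two images" can be illustrated with a toy sketch. This is not the paper's RRM: the function names, the cosine-similarity measure, and the `1 - similarity` weighting are all assumptions chosen for illustration, and the sketch ignores viewpoint shifts entirely. Each image is represented as a list of per-location feature vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def difference_features(before, after):
    """Hypothetical helper: per-location similarity and a similarity-gated difference.

    Locations whose features agree (similarity near 1) contribute almost
    nothing; locations that disagree keep most of their raw difference.
    The gating rule is an assumption, not the formulation used by R^3Net.
    """
    sims = [cosine(b, a) for b, a in zip(before, after)]
    diffs = []
    for b, a, s in zip(before, after, sims):
        weight = 1.0 - max(0.0, s)  # assumed gate: low similarity -> high weight
        diffs.append([weight * (x - y) for x, y in zip(a, b)])
    return sims, diffs


def most_changed_location(before, after):
    """Index of the location with the lowest before/after similarity."""
    sims, _ = difference_features(before, after)
    return min(range(len(sims)), key=lambda i: sims[i])


# Three locations with 2-D features; only the middle location changes.
before = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
after = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
sims, diffs = difference_features(before, after)
changed = most_changed_location(before, after)
```

In this toy setting the gated difference is zero wherever the two feature vectors agree, so only the changed location carries signal. A real viewpoint change would break the one-to-one location correspondence that the sketch assumes, which is the difficulty the abstract highlights.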


research
09/30/2020

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

Change Captioning is a task that aims to describe the difference between...
research
03/06/2023

Neighborhood Contrastive Transformer for Change Captioning

Change captioning is to describe the semantic change between a pair of s...
research
01/08/2019

Viewpoint Invariant Change Captioning

The ability to detect that something has changed in an environment is va...
research
09/15/2023

Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding

Recently, the development of pre-trained vision language foundation mode...
research
03/25/2021

Describing and Localizing Multiple Changes with Transformers

Change captioning tasks aim to detect changes in image pairs observed be...
research
12/02/2022

SARAS-Net: Scale and Relation Aware Siamese Network for Change Detection

Change detection (CD) aims to find the difference between two images at ...
research
06/29/2023

Classifying Crime Types using Judgment Documents from Social Media

The task of determining crime types based on criminal behavior facts has...
