RVL-BERT: Visual Relationship Detection with Visual-Linguistic Knowledge from Pre-trained Representations

09/10/2020
by Meng-Jiun Chiou et al.

Visual relationship detection aims to reason about the relationships among salient objects in images and has drawn increasing attention over the past few years. Inspired by human reasoning, we believe that external visual commonsense knowledge is beneficial for reasoning about the visual relationships of objects in images, yet such knowledge is rarely considered in existing methods. In this paper, we propose a novel approach named Relational Visual-Linguistic Bidirectional Encoder Representations from Transformers (RVL-BERT), which performs relational reasoning with both visual and linguistic commonsense knowledge learned through self-supervised pre-training on multimodal representations. RVL-BERT also uses an effective spatial module and a novel mask attention module to explicitly capture spatial information among the objects. Moreover, our model decouples object detection from visual relationship recognition by taking object names directly as input, which allows it to be used on top of any object detection system. We show through quantitative and qualitative experiments that, with the transferred knowledge and novel modules, RVL-BERT surpasses the previous state of the art on two challenging visual relationship detection datasets. The source code will be publicly available soon.
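The abstract does not describe how the spatial module encodes geometry or what the model's input looks like. As a rough illustration only, the sketch below computes a commonly used hand-crafted geometry vector for an ordered (subject, object) box pair and pairs it with the object names returned by an arbitrary detector, which is the kind of input a detector-agnostic relationship model could consume. The function names, the (x1, y1, x2, y2) pixel box format, and the specific features (relative offsets, log size ratios, IoU, normalized coordinates) are assumptions for illustration, not RVL-BERT's actual design.

```python
import math
from itertools import permutations


def box_pair_spatial_features(subj, obj, img_w, img_h):
    """Hypothetical geometry vector for an ordered (subject, object) box pair.

    Boxes are assumed to be (x1, y1, x2, y2) in pixels. This is a generic
    encoding often used in relationship detection, not RVL-BERT's module.
    """
    sx1, sy1, sx2, sy2 = subj
    ox1, oy1, ox2, oy2 = obj
    sw, sh = sx2 - sx1, sy2 - sy1
    ow, oh = ox2 - ox1, oy2 - oy1
    if min(sw, sh, ow, oh) <= 0:
        raise ValueError("boxes must have positive width and height")

    # Offset of the object's top-left corner, normalized by subject size.
    dx = (ox1 - sx1) / sw
    dy = (oy1 - sy1) / sh
    # Log size ratios: scale-invariant and symmetric around 0.
    dw = math.log(ow / sw)
    dh = math.log(oh / sh)

    # Intersection over union of the two boxes.
    iw = max(0.0, min(sx2, ox2) - max(sx1, ox1))
    ih = max(0.0, min(sy2, oy2) - max(sy1, oy1))
    inter = iw * ih
    iou = inter / (sw * sh + ow * oh - inter)

    # Both boxes in image-normalized coordinates.
    norm = [sx1 / img_w, sy1 / img_h, sx2 / img_w, sy2 / img_h,
            ox1 / img_w, oy1 / img_h, ox2 / img_w, oy2 / img_h]
    return [dx, dy, dw, dh, iou] + norm


def candidate_pairs(detections, img_w, img_h):
    """Hypothetical detector-agnostic input builder.

    `detections` is a list of (name, box) tuples from any detector; every
    ordered pair of distinct detections becomes one candidate relationship
    of the form (subject name, object name, spatial features).
    """
    return [
        (s_name, o_name, box_pair_spatial_features(s_box, o_box, img_w, img_h))
        for (s_name, s_box), (o_name, o_box) in permutations(detections, 2)
    ]


if __name__ == "__main__":
    dets = [("person", (0, 0, 100, 100)), ("horse", (50, 50, 150, 150))]
    for subj_name, obj_name, feats in candidate_pairs(dets, 200, 200):
        print(subj_name, obj_name, [round(v, 3) for v in feats])
```

Because only names and boxes cross the boundary between the detector and the relationship stage, any detector that emits (name, box) tuples could be swapped in; a learned model would then map each candidate to a predicate.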

research
12/13/2020

KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning

Reasoning is a critical ability towards complete visual understanding. T...
research
09/02/2020

Intrinsic Relationship Reasoning for Small Object Detection

The small objects in images and videos are usually not independent indiv...
research
10/26/2021

HR-RCNN: Hierarchical Relational Reasoning for Object Detection

Incorporating relational reasoning in neural networks for object recogni...
research
10/30/2018

Hybrid Knowledge Routed Modules for Large-scale Object Detection

The dominant object detection approaches treat the recognition of each r...
research
07/05/2018

Detecting Visual Relationships Using Box Attention

In this paper we propose a new model for detecting visual relationships....
research
07/28/2017

Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation

Understanding visual relationships involves identifying the subject, the...
research
06/18/2022

VReBERT: A Simple and Flexible Transformer for Visual Relationship Detection

Visual Relationship Detection (VRD) impels a computer vision model to 's...