Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

03/23/2023
by Qifan Yu, et al.

Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships from images for visual understanding. Although recent works have made steady progress on SGG, they still suffer from the long-tail distribution problem: tail predicates are more costly to train and harder to distinguish because they have far less annotated data than frequent predicates. Existing re-balancing strategies try to handle this via prior rules, but they remain confined to pre-defined conditions and do not scale across models and datasets. In this paper, we propose a Cross-modal prediCate boosting (CaCao) framework, in which a visually-prompted language model is learned to generate diverse fine-grained predicates in a low-resource way. The proposed CaCao can be applied in a plug-and-play fashion and automatically strengthens existing SGG models to tackle the long-tailed problem. Building on this, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), with which models can generalize to unseen predicates in a zero-shot manner. Comprehensive experiments on three benchmark datasets show that CaCao consistently boosts the performance of multiple scene graph generation models in a model-agnostic way. Moreover, Epic achieves competitive performance on open-world predicate prediction.
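To make the long-tail problem concrete, here is a minimal Python sketch that assumes a tiny, hypothetical set of annotations. It represents <subject, predicate, object> triplets, counts how often each predicate occurs, and derives inverse-frequency class weights. Inverse-frequency weighting is one common example of the rule-based re-balancing that the abstract contrasts with its approach. This is not CaCao or Epic. The names `Triplet`, `predicate_counts`, `inverse_frequency_weights` and the toy data are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Triplet(NamedTuple):
    """One scene-graph relationship: <subject, predicate, object>."""
    subject: str
    predicate: str
    obj: str


def predicate_counts(triplets: Iterable[Triplet]) -> Counter:
    """Count how many annotated triplets use each predicate."""
    return Counter(t.predicate for t in triplets)


def inverse_frequency_weights(counts: Counter, power: float = 1.0) -> Dict[str, float]:
    """Hypothetical rule-based re-balancing: weight_c = (N / (K * n_c)) ** power,
    where N is the total number of triplets, K the number of predicate classes,
    and n_c the count of class c. Rare (tail) predicates receive larger weights."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {p: (total / (num_classes * c)) ** power for p, c in counts.items()}


# Toy, made-up annotations: the generic "on" dominates, while the
# fine-grained predicates "riding" and "perched on" are rare (tail classes).
triplets = [
    Triplet("man", "on", "horse"),
    Triplet("man", "on", "street"),
    Triplet("cup", "on", "table"),
    Triplet("dog", "on", "grass"),
    Triplet("man", "riding", "horse"),
    Triplet("bird", "perched on", "branch"),
]

counts = predicate_counts(triplets)
weights = inverse_frequency_weights(counts)
print(counts)   # Counter({'on': 4, 'riding': 1, 'perched on': 1})
print(weights)  # {'on': 0.5, 'riding': 2.0, 'perched on': 2.0}
```

During training, each example's classification loss would typically be multiplied by the weight of its ground-truth predicate. Because such weights are fixed by one dataset's label statistics, they illustrate the kind of pre-defined rule the abstract describes as hard to scale across models and datasets.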
