Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning

08/17/2022
by Tao He, et al.

Scene graph generation (SGG) is a fundamental task that aims to detect visual relations between objects in an image. Prevailing SGG methods require all object classes to be present in the training set. Such a closed setting limits the practical application of SGG. In this paper, we introduce open-vocabulary scene graph generation (Ov-SGG), a novel, realistic, and challenging setting in which a model is trained on a set of base object classes but is required to infer relations for unseen target object classes. To this end, we propose a two-step method that first pre-trains on large amounts of coarse-grained region-caption data and then leverages two prompt-based techniques to finetune the pre-trained model without updating its parameters. Moreover, our method supports inference over completely unseen object classes, which existing methods are incapable of handling. In extensive experiments on three benchmark datasets, Visual Genome, GQA, and Open-Image, our method significantly outperforms recent, strong SGG methods in the Ov-SGG setting as well as in the conventional closed SGG setting.
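
To make the open-vocabulary setting concrete, the following is a minimal sketch of how annotated relation triples might be partitioned into a training pool and a held-out pool. It is an illustration only, not the authors' data pipeline: the `Triple` type, the `split_ov_sgg` function, and the rule that a training triple must have both its subject and object in the base classes are all assumptions made for this example.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """A single relation annotation (hypothetical type for this sketch)."""
    subject: str
    predicate: str
    obj: str


def split_ov_sgg(
    triples: Iterable[Triple], base_classes: Set[str]
) -> Tuple[List[Triple], List[Triple]]:
    """Partition triples for an open-vocabulary setup.

    Assumption: a triple is usable for training only if BOTH its subject
    and object belong to the base classes. Every other triple involves at
    least one unseen target class and is held out for evaluation.
    """
    train: List[Triple] = []
    held_out: List[Triple] = []
    for t in triples:
        if t.subject in base_classes and t.obj in base_classes:
            train.append(t)
        else:
            held_out.append(t)
    return train, held_out


if __name__ == "__main__":
    annotations = [
        Triple("person", "riding", "horse"),
        Triple("person", "holding", "umbrella"),
        Triple("dog", "on", "sofa"),
    ]
    base = {"person", "horse", "dog"}
    train, held_out = split_ov_sgg(annotations, base)
    print("train:", train)
    print("held out:", held_out)
```

Under this assumed rule, a model never sees target classes such as "umbrella" or "sofa" during training, yet is evaluated on relations that involve them.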

research
08/22/2023

Opening the Vocabulary of Egocentric Actions

Human actions in egocentric videos are often hand-object interactions co...
research
03/23/2023

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

Open-vocabulary detection (OVD) is an object detection task aiming at de...
research
05/28/2019

Union Visual Translation Embedding for Visual Relationship Detection and Scene Graph Generation

Relations amongst entities play a central role in image understanding. D...
research
11/26/2021

Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation

Scene graph generation (SGG) aims to capture a wide variety of interacti...
research
04/25/2021

Learning to Better Segment Objects from Unseen Classes with Unlabeled Videos

The ability to localize and segment objects from unseen classes would op...
research
07/27/2022

Iterative Scene Graph Generation

The task of scene graph generation entails identifying object entities a...
research
06/23/2023

Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation

Scene Graph Generation (SGG) aims to structurally and comprehensively re...
