Adaptive Fine-Grained Predicates Learning for Scene Graph Generation

07/11/2022
by Xinyu Lyu, et al.
The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach. Because general SGG models tend to predict head predicates while re-balancing strategies favor tail categories, neither can appropriately handle hard-to-distinguish predicates. To tackle this issue, and inspired by fine-grained image classification, which focuses on differentiating hard-to-distinguish objects, we propose Adaptive Fine-Grained Predicates Learning (FGPL-A), which aims to differentiate hard-to-distinguish predicates for SGG. First, we introduce an Adaptive Predicate Lattice (PL-A) to identify hard-to-distinguish predicates; it adaptively explores predicate correlations in keeping with the model's dynamic learning pace. In practice, PL-A is initialized from the SGG dataset and is refined by exploring the model's predictions on the current mini-batch. Utilizing PL-A, we propose an Adaptive Category Discriminating Loss (CDL-A) and an Adaptive Entity Discriminating Loss (EDL-A), which progressively regularize the model's discriminating process with fine-grained supervision that reflects the model's dynamic learning status, ensuring a balanced and efficient learning process. Extensive experimental results show that our proposed model-agnostic strategy significantly boosts the performance of benchmark models on the VG-SGG and GQA-SGG datasets by up to 175 Recall@100, achieving new state-of-the-art performance. Moreover, experiments on Sentence-to-Graph Retrieval and Image Captioning tasks further demonstrate the practicability of our method.
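To make the PL-A idea more concrete, the following is a minimal sketch of a predicate correlation table that is initialized from dataset statistics and then refined from mini-batch predictions. It is not the authors' implementation: the abstract does not state the update rule, so the exponential moving average, the `momentum` and `threshold` parameters, the toy counts, and all function names here are assumptions made purely for illustration.

```python
# Illustrative sketch only: the EMA update, thresholds, and toy numbers are
# assumptions, not details taken from the paper.

def normalize_rows(counts):
    """Turn a matrix of counts into row-wise probabilities."""
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([c / total if total else 0.0 for c in row])
    return rows


def refine_lattice(lattice, gt_labels, pred_labels, momentum=0.9):
    """Blend mini-batch confusion statistics into the lattice (assumed EMA)."""
    n = len(lattice)
    batch_counts = [[0] * n for _ in range(n)]
    for gt, pred in zip(gt_labels, pred_labels):
        batch_counts[gt][pred] += 1
    batch = normalize_rows(batch_counts)
    refined = []
    for i in range(n):
        if sum(batch_counts[i]) == 0:
            # No sample of predicate i in this batch: keep the old row.
            refined.append(list(lattice[i]))
        else:
            refined.append([
                momentum * lattice[i][j] + (1 - momentum) * batch[i][j]
                for j in range(n)
            ])
    return refined


def hard_predicates(lattice, index, threshold=0.2):
    """Predicates that predicate `index` is often confused with."""
    return [j for j, s in enumerate(lattice[index])
            if j != index and s >= threshold]


PREDICATES = ["on", "standing on", "walking on"]

# Hypothetical initial statistics: row = ground truth, column = prediction.
initial_counts = [[8, 1, 1], [5, 4, 1], [6, 1, 3]]
lattice = normalize_rows(initial_counts)

# One hypothetical mini-batch of (ground-truth, predicted) predicate indices.
lattice = refine_lattice(lattice, gt_labels=[1, 1, 2, 0],
                         pred_labels=[0, 1, 0, 0], momentum=0.5)

for i, name in enumerate(PREDICATES):
    confused = [PREDICATES[j] for j in hard_predicates(lattice, i)]
    print(name, "is hard to distinguish from", confused)
```

In this toy setup, the fine-grained predicates "standing on" and "walking on" are both flagged as being confused with the head predicate "on", which is the kind of pair a discriminating loss could then target.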

research
04/06/2022

Fine-Grained Predicates Learning for Scene Graph Generation

The performance of current Scene Graph Generation models is severely ham...
research
07/16/2022

Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation

The current studies of Scene Graph Generation (SGG) focus on solving the...
research
06/15/2023

Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding

Current Vision and Language Models (VLMs) demonstrate strong performance...
research
03/23/2023

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

Scene Graph Generation (SGG) aims to extract <subject, predicate, object...
research
03/07/2020

Adaptive Offline Quintuplet Loss for Image-Text Matching

Existing image-text matching approaches typically leverage triplet loss ...
research
08/10/2023

Informative Scene Graph Generation via Debiasing

Scene graph generation aims to detect visual relationship triplets, (sub...