Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation

06/16/2023
by Shuo Chen, et al.

This paper investigates scene graph generation in videos, with the aim of capturing semantic relations between subjects and objects in the form of ⟨subject, predicate, object⟩ triplets. Recognizing the predicate between a subject-object pair is imbalanced and multi-label in nature: predicates range from ubiquitous interactions, such as spatial relationships (in front of), to rare interactions, such as twisting. In the widely used Action Genome and VidOR benchmarks, the imbalance ratio between the most and least frequent predicates reaches 3,218 and 3,408, respectively, surpassing even benchmarks specifically designed for long-tailed recognition. Because of the long-tailed distributions and label co-occurrences, recent state-of-the-art methods predominantly focus on the most frequently occurring predicate classes and ignore those in the long tail. In this paper, we analyze the limitations of current approaches for scene graph generation in videos and identify a one-to-one correspondence between predicate frequency and recall performance. As a step towards unbiased scene graph generation in videos, we introduce a multi-label meta-learning framework to deal with the biased predicate distribution. Our meta-learning framework learns a meta-weight network that weights, for each training sample, the losses over all possible labels. We evaluate our approach on the Action Genome and VidOR benchmarks by building upon two current state-of-the-art methods for each benchmark. The experiments demonstrate that the multi-label meta-weight network improves performance for predicates in the long tail without compromising performance for head classes, resulting in better overall performance and favorable generalizability. Code: <https://github.com/shanshuo/ML-MWN>.
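The per-sample, per-label weighting described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that); it shows only a forward pass in which a small network maps the vector of per-label binary cross-entropy losses for one sample to one weight per label. The class name `MetaWeightNet`, the one-hidden-layer architecture, the hidden size, the random initialization, and the example probabilities are all assumptions made for illustration. The meta-learning step that would actually train the weight network is omitted.

```python
import math
import random


def bce_per_label(probs, targets, eps=1e-7):
    """Binary cross-entropy for each label of one multi-label sample."""
    losses = []
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    return losses


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class MetaWeightNet:
    """Hypothetical one-hidden-layer MLP: loss vector in, weight vector out."""

    def __init__(self, num_labels, hidden=8, seed=0):
        rng = random.Random(seed)
        # Randomly initialized parameters; in a real system these would be
        # learned by a meta-objective, which this sketch does not implement.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(num_labels)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)]
                   for _ in range(num_labels)]
        self.b2 = [0.0] * num_labels

    def __call__(self, losses):
        # ReLU hidden layer over the full per-label loss vector.
        hidden = [max(0.0, sum(w * l for w, l in zip(row, losses)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # Sigmoid output keeps every per-label weight in (0, 1).
        return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
                for row, b in zip(self.w2, self.b2)]


def weighted_loss(probs, targets, net):
    """Weight each label's loss for one sample and sum the results."""
    losses = bce_per_label(probs, targets)
    weights = net(losses)
    return sum(w * l for w, l in zip(weights, losses)), weights


# Made-up predicate probabilities and multi-hot targets for one sample.
probs = [0.9, 0.2, 0.6]
targets = [1, 0, 1]
net = MetaWeightNet(num_labels=3)
total, weights = weighted_loss(probs, targets, net)
print(total, weights)
```

Feeding the whole loss vector to the network, rather than each loss separately, lets the weight for one predicate depend on the losses of co-occurring predicates, which is one plausible way to account for the multi-label setting.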

research
05/08/2023

LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition

Long-tailed multi-label visual recognition (LTML) task is a highly chall...
research
05/22/2021

PLM: Partial Label Masking for Imbalanced Multi-label Classification

Neural networks trained on real-world datasets with long-tailed label di...
research
09/28/2020

Addressing Class Imbalance in Scene Graph Parsing by Learning to Contrast and Score

Scene graph parsing aims to detect objects in an image scene and recogni...
research
10/03/2022

Unbiased Scene Graph Generation using Predicate Similarities

Scene Graphs are widely applied in computer vision as a graphical repres...
research
03/23/2023

Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning

Current video-based scene graph generation (VidSGG) methods have been fo...
research
07/11/2023

Unbiased Scene Graph Generation via Two-stage Causal Modeling

Despite the impressive performance of recent unbiased Scene Graph Genera...
research
04/03/2023

Unbiased Scene Graph Generation in Videos

The task of dynamic scene graph generation (SGG) from videos is complica...
