Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning

by Wenqing Wang, et al.

Current video-based scene graph generation (VidSGG) methods have been found to perform poorly on under-represented predicates because of the inherently biased distribution of the training data. In this paper, we take a closer look at the predicates and identify that most visual relations (e.g., sit_above) involve both an actional pattern (sit) and a spatial pattern (above), while the distribution bias is much less severe at the pattern level. Based on this insight, we propose a decoupled label learning (DLL) paradigm that addresses the intractable visual relation prediction problem from the pattern-level perspective. Specifically, DLL decouples the predicate labels and adopts separate classifiers to learn actional and spatial patterns, respectively. The patterns are then combined and mapped back to the predicate. Moreover, we propose a knowledge-level label decoupling method that transfers non-target knowledge from head predicates to tail predicates within the same pattern to calibrate the distribution of tail classes. We validate the effectiveness of DLL on the commonly used VidSGG benchmark, i.e., VidVRD. Extensive experiments demonstrate that DLL offers a remarkably simple but highly effective solution to the long-tailed problem, achieving state-of-the-art VidSGG performance.
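
The label-level decoupling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that every predicate label has the two-part form action_spatial (as in sit_above), that the label counts are toy values invented for illustration, and that pattern scores are combined by a simple product. The abstract does not specify the combination rule, and the names `decouple`, `pattern_counts`, and `combine` are hypothetical.

```python
from collections import Counter


def decouple(predicate):
    """Split a label such as 'sit_above' into ('sit', 'above').

    Assumption: the text before the first underscore is the actional
    pattern and the rest is the spatial pattern.
    """
    action, _, spatial = predicate.partition("_")
    return action, spatial


def pattern_counts(predicates):
    """Count how often each actional and spatial pattern occurs."""
    actions, spatials = Counter(), Counter()
    for predicate in predicates:
        action, spatial = decouple(predicate)
        actions[action] += 1
        spatials[spatial] += 1
    return actions, spatials


def combine(action_probs, spatial_probs, vocabulary):
    """Map pattern scores back to predicates.

    Assumption: a predicate's score is the product of its two pattern
    scores; the paper may use a different combination.
    """
    scores = {}
    for predicate in vocabulary:
        action, spatial = decouple(predicate)
        scores[predicate] = action_probs.get(action, 0.0) * spatial_probs.get(spatial, 0.0)
    return scores


# Toy long-tailed label set (invented counts, not VidVRD statistics).
labels = (["sit_above"] * 8 + ["walk_behind"] * 6
          + ["walk_above"] * 1 + ["sit_behind"] * 1)

predicate_counts = Counter(labels)
actions, spatials = pattern_counts(labels)

# Hypothetical outputs of the two separate pattern classifiers.
scores = combine({"sit": 0.2, "walk": 0.8},
                 {"above": 0.9, "behind": 0.1},
                 vocabulary=sorted(predicate_counts))
best = max(scores, key=scores.get)
```

In this toy set the predicate counts range from 1 to 8, while every pattern count is 7 or 9, which mirrors the observation that bias is milder at the pattern level. The tail predicate walk_above receives the highest combined score here because both of its patterns are well supported, even though the predicate itself appears only once.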


Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation

Video-based scene graph generation (VidSGG) is an approach that aims to ...

Learning from the Scene and Borrowing from the Rich: Tackling the Long Tail in Scene Graph Generation

Despite the huge progress in scene graph generation in recent years, its...

Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation

This paper investigates the problem of scene graph generation in videos ...

Recovering the Unbiased Scene Graphs from the Biased Ones

Given input images, scene graph generation (SGG) aims to produce compreh...

Peer Learning for Unbiased Scene Graph Generation

In this paper, we propose a novel framework dubbed peer learning to deal...

Vision Relation Transformer for Unbiased Scene Graph Generation

Recent years have seen a growing interest in Scene Graph Generation (SGG...

CAME: Context-aware Mixture-of-Experts for Unbiased Scene Graph Generation

The scene graph generation has gained tremendous progress in recent year...
