Learning of Visual Relations: The Devil is in the Tails

08/22/2021
by   Alakh Desai, et al.

Significant effort has recently been devoted to modeling visual relations. This effort has mostly addressed the design of architectures, typically by adding parameters and increasing model complexity. However, visual relation learning is a long-tailed problem, due to the combinatorial nature of joint reasoning about groups of objects. Increasing model complexity is, in general, ill-suited to long-tailed problems, because more complex models tend to overfit. In this paper, we explore an alternative hypothesis, denoted the Devil is in the Tails. Under this hypothesis, better performance is achieved by keeping the model simple but improving its ability to cope with long-tailed distributions. To test this hypothesis, we devise a new approach for training visual relationship models, inspired by the state-of-the-art long-tailed recognition literature. The approach is based on an iterative decoupled training scheme, denoted Decoupled Training for Devil in the Tails (DT2). DT2 employs a novel sampling approach, Alternating Class-Balanced Sampling (ACBS), to capture the interplay between the long-tailed entity and predicate distributions of visual relations. Results show that, with an extremely simple architecture, DT2-ACBS significantly outperforms much more complex state-of-the-art methods on scene graph generation tasks. This suggests that the development of sophisticated models must be considered in tandem with the long-tailed nature of the problem.
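
The abstract names Alternating Class-Balanced Sampling but does not describe its mechanics. To make the general idea concrete, the following is a minimal Python sketch of generic class-balanced sampling alternated between two label sets. It is not the authors' ACBS procedure. The function names (class_balanced_sampler, alternating_batches), the round-by-round alternation rule, and the toy labels are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def class_balanced_sampler(labels, rng):
    """Yield indices into `labels` so that each class is equally likely.

    Generic class-balanced sampling: pick a class uniformly at random,
    then pick one instance of that class uniformly at random.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)
    while True:
        cls = rng.choice(classes)
        yield rng.choice(by_class[cls])


def alternating_batches(entity_labels, predicate_labels,
                        batch_size, num_rounds, seed=0):
    """Hypothetical alternation: even rounds draw entity-balanced batches,
    odd rounds draw predicate-balanced batches.

    Indices in an "entity" batch refer to `entity_labels`; indices in a
    "predicate" batch refer to `predicate_labels`.
    """
    rng = random.Random(seed)
    samplers = {
        "entity": class_balanced_sampler(entity_labels, rng),
        "predicate": class_balanced_sampler(predicate_labels, rng),
    }
    batches = []
    for round_idx in range(num_rounds):
        name = "entity" if round_idx % 2 == 0 else "predicate"
        batch = [next(samplers[name]) for _ in range(batch_size)]
        batches.append((name, batch))
    return batches


if __name__ == "__main__":
    # Toy long-tailed labels (made up): one head class, one tail class each.
    entities = ["person"] * 95 + ["kite"] * 5
    predicates = ["on"] * 99 + ["riding"]
    for name, batch in alternating_batches(entities, predicates, 4, 4):
        print(name, batch)
```

In this sketch, a tail class such as "riding" is drawn about as often as the head class "on", even though it has only one annotated instance. How the paper actually couples the two balanced samplers with its decoupled training stages is described in the full text.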


research
05/01/2021

Breadcrumbs: Adversarial Class-Balanced Sampling for Long-tailed Recognition

The problem of long-tailed recognition, where the number of examples per...
research
05/06/2021

VideoLT: Large-scale Long-tailed Video Recognition

Label distributions in real-world are oftentimes long-tailed and imbalan...
research
07/21/2020

Balanced Meta-Softmax for Long-Tailed Visual Recognition

Deep classifiers have achieved great success in visual recognition. Howe...
research
12/01/2020

Disentangling Label Distribution for Long-tailed Visual Recognition

The current evaluation protocol of long-tailed visual recognition trains...
research
06/23/2023

Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation

Scene Graph Generation (SGG) aims to structurally and comprehensively re...
research
04/12/2021

Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection

Training on datasets with long-tailed distributions has been challenging...
research
08/24/2020

Balanced Activation for Long-tailed Visual Recognition

Deep classifiers have achieved great success in visual recognition. Howe...
