Long-tail Visual Relationship Recognition with a Visiolinguistic Hubless Loss

by Sherif Abdelkarim, et al.

Scaling up the vocabulary and complexity of current visual understanding systems is necessary to bridge the gap between human and machine visual intelligence. However, a crucial impediment lies in the difficulty of generalizing to data distributions that come from real-world scenarios. Such distributions typically follow Zipf's law, which states that only a small portion of the collected object classes will have abundant examples (the head), while most classes will contain just a few (the tail). In this paper, we propose to study a novel task concerning the generalization of visual relationships that lie on the distribution's tail; that is, we investigate how to help AI systems better recognize rare relationships like <S:dog, P:riding, O:horse>, where the subject S, the predicate P, and/or the object O come from the tail of the corresponding distributions. To achieve this goal, we first introduce two large-scale visual-relationship detection benchmarks built upon the widely used Visual Genome and GQA datasets. We also propose an intuitive evaluation protocol that gives credit to classifiers that prefer concepts semantically close to the ground-truth class according to WordNet- or word2vec-induced metrics. Finally, we introduce a visiolinguistic version of a Hubless loss, which we show experimentally to consistently encourage classifiers to be more predictive of the tail classes while remaining accurate on head classes. Our code and models are available at http://bit.ly/LTVRR.
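The idea behind the evaluation protocol, giving partial credit to predictions that are semantically close to the ground truth, can be illustrated with a minimal sketch. This is not the paper's metric: the function names `cosine` and `soft_accuracy`, the clipping of negative similarities to zero, and the two-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a word2vec model, or the similarity would be replaced by a WordNet-based one.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def soft_accuracy(predictions, ground_truth, embeddings):
    """Average credit per example: 1.0 for an exact match, otherwise the
    embedding similarity to the true label, clipped at 0 (an assumption)."""
    scores = []
    for pred, true in zip(predictions, ground_truth):
        if pred == true:
            scores.append(1.0)
        else:
            sim = cosine(embeddings[pred], embeddings[true])
            scores.append(max(0.0, sim))
    return sum(scores) / len(scores)

# Hypothetical toy embeddings standing in for word2vec vectors.
TOY_EMBEDDINGS = {
    "riding": [1.0, 0.0],
    "sitting on": [1.0, 1.0],
    "eating": [0.0, 1.0],
}

if __name__ == "__main__":
    preds = ["riding", "sitting on", "eating"]
    truth = ["riding", "riding", "riding"]
    print(soft_accuracy(preds, truth, TOY_EMBEDDINGS))
```

Under such a scheme, a classifier that predicts "sitting on" for a "riding" relationship is penalized less than one that predicts "eating", whereas exact-match accuracy treats both errors identically.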




Class-Difficulty Based Methods for Long-Tailed Visual Recognition

Long-tailed datasets are very frequently encountered in real-world use c...

Learning Classifier Synthesis for Generalized Few-Shot Learning

Visual recognition in real-world requires handling long-tailed and even ...

Distributional Robustness Loss for Long-tail Learning

Real-world data is often unbalanced and long-tailed, but deep models str...

Adaptive Class Suppression Loss for Long-Tail Object Detection

To address the problem of long-tail distribution for the large vocabular...

Action Classification via Concepts and Attributes

Classes in natural images tend to follow long tail distributions. This i...

Long-tail Recognition via Compositional Knowledge Transfer

In this work, we introduce a novel strategy for long-tail recognition th...

Large-Scale Visual Relationship Understanding

Large scale visual understanding is challenging, as it requires a model ...