Few-shot Visual Relationship Co-localization

08/26/2021
by Revant Teotia, et al.

In this paper, given a small bag of images, each containing a common but latent predicate, we are interested in localizing the visual subject-object pair connected via that common predicate in each image. We refer to this novel problem as visual relationship co-localization, or VRC for short. VRC is a challenging task, even more so than the well-studied object co-localization task. It becomes harder still in the few-shot setting: with only a few images, the model must learn to co-localize visual subject-object pairs connected via unseen predicates. To solve VRC, we propose an optimization framework that selects a common visual relationship in each image of the bag. The framework seeks the optimal selection by learning visual relationship similarity across the images in a few-shot setting. To obtain robust visual relationship representations, we use a simple yet effective technique that learns the relationship embedding as a translation vector from the visual subject to the visual object in a shared space. To learn visual relationship similarity, we employ a proven meta-learning technique commonly used for few-shot classification. Finally, to tackle the combinatorial complexity arising from the exponential number of feasible solutions, we use a greedy approximate inference algorithm that selects a near-optimal solution. We extensively evaluate our framework on bags of varying sizes drawn from two challenging public datasets, VrR-VG and VG-150, and achieve impressive co-localization performance.
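The two core ideas in the abstract — embedding a relationship as a translation vector from subject to object in a shared space, and greedily picking one candidate per image to sidestep the exponential search — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the projection matrices here are random stand-ins for learned ones (the paper learns similarity via meta-learning), the seeding of the greedy pass with the first candidate of the first image is a simplification, and all names (`W_subj`, `greedy_colocalize`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: raw visual features and the shared embedding space.
FEAT_DIM, EMB_DIM = 16, 8

# Stand-in projections into the shared space (learned in the actual method).
W_subj = rng.normal(size=(EMB_DIM, FEAT_DIM))
W_obj = rng.normal(size=(EMB_DIM, FEAT_DIM))

def relationship_embedding(subj_feat, obj_feat):
    """Represent a (subject, object) pair as the translation vector
    from the projected subject to the projected object."""
    return W_obj @ obj_feat - W_subj @ subj_feat

def greedy_colocalize(bags):
    """bags[i] holds candidate relationship embeddings (one per
    subject-object pair) for image i. Greedily pick one candidate per
    image that lies closest to the mean of the selections so far,
    approximating the common relationship across the bag."""
    chosen = [bags[0][0]]  # simplification: seed with the first candidate
    for cands in bags[1:]:
        mean = np.mean(chosen, axis=0)
        best = min(cands, key=lambda r: np.linalg.norm(r - mean))
        chosen.append(best)
    return chosen

# Toy bag of 3 images, each with 4 candidate subject-object pairs.
bags = [[relationship_embedding(rng.normal(size=FEAT_DIM),
                                rng.normal(size=FEAT_DIM))
         for _ in range(4)] for _ in range(3)]
picks = greedy_colocalize(bags)
print(len(picks))  # one selected relationship per image
```

The greedy pass evaluates only a linear number of candidates instead of the exponential number of joint assignments, which is the complexity reduction the abstract refers to.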


research
04/17/2020

CPARR: Category-based Proposal Analysis for Referring Relationships

The task of referring relationships is to localize subject and object en...
research
04/29/2019

Learning to Find Common Objects Across Image Collections

We address the problem of finding a set of images containing a common, b...
research
05/28/2019

Union Visual Translation Embedding for Visual Relationship Detection and Scene Graph Generation

Relations amongst entities play a central role in image understanding. D...
research
08/05/2019

Learning to Generalize to Unseen Tasks with Bilevel Optimization

Recent metric-based meta-learning approaches, which learn a metric space...
research
02/27/2017

Visual Translation Embedding Network for Visual Relation Detection

Visual relations, such as "person ride bike" and "bike next to car", off...
research
07/24/2022

VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments

We introduce a few-shot localization dataset originating from photograph...
research
12/12/2019

Learning Effective Visual Relationship Detector on 1 GPU

We present our winning solution to the Open Images 2019 Visual Relations...
