Weakly Supervised Visual Semantic Parsing

01/08/2020
by   Alireza Zareian, et al.
0

Scene Graph Generation (SGG) aims to extract entities, predicates and their intrinsic structure from images, leading to a deep understanding of visual content, with many potential applications such as visual reasoning and image retrieval. Nevertheless, computer vision is still far from a practical solution for this task. Existing SGG methods require millions of manually annotated bounding boxes for scene graph entities in a large set of images. Moreover, they are computationally inefficient, as they exhaustively process all pairs of object proposals to predict their relationships. In this paper, we address those two limitations by first proposing a generalized formulation of SGG, namely Visual Semantic Parsing, which disentangles entity and predicate prediction, and enables sub-quadratic performance. Then we propose the Visual Semantic Parsing Network, VSPNet, based on a novel three-stage message propagation network, as well as a role-driven attention mechanism to route messages efficiently without a quadratic cost. Finally, we propose the first graph-based weakly supervised learning framework based on a novel graph alignment algorithm, which enables training without bounding box annotations. Through extensive experiments on the Visual Genome dataset, we show VSPNet outperforms weakly supervised baselines significantly and approaches fully supervised performance, while being five times faster.

READ FULL TEXT
research
09/05/2023

Anatomy-Driven Pathology Detection on Chest X-rays

Pathology detection and delineation enables the automatic interpretation...
research
12/01/2019

Learning to Relate from Captions and Bounding Boxes

In this work, we propose a novel approach that predicts the relationship...
research
09/27/2017

Scene Parsing by Weakly Supervised Learning with Image Descriptions

This paper investigates a fundamental problem of scene understanding: ho...
research
03/20/2023

Location-Free Scene Graph Generation

Scene Graph Generation (SGG) is a challenging visual understanding task....
research
03/14/2018

Approximate Query Matching for Image Retrieval

Traditional image recognition involves identifying the key object in a p...
research
04/08/2016

Deep Structured Scene Parsing by Learning with Image Descriptions

This paper addresses a fundamental problem of scene understanding: How t...
research
07/13/2021

Enforcing Consistency in Weakly Supervised Semantic Parsing

The predominant challenge in weakly supervised semantic parsing is that ...

Please sign up or login with your details

Forgot password? Click here to reset