Visual Superordinate Abstraction for Robust Concept Learning

05/28/2022
by Qi Zheng, et al.

Concept learning constructs visual representations that are connected to linguistic semantics, which is fundamental to vision-language tasks. Although promising progress has been made, existing concept learners remain vulnerable to attribute perturbations and out-of-distribution compositions during inference. We ascribe this bottleneck to a failure to explore the intrinsic semantic hierarchy of visual concepts, e.g., {red, blue, ...} ∈ 'color' subspace, whereas cube ∈ 'shape'. In this paper, we propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces (i.e., visual superordinates). With only natural visual question answering data, our model first acquires the semantic hierarchy from a linguistic view, and then explores mutually exclusive visual superordinates under the guidance of the linguistic hierarchy. In addition, a quasi-center visual concept clustering scheme and a superordinate shortcut learning scheme are proposed to enhance the discrimination and independence of concepts within each visual superordinate. Experiments demonstrate the superiority of the proposed framework under diverse settings: it yields relative gains in overall answering accuracy of 7.5% on reasoning with perturbations and 15.6% on compositional generalization tests.
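
To make the idea of mutually exclusive visual superordinates concrete, here is a minimal toy sketch in Python. It is not the authors' model: the 4-dimensional embedding, the fixed index slices in `SUBSPACE`, the hand-picked `CENTERS`, and the helper names `superordinate_of` and `classify` are all assumptions made for illustration, and 'sphere' is an extra concept not named in the abstract. The sketch shows only that when each superordinate ('color', 'shape') owns a disjoint part of the representation, perturbing one attribute leaves the prediction for the other unchanged.

```python
import math

# Hypothetical hierarchy: superordinate -> member concepts.
HIERARCHY = {
    "color": ["red", "blue"],
    "shape": ["cube", "sphere"],
}

# Assumption: each superordinate owns a disjoint slice of a 4-d embedding.
SUBSPACE = {"color": slice(0, 2), "shape": slice(2, 4)}

# Assumption: hand-picked concept centers inside each 2-d subspace.
CENTERS = {
    "red": [1.0, 0.0],
    "blue": [0.0, 1.0],
    "cube": [1.0, 0.0],
    "sphere": [0.0, 1.0],
}


def superordinate_of(concept):
    """Look up which superordinate a concept belongs to."""
    for name, members in HIERARCHY.items():
        if concept in members:
            return name
    raise KeyError(concept)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def classify(embedding, superordinate):
    """Pick the concept in `superordinate` whose center best matches
    the slice of the embedding owned by that superordinate."""
    part = embedding[SUBSPACE[superordinate]]
    return max(HIERARCHY[superordinate], key=lambda c: cosine(part, CENTERS[c]))


obj = [0.9, 0.1, 0.2, 0.8]                 # a toy "red sphere"
perturbed = list(obj)
perturbed[SUBSPACE["color"]] = [0.1, 0.9]  # perturb only the color slice

print(classify(obj, "color"), classify(obj, "shape"))
print(classify(perturbed, "color"), classify(perturbed, "shape"))
```

In this toy setup the subspaces are fixed by hand; the framework described above instead learns them from visual question answering data under the guidance of the linguistic hierarchy.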
