FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations

by Lingjie Mei, et al.

We present a meta-learning framework for learning new visual concepts quickly, from just one or a few examples, guided by multiple naturally occurring data streams: simultaneously looking at images, reading sentences that describe the objects in the scene, and interpreting supplemental sentences that relate the novel concept to other concepts. The learned concepts support downstream applications, such as answering questions by reasoning about unseen images. Our model, FALCON, represents individual visual concepts, such as colors and shapes, as axis-aligned boxes in a high-dimensional space (the "box embedding space"). Given an input image and its paired sentence, our model first resolves the referential expression in the sentence and associates the novel concept with particular objects in the scene. Next, our model interprets supplemental sentences that relate the novel concept to known concepts, such as "X has property Y" or "X is a kind of Y". Finally, it infers an optimal box embedding for the novel concept that jointly 1) maximizes the likelihood of the observed instances in the image, and 2) satisfies the relationships between the novel concept and the known ones. We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
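The abstract's actual parameterization and losses are not given here, but the core geometric idea, representing each concept as an axis-aligned box so that relations like "X is a kind of Y" become (soft) box containment, can be sketched in a few lines. All class and function names below are our own illustrative choices, not the paper's API:

```python
import numpy as np

class BoxEmbedding:
    """An axis-aligned box in R^d, stored as lower and upper corner vectors."""
    def __init__(self, lower, upper):
        self.lower = np.asarray(lower, dtype=float)
        self.upper = np.asarray(upper, dtype=float)

    def volume(self):
        # Product of side lengths; clipped at zero so empty boxes have volume 0.
        return float(np.prod(np.clip(self.upper - self.lower, 0.0, None)))

    def intersect(self, other):
        # The intersection of two axis-aligned boxes is itself an axis-aligned box.
        return BoxEmbedding(np.maximum(self.lower, other.lower),
                            np.minimum(self.upper, other.upper))

def containment_score(outer, inner):
    """Fraction of `inner`'s volume that lies inside `outer`:
    a soft score for the relation "inner is a kind of outer"."""
    vol_inner = inner.volume()
    if vol_inner == 0.0:
        return 0.0
    return outer.intersect(inner).volume() / vol_inner

# Toy 2-D example: the box for "red" sits entirely inside the box for "colored",
# so "red is a kind of colored" scores 1.0, while the reverse scores much lower.
colored = BoxEmbedding([0.0, 0.0], [1.0, 1.0])
red = BoxEmbedding([0.1, 0.1], [0.4, 0.4])
print(containment_score(colored, red))   # 1.0
print(containment_score(red, colored))   # ~0.09
```

Under this view, inferring the embedding for a novel concept amounts to optimizing its box corners so that observed instances fall inside it while the stated containment relations with known concepts hold.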




Related papers

- The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
- Visual Concept-Metaconcept Learning
- Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images
- From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts
- 3D Concept Grounding on Neural Fields
- Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
- Description Logic EL++ Embeddings with Intersectional Closure
