Compensating Supervision Incompleteness with Prior Knowledge in Semantic Image Interpretation

10/01/2019
by   Ivan Donadello, et al.
0

Semantic Image Interpretation is the task of extracting a structured semantic description from images. This requires the detection of visual relationships: triples (subject,relation,object) describing a semantic relation between a subject and an object. A pure supervised approach to visual relationship detection requires a complete and balanced training set for all the possible combinations of (subject, relation, object). However, such training sets are not available and would require a prohibitive human effort. This implies the ability of predicting triples which do not appear in the training set. This problem is called zero-shot learning. State-of-the-art approaches to zero-shot learning exploit similarities among relationships in the training set or external linguistic knowledge. In this paper, we perform zero-shot learning by using Logic Tensor Networks, a novel Statistical Relational Learning framework that exploits both the similarities with other seen relationships and background knowledge, expressed with logical constraints between subjects, relations and objects. The experiments on the Visual Relationship Dataset show that the use of logical constraints outperforms the current methods. This implies that background knowledge can be used to alleviate the incompleteness of training sets.

READ FULL TEXT
research
11/20/2021

Representing Prior Knowledge Using Randomly, Weighted Feature Networks for Visual Relationship Detection

The single-hidden-layer Randomly Weighted Feature Network (RWFN) introdu...
research
07/06/2022

PROTOtypical Logic Tensor Networks (PROTO-LTN) for Zero Shot Learning

Semantic image interpretation can vastly benefit from approaches that co...
research
07/28/2017

Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation

Understanding visual relationships involves identifying the subject, the...
research
12/15/2016

Tinkering Under the Hood: Interactive Zero-Shot Learning with Net Surgery

We consider the task of visual net surgery, in which a CNN can be reconf...
research
03/09/2023

Knowledge-augmented Few-shot Visual Relation Detection

Visual Relation Detection (VRD) aims to detect relationships between obj...
research
11/29/2017

Structured learning and detailed interpretation of minimal object images

We model the process of human full interpretation of object images, name...
research
10/30/2014

Towards Learning Object Affordance Priors from Technical Texts

Everyday activities performed by artificial assistants can potentially b...

Please sign up or login with your details

Forgot password? Click here to reset