Spatial Knowledge Distillation to aid Visual Reasoning

12/10/2018
by Somak Aditya, et al.

For tasks involving language and vision, current state-of-the-art methods tend not to leverage additional information that might be available for gathering relevant (commonsense) knowledge. A representative task is Visual Question Answering, where large diagnostic datasets have been proposed to test a system's capability of answering questions about images. The training data is often accompanied by annotations of individual object properties and spatial locations. In this work, we take a step towards integrating this additional privileged information, in the form of spatial knowledge, to aid visual reasoning. We propose a framework that combines recent advances in knowledge distillation (the teacher-student framework), relational reasoning, and probabilistic logical languages to incorporate such knowledge into existing neural networks for the task of Visual Question Answering. Specifically, for a question posed against an image, we use a probabilistic logical language to encode the spatial knowledge and the spatial understanding of the question in the form of a mask that is provided directly to the teacher network. The student network learns from the ground-truth information as well as from the teacher's prediction via distillation. We also demonstrate the impact of predicting such a mask inside the teacher's network using attention. Empirically, we show that both methods improve test accuracy over a state-of-the-art approach on a publicly available dataset.
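The student's training signal described above, combining the ground-truth labels with the teacher's softened predictions, can be sketched with the standard distillation objective (Hinton et al. style). This is a minimal illustration, not the paper's exact loss; the function names, the temperature `T`, and the mixing weight `alpha` are assumptions for the sketch.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; T > 1 softens the distribution.
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_i p_i * log(q_i)
    return -np.sum(np.asarray(p) * np.log(np.asarray(q) + eps))

def distillation_loss(student_logits, teacher_logits, true_label,
                      alpha=0.5, T=2.0):
    """Weighted sum of the hard-label loss (against ground truth) and
    the imitation loss (against the teacher's softened prediction)."""
    num_classes = len(student_logits)
    one_hot = np.eye(num_classes)[true_label]
    student_probs = softmax(student_logits)        # hard-label term
    student_soft = softmax(student_logits, T=T)    # imitation term
    teacher_soft = softmax(teacher_logits, T=T)
    hard_loss = cross_entropy(one_hot, student_probs)
    soft_loss = cross_entropy(teacher_soft, student_soft)
    return (1 - alpha) * hard_loss + alpha * soft_loss
```

With `alpha=0` the student trains purely on ground truth; with `alpha=1` it purely imitates the teacher, which in the paper's setting has access to the spatial-knowledge mask that the student does not see.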
