Deep Neural Networks for Visual Reasoning

09/24/2022
by Thao Minh Le, et al.

Visual perception and language understanding are fundamental components of human intelligence, enabling humans to understand and reason about objects and their interactions. It is crucial for machines to have this capacity to reason using these two modalities in order to invent new robot-human collaborative systems. Recent advances in deep learning have built separate sophisticated representations of both visual scenes and language. However, understanding the associations between the two modalities in a shared context for multimodal reasoning remains a challenge. Focusing on language and vision modalities, this thesis advances the understanding of how to exploit and use pivotal aspects of vision-and-language tasks with neural networks to support reasoning. We derive these understandings from a series of works, making a two-fold contribution: (i) effective mechanisms for content selection and construction of temporal relations from dynamic visual scenes in response to a linguistic query, preparing adequate knowledge for the reasoning process; and (ii) new frameworks to perform reasoning with neural networks by exploiting visual-linguistic associations, deduced either directly from data or guided by external priors.

research
08/31/2023

Expanding Frozen Vision-Language Models without Retraining: Towards Improved Robot Perception

Vision-language models (VLMs) have shown powerful capabilities in visual...
research
12/26/2019

Vision and Language: from Visual Perception to Content Creation

Vision and language are two fundamental capabilities of human intelligen...
research
03/21/2023

Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding

Most humans use visual imagination to understand and reason about langua...
research
03/07/2019

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

Dramatic progress has been witnessed in basic vision tasks involving low...
research
04/27/2020

PuzzLing Machines: A Challenge on Learning From Small Data

Deep neural models have repeatedly proved excellent at memorizing surfac...
research
04/29/2021

Comparing Visual Reasoning in Humans and AI

Recent advances in natural language processing and computer vision have ...
research
10/12/2021

Can machines learn to see without visual databases?

This paper sustains the position that the time has come for thinking of ...
