Attributes as Semantic Units between Natural Language and Visual Recognition

04/12/2016 ∙ by Marcus Rohrbach, et al.

Impressive progress has been made in the fields of computer vision and natural language processing. However, it remains a challenge to find the best point of interaction for these very different modalities. In this chapter we discuss how attributes allow us to exchange information between the two modalities and in this way lead to an interaction on a semantic level. Specifically, we discuss how attributes allow using knowledge mined from language resources for recognizing novel visual categories, how we can generate sentence descriptions about images and videos, how we can ground natural language in visual content, and finally, how we can answer natural language questions about images.




1 Introduction

Computer vision has made impressive progress in recognizing a large number of object categories Szegedy et al. [2015], diverse activities Wang and Schmid [2013], and most recently also in describing images and videos with natural language sentences Vinyals et al. [2015], Venugopalan et al. [2015b] and answering natural language questions about images Malinowski and Fritz [2014]. Given sufficient training data these approaches can achieve impressive performance, sometimes even on par with humans He et al. [2015]. However, humans have two key abilities most computer vision systems lack. On the one hand, humans can easily generalize to novel categories with no or very little training data. On the other hand, humans can rely on other modalities, most notably language, to incorporate knowledge into the recognition process. To do so, humans seem to rely on compositionality and transferability: they can break up complex problems into components and reuse previously learned components in other (recognition) tasks. In this chapter we discuss how attributes can form such components, which allow us to transfer and share knowledge, incorporate external linguistic knowledge, and decompose the challenging problems of visual description and question answering into smaller semantic units, which are easier to recognize and associate with textual representations.

Figure 1: Examples for textual descriptions and visual content. (a) Semantic attributes allow recognition of novel classes. (b) Sentence description for an image; image and caption from MS COCO Chen et al. [2015].

Let us first illustrate this with two examples. Attribute descriptions given in the form of hierarchical information (a mammal), properties (striped, black, and white), and similarities (similar to a horse) allow humans to recognize a visual category, even if they have never observed this category before. Given such a description in the form of attributes, most humans would be able to recognize the animal shown in Fig. 1(a) as a zebra. Furthermore, once humans know that Fig. 1(a) is a zebra, they can describe what it is doing within a natural sentence, even if they never saw example images with captions of zebras before (Fig. 1(b)). A promising way to handle these challenges is to have compositional models which allow interaction between multi-modal information at a semantic level.

One prominent way to model such a semantic level are semantic attributes. As the term “attribute” has a large variety of definitions in the computer vision literature, we define it for the course of this chapter as follows.

Definition 1

An attribute is a semantic unit, which has a visual and a textual representation.

The first part of this definition, the restriction to a semantic unit, is important to discriminate attributes from other representations which do not have a human-interpretable meaning, such as image gradients, bags of (visual) words, or hidden representations in deep neural networks. We will refer to these as features. Of course, for a specific feature one can try to find or associate it with a semantic meaning or unit, but typically this is unknown, and once one is able to identify such an association, one has found a representation for this semantic attribute. The restriction to a semantic unit allows connecting to other sources of information on a semantic level, i.e. a level of meaning. In the second part of the definition we restrict attributes to semantic units which can be represented both textually and visually. (There are attributes or semantic units which are not visual but textual, e.g. smells, tastes, and tactile sensory inputs, and ones which are visual but not textual, i.e. naturally difficult to describe in language: think of the many visual patterns beyond striped and dotted for which we do not have a name, or the visual differences between two people or faces which humans can clearly recognize but which might be difficult to put into words. We also like to note that some datasets such as Animals with Attributes Lampert et al. [2014] include non-visual attributes, e.g. smelly, which might still improve classification performance as they are correlated with visual features.) This restriction is specific to this chapter, as we want to exploit the connection between language and visual recognition. From this definition it should also be clear that attributes are not distinct from objects, but rather that objects are also attributes, as they obviously are semantic and have a textual and visual representation.

In this chapter we discuss some of the most prominent directions where language understanding and visual recognition interact. Namely, how knowledge mined from language resources can help visual recognition, how we can ground language in visual content, how we can generate language about visual content, and finally how we can answer natural language questions about images, which can be seen as a combination of grounding the question, recognition, and generating an answer. It is clear that these directions cannot cover all potential interactions between visual recognition and language. Other directions include generating visual content from language descriptions [e.g. Zitnick et al., 2013, Liang et al., 2013] or localizing images in text, i.e. finding where in a text an image is discussed. In the following we first analyze challenges for combining the visual and linguistic modalities; afterwards we provide an overview of this chapter, which includes a discussion of how the different sections relate to each other and to the idea of attributes.

1.1 Challenges for combining visual and linguistic modalities

One of the fundamental differences between the visual and the linguistic modality is the level of abstraction. The basic data unit of the visual modality is a (photographic) image or video, which always shows a specific instance of a category, or even more precisely a certain instance from a specific viewpoint, lighting, pose, time, etc. For example, Fig. 1(a) shows one specific instance of the category zebra from a side view, eating grass. In contrast to this, the basic semantic units of the linguistic modality are words (which are strings of characters, or phonemes for spoken language, but we will restrict ourselves to written linguistic expressions in this chapter). Although a word might refer to a specific instance, the word, i.e. the string, always represents a category of objects, activities, or attributes, abstracting from a specific instance. Interestingly, this difference, instance- versus category-level representation, is also what defines one of the core challenges in visual recognition and is also an important topic in computational linguistics. In visual recognition we are interested in defining or learning models which abstract over a specific image or video to understand the visual characteristics of a category. In computational linguistics, when automatically parsing a text, we frequently face the inverse challenge of trying to identify intra- and extra-linguistic references of a word or phrase (co-reference resolution and grounding: co-reference is when two or more words refer to the same thing or person within a text, while grounding looks at how words refer to things outside the text, e.g. images). These problems arise because words typically represent concepts rather than instances, and because anaphors, synonyms, hypernyms, or metaphorical expressions are used to refer to the identical object in the real world.

Understanding that the visual and linguistic modalities have different levels of abstraction is important when trying to combine both modalities. In Section 2 we use linguistic knowledge at the category rather than the instance level for visual knowledge transfer, i.e. we use linguistic knowledge at the level where it is most expressive, that is, at the level of its basic representation. In Section 3, when describing visual input with natural language, we put the point of interaction at a semantic attribute level and leave the concrete realization of sentences to a language model rather than inferring it from the visual representation, i.e. we recognize the most important components or attributes of a sentence, which are activities, objects, tools, locations, or scenes, and then generate a sentence based on these. In Section 4 we look at a model which grounds phrases that refer to a specific instance by jointly learning visual and textual representations. In Section 5 we answer questions about images by learning small modules which recognize visual elements; these modules are selected according to the question and linked to its most important components, e.g. question words/phrases (How many), nouns (dog), and qualifiers (black). By this composition into modules or attributes, we create an architecture which allows learning these attributes, which link the visual and textual modality, jointly across all questions and images.

1.2 Overview and outline

In this chapter we explain how linguistic knowledge can help to recognize novel object categories and composite activities (Section 2), how attributes help to describe videos and images with natural language sentences (Section 3), how to ground phrases in images (Section 4), and how compositional computation allows for effective question answering about images (Section 5). We conclude with directions for future work in Section 6.

All these directions have in common that attributes form a layer or composition which is beneficial for connecting textual and visual representations. In Section 2, for recognizing novel object categories and composite activities, attributes form the layer where the transfer happens. Attributes are shared across known and novel categories, while information mined from different language resources provides the associations between the known categories and attributes at training time, to learn attribute classifiers, and between the attributes and novel categories at test time, to recognize the novel categories.

When describing images and videos (Section 3), we first learn an intermediate layer of attribute classifiers, which are then used to generate natural language descriptions. This intermediate layer allows us to reason across sentences at a semantic level and in this way to build a model which generates consistent multi-sentence descriptions. Furthermore, we discuss how such an attribute classifier layer allows us to describe novel categories for which no paired image-caption data is available.

When grounding sentences in images, we argue that it makes sense to do this at the level of phrases rather than full sentences, as phrases form semantic units, or attributes, which can be well localized in images. Thus, in Section 4 we discuss how we localize short phrases or referential expressions in images.

In Section 5 we discuss the task of visual question answering, which connects the previous sections, as one has to ground the question in the image and then predict or generate an answer. Here we show how we can decompose the question into attributes, which in this case are small neural network components composed in a computation graph to predict the answer. This allows us to share and train the attributes across questions and images, while building a neural network which is specific to a given question.

The order of the following sections loosely follows the historical development: we start with work which appeared at the time when attributes started to become popular in computer vision Lampert et al. [2009], Farhadi et al. [2010]. The last section covers visual question answering, a problem which requires more complex interactions between language and visual recognition and which has only recently become a topic in the computer vision community Malinowski and Fritz [2014], Antol et al. [2015].

2 Linguistic knowledge for recognition of novel categories

While supervised training is an integral part of building visual, textual, or multi-modal category models, more recently knowledge transfer between categories has been recognized as an important ingredient to scale to a large number of categories as well as to enable fine-grained categorization. This development reflects the psychological point of view that humans are able to generalize to novel categories with only a few training samples [Moses et al., 1996, Bart and Ullman, 2005]. (We use “novel” throughout this chapter to denote categories with no or few labeled training instances.) This has recently gained increased interest in the computer vision and machine learning literature, which looks at zero-shot recognition (with no training instances for a class) [Lampert et al., 2014, Farhadi et al., 2009, Palatucci et al., 2009, Parikh and Grauman, 2011, Fu et al., 2014, Mensink et al., 2012, Frome et al., 2013], and one- or few-shot recognition [Thrun, 1996, Bart and Ullman, 2005, Raina et al., 2007]. Knowledge transfer is particularly beneficial when scaling to large numbers of classes where training data is limited [Mensink et al., 2012, Frome et al., 2013, Rohrbach et al., 2011], distinguishing fine-grained categories [Farrell et al., 2011, Duan et al., 2012], or analyzing compositional activities in videos [Fu et al., 2014, Rohrbach et al., 2012b].


Figure 2: Zero-shot recognition with the Direct Attribute Prediction model Lampert et al. [2009], which allows recognizing unseen classes using an intermediate layer of attributes. Instead of manually defined associations between classes and attributes (cyan lines), Rohrbach et al. [2010] reduce supervision by mining object-attribute associations from language resources, such as Wikipedia, WordNet, and image or web search.

Recognizing categories with no or only few labeled training instances is challenging. In this section we first discuss how we can build attribute classifiers using only category-labeled image data and different language resources, which allows recognizing novel categories (Section 2.1). Then, to further improve this transfer learning approach, we discuss how to additionally integrate instance similarity and labeled instances of the novel classes, if available (Section 2.2). Furthermore, we discuss what changes have to be made to apply similar ideas to composite activity recognition (Section 2.3).

2.1 Semantic relatedness mined from language resources for zero-shot recognition

Lampert et al. [2009, 2014] propose to use attribute-based recognition to recognize unseen categories based on their object-attribute associations. Their Direct Attribute Prediction (DAP) model is visualized in Fig. 2. Given images labeled with known category labels, and object-attribute associations between these categories and attributes, we can learn attribute classifiers for an image. This allows recognizing novel categories if we have associations between the novel categories and the attributes.
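The core of DAP can be sketched in a few lines. The toy example below is an illustrative simplification, not the authors' implementation: the attribute names, probabilities, and associations are made up, and the attribute-prior normalization of the full DAP model is omitted. An unseen class is scored by how well the attribute classifier outputs for an image match the class's attribute signature.

```python
# Toy sketch of Direct Attribute Prediction (DAP). Attribute names,
# probabilities, and class-attribute associations are illustrative
# assumptions; the real model additionally normalizes by attribute priors.

def dap_score(attr_probs, class_attrs):
    """Score an unseen class: multiply the predicted probability of each
    associated attribute being present (or absent, if not associated)."""
    score = 1.0
    for attr, present in class_attrs.items():
        p = attr_probs[attr]
        score *= p if present else (1.0 - p)
    return score

def dap_predict(attr_probs, associations):
    """Pick the unseen class whose attribute signature best matches the
    attribute classifier outputs for one image."""
    return max(associations, key=lambda c: dap_score(attr_probs, associations[c]))

# Attribute classifier outputs for one image (hypothetical values).
attr_probs = {"striped": 0.9, "black": 0.8, "white": 0.85, "spotted": 0.1}

# Class-attribute associations, e.g. defined manually or mined from language resources.
associations = {
    "zebra":   {"striped": True,  "black": True,  "white": True,  "spotted": False},
    "leopard": {"striped": False, "black": False, "white": False, "spotted": True},
}

print(dap_predict(attr_probs, associations))  # zebra
```

Note how the image itself never needs a zebra label: only the attribute classifiers (trained on known classes) and the zebra's attribute signature are required.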

To scale the approach to a larger number of classes and attributes, Rohrbach et al. [2010, 2012c, 2011] show how these previously manually defined attribute associations can be replaced with associations mined automatically from different language resources. Table 1(a) compares several language resources and measures which estimate semantic relatedness to determine if a class should be associated with a specific attribute. Yahoo Snippets Chen et al. [2006], Rohrbach et al. [2012c], which computes co-occurrence statistics on summary snippets returned by search engines, shows the best performance of all single measures. Rohrbach et al. [2012c] also discuss several fusion strategies to obtain more robust measures by expanding the attribute inventory with clustering and by combining several measures, which can achieve performance on par with manually defined associations (second-to-last versus last line in Table 1a).
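Several of the hit-count-based measures in Table 1 use the Dice coefficient over search-engine hit counts. A minimal sketch, with made-up hit counts standing in for real query results:

```python
# Dice coefficient over hit counts, as used by several measures in
# Table 1: 2 * hits(x AND y) / (hits(x) + hits(y)).
# The hit counts below are invented for illustration.

def dice_coefficient(hits_x, hits_y, hits_xy):
    """Semantic relatedness of two terms from (web/image) hit counts."""
    if hits_x + hits_y == 0:
        return 0.0
    return 2.0 * hits_xy / (hits_x + hits_y)

# Hypothetical counts for the queries "zebra", "striped", "zebra striped":
print(dice_coefficient(1_000_000, 5_000_000, 400_000))  # ~0.133
```

Thresholding such a relatedness score (or taking the top-scoring attributes per class) then yields the binary class-attribute associations used for zero-shot recognition.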

Language Resource Measure AUC in %
WordNet [Fellbaum, 1998], path Lin measure Lin [1998] Rohrbach et al. [2010] 60.5
Yahoo Web, hit count Mihalcea and Moldovan [1999] Dice coef. Dice [1945], Sørensen [1948] Rohrbach et al. [2010] 60.4
Flickr Img, hit count Rohrbach et al. [2010] Dice coef. Dice [1945], Sørensen [1948] Rohrbach et al. [2010] 70.1
Yahoo Img, hit count Rohrbach et al. [2010] Dice coef. Dice [1945], Sørensen [1948] Rohrbach et al. [2010] 71.0
Wikipedia Rohrbach et al. [2010] ESA [Gabrilovich and Markovitch, 2007, Zesch and Gurevych, 2010] Rohrbach et al. [2010] 69.7
Yahoo Snippets Chen et al. [2006] Dice/Snippets Rohrbach et al. [2012c] Rohrbach et al. [2012c] 76.0
Yahoo Img Expanded attr. Rohrbach et al. [2012c] 77.2
Combination Classifier fusion Rohrbach et al. [2012c] 75.9
Combination Expanded attr. Rohrbach et al. [2012c] 79.5
manual Lampert et al. [2009] Rohrbach et al. [2012c] 79.2
Test images: novel classes only / novel + known classes (Δ)
Object-Attribute Associations
Yahoo Img 71.0 73.2 (+2.2)
Classifier fusion 79.5 78.9 (-0.6)
manual 79.2 79.4 (+0.2)
Direct Similarity
Yahoo Img 79.9 76.4 (-2.5)
Classifier fusion 75.9 72.3 (-3.6)
Effect of adding images from known classes to the test set as distractors/negatives.
(a) Attribute-based zero-shot recognition. (b) Attributes versus direct-similarity, reported in Rohrbach et al. [2012c].
Table 1: Zero-shot recognition on the AwA dataset Lampert et al. [2009]. Results for different language resources used to mine associations. Trained on 92 images per class; mean area under the ROC curve (AUC) in %.

As an alternative to attributes, Rohrbach et al. [2010] also propose to directly transfer information from the most similar known classes, which does not require an intermediate level of attributes. While this achieves higher performance when the test set contains only novel objects, in the more adversarial setting, when the test set also contains images from the known categories, the direct-similarity-based approach drops significantly in performance, as can be seen in Table 1(b).

Approach/Language resource Top-5 error in %
leaf WordNet nodes Rohrbach et al. [2010] 72.8
inner WordNet nodes Rohrbach et al. [2010] 66.7
all WordNet nodes Rohrbach et al. [2010] 65.2
+ metric learning Mensink et al. [2012]   64.3
Part Attributes
Wikipedia Rohrbach et al. [2010] 80.9
Yahoo Holonyms Rohrbach et al. [2010] 77.3
Yahoo Image Rohrbach et al. [2010] 81.4
Yahoo Snippets Rohrbach et al. [2010] 76.2
all attributes Rohrbach et al. [2010] 70.3
Direct Similarity
Wikipedia Rohrbach et al. [2010] 75.6
Yahoo Web Rohrbach et al. [2010] 69.3
Yahoo Image Rohrbach et al. [2010] 72.0
Yahoo Snippets Rohrbach et al. [2010] 75.5
all measures Rohrbach et al. [2010] 66.6
Label embedding
DeViSe Frome et al. [2013]   68.2
Table 2: Large scale zero-shot recognition results. Flat error in % and hierarchical error in brackets. Note that Mensink et al. [2012], Frome et al. [2013] report on a different set of unseen classes than Rohrbach et al. [2010].

Rohrbach et al. [2011] extend zero-shot recognition from the 10 unseen categories in the AwA dataset to a setting of 200 unseen ImageNet Deng et al. [2009] categories. One of the main challenges in this setting is that no pre-defined attributes are available for this dataset. Rohrbach et al. propose to mine part attributes from WordNet Fellbaum [1998], as ImageNet categories correspond to WordNet synsets. Additionally, as the known and unknown classes are leaf nodes of the ImageNet hierarchy, inner nodes can be used to group leaf nodes, similar to attributes. Also, the closest known leaf-node categories can transfer information to the corresponding unseen leaf categories.

An alternative approach is DeViSE Frome et al. [2013], which learns an embedding into a semantic skip-gram word space Mikolov et al. [2013] trained on Wikipedia documents. Classification is achieved by projecting an image into the word space and taking the closest word as the label. Consequently, this also allows for zero-shot recognition.
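The classification step of such an embedding approach reduces to a nearest-neighbor search in the word space. A minimal sketch, assuming a tiny made-up 3-d "word space" (real skip-gram vectors have hundreds of dimensions, and the image-to-embedding projection is learned):

```python
# Nearest-word classification in a shared embedding space, in the spirit
# of DeViSE. The vectors below are invented 3-d stand-ins; a real system
# uses learned skip-gram word vectors and a learned image projection.
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def nearest_label(image_embedding, word_vectors):
    """Return the word whose vector is closest (highest cosine
    similarity) to the projected image."""
    return max(word_vectors, key=lambda w: cosine(image_embedding, word_vectors[w]))

word_vectors = {
    "zebra": [0.9, 0.1, 0.2],
    "horse": [0.7, 0.3, 0.1],
    "car":   [0.0, 0.9, 0.8],
}
image_embedding = [0.85, 0.15, 0.25]  # hypothetical output of the projection
print(nearest_label(image_embedding, word_vectors))  # zebra
```

Because any word with a vector can serve as a label, unseen classes require no retraining, which is what enables zero-shot recognition here.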

Table 2 compares the different approaches. The hierarchical variant of Rohrbach et al. [2011] performs best, also compared to DeViSE Frome et al. [2013], which relies on more powerful CNN Krizhevsky et al. [2012] features. Further improvements can be achieved by metric learning Mensink et al. [2012]. As a different application, Mrowca et al. [2015] show how such hierarchical semantic knowledge allows improving large-scale object detection, not just classification. While the WordNet hierarchy is very reliable as it was manually created, the attributes are restricted to part attributes and the mining is not as reliable. To improve in this challenging setting, we discuss next how one can exploit instance similarity and a few labeled examples, if available.

Transferring knowledge from known categories to novel classes is challenging, as it is difficult to estimate visual properties of the novel classes. The approaches discussed in the previous section cannot exploit instance similarity or few labeled instances, if available. The approach Propagated Semantic Transfer (PST) Rohrbach et al. [2013a] combines four ideas to jointly handle the challenging scenario of recognizing novel categories. First, PST transfers information from known to novel categories by incorporating external knowledge, such as linguistic or expert-specified information, e.g. by a mid-level layer of semantic attributes as discussed in Section 2.1. Second, PST exploits the manifold structure of novel classes, similar to unsupervised learning approaches Weber et al. [2000], Sivic et al. [2005]. More specifically, it adapts the graph-based label propagation algorithm Zhu et al. [2003], Zhou et al. [2004] – previously used only for semi-supervised learning Ebert et al. [2010] – to zero-shot and few-shot learning. In this transductive setting, information is propagated between instances of the novel classes to obtain more reliable recognition, as visualized with the red graph in Fig. 3. Third, PST improves the local neighborhood in such graph structures by replacing the raw feature-based representation with a semantic object- or attribute-based representation. And fourth, PST generalizes from zero- to few-shot learning by integrating labeled training examples as certain nodes in its graph-based propagation. Another positive aspect of PST is that attribute or category models do not have to be retrained if novel classes are added, which can be an important aspect, e.g. in a robotic scenario.

2.2 Propagated semantic transfer


Figure 3: Recognition of novel categories. The approach Propagated Semantic Transfer Rohrbach et al. [2013a] combines knowledge transferred via attributes from known classes (left) with few labeled examples in a graph (red lines) which is built according to instance similarity.

Fig. 4 shows results on the AwA Lampert et al. [2009] dataset. We note that in contrast to the previous section the classifiers are trained on all training examples, not only 92 per class. Fig. 4(a) shows zero-shot results, where no training examples are available for the novel, or in this case unseen, classes. The table compares PST propagating on a graph based on attribute-classifier similarity versus image-descriptor similarity and shows a clear benefit of the former. This variant also outperforms DAP and IAP Lampert et al. [2014] as well as Zero-Shot Learning Fu et al. [2014]. Next we compare PST in the few-shot setting, i.e. we add a few labeled examples per class. In Fig. 4(b) we compare PST to two label propagation (LP) baselines Ebert et al. [2010]. We first note that PST (red curves) moves seamlessly from zero-shot to few-shot, while traditional LP (blue and black curves) needs at least one training example. We first examine the three solid lines. The black curve is the best LP variant from Ebert et al. [2010] and uses a similarity based on image features. LP in combination with a similarity metric based on the attribute classifier scores (blue curve) allows transferring knowledge residing in the classifiers trained on the known classes and gives a significant improvement in performance. PST (red curve) additionally transfers labels from the known classes and improves further. The dashed lines in Fig. 4(b) provide results for automatically mined associations between attributes and classes from language resources. It is interesting to note that these automatically mined associations achieve performance very close to the manually defined associations (dashed vs. solid).
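The graph-based propagation at the heart of PST can be sketched as a simple iterative update in the spirit of Zhou et al. [2004]: each node's score is repeatedly mixed with the weighted scores of its neighbors. The graph, weights, and seed scores below are made up for illustration; a real system builds the graph from attribute-classifier similarity.

```python
# Minimal label propagation sketch (one novel class, 3 instances).
# W: row-normalized similarity matrix; Y: initial scores per node, e.g.
# attribute-based zero-shot predictions or 1.0 for labeled few-shot
# examples. All values here are invented.

def propagate(W, Y, alpha=0.5, iters=50):
    """Iterate F <- alpha * W F + (1 - alpha) * Y until (near) convergence."""
    n = len(Y)
    F = Y[:]
    for _ in range(iters):
        F = [alpha * sum(W[i][j] * F[j] for j in range(n)) + (1 - alpha) * Y[i]
             for i in range(n)]
    return F

# Node 0 carries a confident transfer score; node 2 is connected to
# node 0 only through node 1, yet still receives support via the graph.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
Y = [1.0, 0.0, 0.0]
scores = propagate(W, Y)
print(scores)  # node 2 ends up with a small positive score
```

With alpha=0 the update degenerates to the initial transfer scores; larger alpha trusts the instance-similarity graph more.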

Approach AUC Acc.
DAP [Lampert et al., 2014] 81.4 41.4
IAP [Lampert et al., 2014] 80.0 42.2
Zero-Shot Learning [Fu et al., 2014] n/a 41.3
PST Rohrbach et al. [2013a]
    on image descriptors 81.2 40.5
    on attributes 83.7 42.7
(a) Zero-Shot, in %. (b) Few-Shot
Figure 4: Zero-shot results on the AwA dataset. Predictions with attributes and manually defined associations. Adapted from Rohrbach et al. [2013a].
(a) Zero-Shot recognition (b) Few-Shot recognition
Figure 5: Results on 200 unseen classes of ImageNet. Adapted from Rohrbach et al. [2013a].

Fig. 5 shows results on the classification task with 200 unseen ImageNet categories. In Fig. 5(a) we compare PST to zero-shot recognition without propagation, as discussed in Section 2.1. For zero-shot recognition, PST (red bars) improves performance over zero-shot without propagation (black bars) for all language resources and transfer variants. Similar to the AwA dataset, PST also improves over the LP baseline for few-shot recognition (Fig. 5b). The missing LP baseline on raw features is due to the fact that for the large number of images and high-dimensional features the graph construction is very time- and memory-consuming, if not infeasible. In contrast, the attribute representation is very compact and thus computationally tractable even with a large number of images.

2.3 Composite activity recognition with attributes and script data


Figure 6: Recognizing composite activities using attributes and script data.

Understanding activities in visual and textual data is generally regarded as more challenging than understanding object categories, due to the limited training data, challenges in defining the extent of an activity, and the similarities between activities Regneri et al. [2013]. However, long-term composite activities can be decomposed into shorter fine-grained activities Rohrbach et al. [2012b]. Consider for example the composite cooking activity prepare scrambled egg, which can be decomposed into attributes of fine-grained activities (e.g. open, fry), ingredients (e.g. egg), and tools (e.g. pan, spatula). These attributes can then be shared and transferred across composite activities, as visualized in Fig. 6, using the same approaches as for objects and attributes discussed in the previous section. However, the representations, both on the visual and on the language side, have to change. Fine-grained activities and associated attributes are visually characterized by fine-grained body motions and low inter-class variability. In addition to holistic features Wang and Schmid [2013], one should consequently exploit human pose-based Rohrbach et al. [2012a] and hand-centric Senina et al. [2014] features. As the previously discussed language resources do not provide good associations between composite activities and their attributes, Rohrbach et al. [2012b] collected textual descriptions (script data) of these activities with AMT. From this script data, associations can be computed based either on frequency statistics or, more discriminatively, on term frequency times inverse document frequency (tfidf).
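Mining composite-attribute associations from script data by tfidf can be sketched as follows: treat the collected descriptions of each composite activity as one document, then score each word by its frequency in that document, discounted by how many documents it appears in. The two-activity script data below is an invented stand-in for real AMT collections.

```python
# tfidf association mining from script data (toy sketch; the script
# data below is invented, not the collected AMT corpus).
import math
from collections import Counter

def tfidf_associations(script_data):
    """script_data: {composite activity: list of words from its
    descriptions}. Returns {composite: {word: tfidf score}}."""
    n_docs = len(script_data)
    df = Counter()                      # document frequency per word
    for words in script_data.values():
        df.update(set(words))
    scores = {}
    for comp, words in script_data.items():
        tf = Counter(words)
        scores[comp] = {w: tf[w] / len(words) * math.log(n_docs / df[w])
                        for w in tf}
    return scores

script_data = {
    "scrambled egg": ["crack", "egg", "fry", "pan", "stir"],
    "fruit salad":   ["wash", "fruit", "cut", "fruit", "stir"],
}
scores = tfidf_associations(script_data)
# "egg" is specific to scrambled egg and gets a positive score;
# "stir" occurs in every document, so its tfidf is 0.
```

This is why tfidf is more discriminative than raw frequency: words shared by all composites, which raw counts would rank highly, are suppressed.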

Table 3 shows results on the MPII Cooking 2 dataset Rohrbach et al. [2015d]. Comparing the first column (holistic Dense Trajectory features Wang and Schmid [2013]) with the second shows the benefit of adding the more semantic hand-centric Senina et al. [2014] and pose-based Rohrbach et al. [2012a] features. Comparing line (1) with line (2) or (3) shows the benefit of representing composite activities with attributes, as this allows sharing across composite activities. The best performance, 57.4% mean AP, is achieved in line (6) when combining compositional attributes with the Propagated Semantic Transfer (PST) approach (see Section 2.2) and script data to determine associations between composites and attributes.

Attribute training on: All Disjoint
Composites Composites
Activity representation (per column): holistic Wang and Schmid [2013]; combined Wang and Schmid [2013], Senina et al. [2014], Rohrbach et al. [2012a]
With training data for composites
Without attributes
   (1) SVM 39.8 41.1 - -
Attributes on gt intervals
   (2) SVM 43.6 52.3 32.3 34.9
Attributes on automatic segmentation
   (3) SVM 49.0 56.9 35.7 34.8
   (4) NN 42.1 43.3 24.7 32.7
   (5) NN+Script data 35.0 40.4 18.0 21.9
   (6) PST+Script data 54.5 57.4 32.2 32.5
No training data for composites
Attributes on automatic segmentation
   (7) Script data 36.7 29.9 19.6 21.9
   (8) PST + Script data 36.6 43.8 21.1 19.3
Table 3: Composite cooking activity classification on MPII Cooking 2 Rohrbach et al. [2015d], mean AP in %. Top left quarter: fully supervised; right column: reduced attribute training data; bottom section: no composite cooking activity training data; bottom right quarter: true zero-shot. Adapted from Rohrbach et al. [2015d].

3 Image and video description using compositional attributes

In this section we discuss how we can generate natural language sentences describing visual content, rather than just assigning labels to images and videos as discussed in the previous section. This intriguing task has recently received increased attention in the computer vision and computational linguistics communities Venugopalan et al. [2015b, c], Vinyals et al. [2015] and has a large number of potential applications, including human-robot interaction, image and video retrieval, and describing visual content for visually impaired people. In this section we focus on approaches which decouple visual recognition from sentence generation and introduce an intermediate semantic layer, which can be seen as a layer of attributes (Section 3.1). Introducing such a semantic layer has several advantages. First, it allows reasoning across sentences on a semantic level, which is, as we will see, beneficial for multi-sentence description of videos (Section 3.2). Second, we can show that learning reliable attributes leads to state-of-the-art sentence generation with high diversity in the challenging scenario of movie description (Section 3.3). Third, it leads to a compositional structure which allows describing novel concepts in images and videos (Section 3.4).

3.1 Translating image and video content to natural language descriptions

To address the problem of image and video description, Rohrbach et al. [2013b] propose a two-step translation approach which first predicts an intermediate semantic attribute layer and then learns how to translate from this semantic representation to natural sentences. Figure 7 gives an overview of this two-step approach for videos. First, a rich semantic representation of the visual content, including e.g. object and activity attributes, is predicted; a CRF models the relationships between the different attributes of the visual input. Second, the generation of natural language is formulated as a machine translation problem, using the semantic representation as the source language and the generated sentences as the target language. For this, a parallel corpus of videos, annotated semantic attributes, and textual descriptions makes it possible to adapt statistical machine translation (SMT) Koehn [2010] to translate between the two languages. Rohrbach et al. train and evaluate their approach on the videos of the MPII Cooking dataset Rohrbach et al. [2012a, b] and the aligned descriptions from the TACoS corpus Regneri et al. [2013]. According to automatic evaluation and human judgments, the two-step translation approach significantly outperforms retrieval and n-gram-based baseline approaches motivated by prior work. The approach can similarly be applied to the image description task; however, in both cases it requires an annotated semantic attribute representation. In Sections 3.3 and 3.4 we discuss how such attribute annotations can be extracted automatically from sentences. An alternative approach is presented by Fang et al. [2015], who mine visual concepts for image description by integrating multiple instance learning Maron and Lozano-Pérez [1998]. Similar to the work presented in the following, Wu et al. [2016] learn an intermediate attribute representation from the image descriptions; captions are then generated solely from this intermediate attribute representation.
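As a toy illustration of the two-step idea, the following sketch first maps classifier scores to a discrete semantic representation (SR) and then verbalizes it. The attribute vocabularies and the template are entirely hypothetical: the actual approach predicts the SR jointly with a CRF and generates sentences with a phrase-based SMT system trained on a parallel corpus, not a template.

```python
import numpy as np

# Toy attribute vocabularies for the semantic representation (SR); the real
# approach predicts these jointly with a CRF over video features.
ACTIVITIES = ["cut", "wash", "peel"]
OBJECTS = ["carrot", "cucumber", "knife"]

def predict_sr(activity_scores, object_scores):
    """Step 1: map per-attribute classifier scores to a discrete SR tuple."""
    return (ACTIVITIES[int(np.argmax(activity_scores))],
            OBJECTS[int(np.argmax(object_scores))])

def generate(sr):
    """Step 2: 'translate' the SR into a sentence.  A template stands in for
    the SMT system trained with the SR as source language."""
    activity, obj = sr
    return "The person %ss the %s." % (activity, obj)

sr = predict_sr(np.array([0.1, 0.2, 0.9]), np.array([0.8, 0.1, 0.1]))
print(generate(sr))  # -> "The person peels the carrot."
```

The point of the decoupling is that the SR can be supervised, inspected, and reasoned over independently of the surface realization.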

Figure 7: Video description. Overview of the two-step translation approach Rohrbach et al. [2013b] with an intermediate semantic layer of attributes (SR) for describing videos with natural language. From Rohrbach [2014].

3.2 Coherent multi-sentence video description with variable level of detail

Most approaches for automatic video description, including the one presented above, focus on generating single-sentence descriptions and are not able to vary the descriptions’ level of detail. One advantage of the two-step approach with an explicit intermediate layer of semantic attributes is that it allows reasoning on this semantic level. To generate coherent multi-sentence descriptions, Rohrbach et al. [2014] extend the two-step translation approach to model across-sentence consistency at the semantic level by enforcing a consistent topic, which is the prepared dish in the cooking scenario. To produce shorter or one-sentence summaries, Rohrbach et al. select the most relevant sentences on the semantic level using tf-idf (term frequency times inverse document frequency). For an example output on the TACoS Multi-Level corpus Rohrbach et al. [2014] see Figure 8. To perform multi-sentence description fully automatically, Rohrbach et al. propose a simple but effective method based on agglomerative clustering for automatic video segmentation. The most important component of good clustering is the similarity measure, and it turns out that the semantic attribute classifiers (see Fig. 7) are very well suited for this, in contrast to bag-of-words dense trajectories Wang et al. [2011]. This confirms the observation made in Section 2.2 that attribute classifiers seem to form a good space for distance computations.
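To illustrate why attribute classifier scores form a useful space for segmentation, here is a minimal single-linkage agglomerative clustering sketch over toy attribute score vectors. The data, the cosine distance, and the linkage choice are illustrative assumptions, not the exact setup of Rohrbach et al. [2014].

```python
import numpy as np

def agglomerative(segments, n_clusters):
    """Minimal single-linkage agglomerative clustering on attribute-classifier
    score vectors (one vector per video segment), using cosine distance."""
    def cos_dist(a, b):
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    clusters = [[i] for i in range(len(segments))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(cos_dist(segments[a], segments[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair of clusters
    return clusters

# Toy attribute scores (activity/object classifier responses): the first two
# segments share one activity, the last two another.
segs = np.array([[0.9, 0.8, 0.1], [0.8, 0.9, 0.2],
                 [0.1, 0.2, 0.9], [0.2, 0.1, 0.8]])
print(agglomerative(segs, 2))  # -> [[0, 1], [2, 3]]
```

Segments with similar attribute responses end up in the same temporal cluster, which is exactly what makes the attribute space a good similarity measure for segmentation.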

To improve performance, Donahue et al. [2015] show that the second step, the SMT-based sentence generation, can be replaced with a deep recurrent network to better model visual uncertainty, while still relying on the multi-sentence reasoning on the semantic level. On the TACoS Multi-Level corpus this achieves 28.8% BLEU@4, compared to 26.9% for SMT with multi-sentence reasoning Rohrbach et al. [2014] and 24.9% for SMT without it Rohrbach et al. [2013b].

Figure 8: Coherent multi-sentence descriptions at three levels of detail, using automatic temporal segmentation. See Section 3.2 for details. From Rohrbach et al. [2014].

3.3 Describing movies with an intermediate layer of attributes

Two challenges arise when extending the idea presented above to movie description Rohrbach et al. [2015c], which addresses the problem of describing movies for blind people. First, and maybe more importantly, there are no annotated semantic attributes as for the kitchen data, and second, the data is visually more diverse and challenging. For the first challenge, Rohrbach et al. [2015c] propose to extract attribute labels from the descriptions using a semantic parsing approach and to train visual classifiers on them, building a semantic intermediate layer. To additionally accommodate the second challenge of increased visual difficulty, Rohrbach et al. [2015b] show how to improve the robustness of these attributes or “Visual Labels” in three steps. First, by distinguishing three semantic groups of labels (verbs, objects, and scenes) and using a corresponding feature representation for each: activity recognition with dense trajectories Wang and Schmid [2013], object detection with LSDA Hoffman et al. [2014], and scene classification with Places-CNN Zhou et al. [2014]. Second, by training each semantic group separately, which removes noisy negatives. And third, by selecting only the most reliable classifiers. While Rohrbach et al. use SMT for sentence generation in Rohrbach et al. [2015c], they rely on a recurrent network (LSTM) in Rohrbach et al. [2015b].
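The third step, retaining only reliable classifiers, can be sketched as a simple filter over held-out accuracies that preserves the per-group structure. The labels, accuracies, and threshold below are invented for illustration; the actual selection criterion in Rohrbach et al. [2015b] may differ.

```python
# Hypothetical validation accuracies for attribute ("visual label") classifiers,
# grouped by semantic type as in the Visual Labels approach.
classifiers = {
    "verb":   {"run": 0.71, "nod": 0.38, "drive": 0.66},
    "object": {"car": 0.80, "vodka": 0.35, "door": 0.62},
    "scene":  {"street": 0.77, "kitchen": 0.58},
}

def select_reliable(classifiers, threshold=0.5):
    """Keep only classifiers whose held-out accuracy exceeds `threshold`,
    preserving the per-group structure so each group can keep its own feature
    type (trajectories for verbs, detector scores for objects, Places-CNN
    for scenes)."""
    return {group: {label: acc for label, acc in labels.items()
                    if acc >= threshold}
            for group, labels in classifiers.items()}

print(select_reliable(classifiers))
```

Unreliable labels (here "nod" and "vodka") are dropped before sentence generation, so the language model is conditioned only on attributes the visual side can actually predict.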

SMT Rohrbach et al. [2015c]: Someone is a man, someone is a man.
S2VT Venugopalan et al. [2015a]: Someone looks at him, someone turns to someone.
Visual labels Rohrbach et al. [2015b]: Someone is standing in the crowd, a little man with a little smile.
Reference: Someone, back in elf guise, is trying to calm the kids.

SMT Rohrbach et al. [2015c]: The car is a water of the water.
S2VT Venugopalan et al. [2015a]: On the door, opens the door opens.
Visual labels Rohrbach et al. [2015b]: The fellowship are in the courtyard.
Reference: They cross the quadrangle below and run along the cloister.

SMT Rohrbach et al. [2015c]: Someone is down the door, someone is a back of the door, and someone is a door.
S2VT Venugopalan et al. [2015a]: Someone shakes his head and looks at someone.
Visual labels Rohrbach et al. [2015b]: Someone takes a drink and pours it into the water.
Reference: Someone grabs a vodka bottle standing open on the counter and liberally pours some on the hand.
Figure 9: Qualitative results on the MPII Movie Description (MPII-MD) dataset Rohrbach et al. [2015c]. The “Visual labels” approach Rohrbach et al. [2015b] which uses an intermediate layer of robust attributes, identifies activities, objects, and places better than related work. From Rohrbach et al. [2015b].

The Visual Labels approach outperforms prior work Venugopalan et al. [2015a], Rohrbach et al. [2015c], Yao et al. [2015] on the MPII-MD Rohrbach et al. [2015c] and M-VAD Torabi et al. [2015] datasets with respect to both automatic and human evaluation. Qualitative results are shown in Fig. 9. An interesting characteristic of the compared methods is the size of the output vocabulary: 94 for Rohrbach et al. [2015c], 86 for Venugopalan et al. [2015a] (an end-to-end LSTM approach without an intermediate semantic representation), and 605 for Rohrbach et al. [2015b]. Although the latter is still far below the 6,422 words of the human reference sentences, it clearly shows the higher diversity of the output of Rohrbach et al. [2015b].

3.4 Describing novel object categories


Figure 10: Describing novel object categories which are not contained in caption corpora (like otter). The Deep Compositional Captioner (DCC) Hendricks et al. [2016] uses an intermediate semantic attribute or “lexical” layer to connect classifiers learned on unpaired image datasets (ImageNet) with text corpora (e.g. Wikipedia). This allows it to compose descriptions about novel objects without any paired image-sentences training data. Adapted from Hendricks et al. [2015].

In this section we discuss how to describe novel object categories, which combines the challenges discussed for recognizing novel categories (Section 2) and generating descriptions (Section 3.1). State-of-the-art deep image and video captioning approaches (e.g. Vinyals et al. [2015], Mao et al. [2015], Donahue et al. [2015], Fang et al. [2015], Venugopalan et al. [2015b]) are limited to describing objects which appear in caption corpora such as MS COCO Chen et al. [2015], which consist of pairs of images and sentences. In contrast, labeled image datasets without sentence descriptions (e.g. ImageNet Deng et al. [2010]) and text-only corpora (e.g. Wikipedia) cover many more object categories.

Hendricks et al. [2016] propose the Deep Compositional Captioner (DCC) to exploit these vision-only and language-only unpaired data sources to describe novel categories, as visualized in Fig. 10. Similar to the attribute layer discussed in Section 3.1, Hendricks et al. extract words as labels from the descriptions to learn a “Lexical Layer”. The Lexical Layer is expanded with objects from ImageNet Deng et al. [2010]. To not only recognize but also generate descriptions about the novel objects, DCC transfers the word prediction model from the semantically closest known word in the Lexical Layer, where similarity is computed with Word2Vec Mikolov et al. [2013]. It is interesting to note that image captioning approaches such as Vinyals et al. [2015], Donahue et al. [2015] do use ImageNet data to (pre-)train their models (indicated with a dashed arrow in Fig. 10), but they do not make use of the semantic information, only of the learned representation.
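The word-transfer idea can be sketched as follows: for a novel word, find the closest known word in the embedding space and reuse its word-prediction weights. The embeddings and weight vectors below are made up for illustration; DCC's actual transfer operates on the parameters of its learned language and image-to-word models.

```python
import numpy as np

# Toy word embeddings (stand-ins for Word2Vec vectors).
emb = {"dog": np.array([1.0, 0.1]),
       "cat": np.array([0.9, 0.2]),
       "otter": np.array([0.8, 0.35])}

# Hypothetical word-prediction weights for words seen in paired image-sentence data.
weights = {"dog": np.array([0.5, -0.2, 0.1]),
           "cat": np.array([0.4, 0.3, -0.1])}

def transfer(novel_word, known_words):
    """Copy the word-prediction weights from the semantically closest known
    word, with similarity measured by cosine in the embedding space."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    closest = max(known_words, key=lambda w: cos(emb[novel_word], emb[w]))
    return closest, weights[closest].copy()

src, w = transfer("otter", ["dog", "cat"])
print(src)  # -> "cat": the known word whose weights are reused for "otter"
```

After the copy, the caption model can emit the novel word in contexts where it would have emitted the semantically closest known one.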

Figure 11: Qualitative results for describing novel ImageNet object categories. DCC Hendricks et al. [2016] compared to an ablation without transfer. X → Y: known word X is transferred to novel word Y. From Hendricks et al. [2015].

Fig. 11 shows several categories for which no captions exist at training time. With respect to quantitative measures, compared to a baseline without transfer, DCC improves METEOR from 18.2% to 19.1% and the F1 score, which measures whether the novel object word appears in the generated description, from 0 to 34.3%. Hendricks et al. also show similar results for video description.

4 Grounding text in images

(a) Without bounding box annotations at training or test time, GroundeR Rohrbach et al. [2015a] learns to ground free-form natural language phrases in images. (b) GroundeR reconstructs phrases by learning to attend to the right box at training time. (c) GroundeR localizes boxes at test time.
Figure 12: Unsupervised grounding by learning to associate visual and textual semantic units. From Rohrbach et al. [2015a].

In this section we discuss the problem of grounding natural language in images. Grounding in this case means that, given an image and a natural language sentence or phrase, we aim to localize the subset of the image which corresponds to the input phrase. For example, for the sentence “A little brown and white dog emerges from a yellow collapsible toy tunnel onto the lawn.” and the corresponding image in Fig. 12(a), we want to segment the sentence into phrases and locate the corresponding bounding boxes (or segments) in the image. While grounding has been addressed e.g. in Kong et al. [2014], Johnson et al. [2015], Barnard et al. [2003], Socher and Fei-Fei [2010], these works are restricted to a few categories. An exception is the work of Karpathy et al. Karpathy and Fei-Fei [2015], Karpathy et al. [2014], who aim to discover a latent alignment between phrases in text and bounding box proposals in the image. Karpathy et al. [2014] ground dependency-tree relations to image regions using multiple instance learning (MIL) and a ranking objective. Karpathy and Fei-Fei [2015] simplify the MIL objective to just the maximally scoring box and replace the dependency tree with a learned recurrent network. Unfortunately, these approaches have not been evaluated with respect to their grounding performance due to a lack of annotated datasets. Only recently were two datasets released: Flickr30k Entities Plummer et al. [2015] augments Flickr30k Young et al. [2014] with bounding boxes for all noun phrases present in the textual descriptions, and ReferItGame Kazemzadeh et al. [2014] provides localized referential expressions in images. Even more recently, at the time of writing, efforts are being made to also collect grounded referential expressions for the MS COCO Lin et al. [2014] dataset: the authors of ReferItGame are extending their annotations, and longer referential expressions have been collected by Mao et al. [2016]. Similar efforts are also made in the Visual Genome project Krishna et al. [2016], which provides densely annotated images with phrases.

In the following we focus on how to approach this problem, and the first question is: where is the best point of interaction between linguistic elements and visual elements? Following the approaches in Karpathy and Fei-Fei [2015], Karpathy et al. [2014], Plummer et al. [2015], a good way to do this is to decompose both sentence and image into concise semantic units or attributes which we can match to each other. For the data as shown in Figures 12(a) and 13, sentences can be split into phrases of typically a few words, and images are decomposed into a larger number of bounding box proposals Uijlings et al. [2013]. An alternative is to integrate phrase grounding into a fully convolutional network, for bounding box prediction Johnson et al. [2016] or segmentation prediction Hu et al. [2016]. In the following, we discuss approaches which focus on how to find the association between visual and linguistic components, rather than the actual segmentation into components. We first look at a setting that is unsupervised with respect to the grounding task, i.e. we assume that no bounding box annotations are available for training (Section 4.1), and then we show how to integrate supervision (Section 4.2). Section 4.3 discusses the results.

4.1 Unsupervised grounding

A man (red) walking by a sitting man (blue) on the street (green).
A white dog (red) is following a black dog (blue) along the beach (green).
Three people (red) on a walk down a cement path (blue) beside a field of wildflowers (green) with skyscrapers (magenta) in the background.
Figure 13: Qualitative results for GroundeR unsupervised Rohrbach et al. [2015a] on Flickr 30k Entities Plummer et al. [2015]. Compact textual semantic units (phrases, e.g. “a sitting man”) are associated with visual semantic units (bounding boxes). Best viewed in color.

Although many data sources contain images which are described with sentences or phrases, they typically do not provide the spatial localization of the phrases. This is true both for curated datasets such as MS COCO Lin et al. [2014] and for large user-generated content as e.g. in the YFCC 100M dataset Thomee et al. [2016]. Consequently, being able to learn from this data without grounding supervision would give access to a large amount and variety of training data. This setting is visualized in Fig. 12(a).

For this setting Rohrbach et al. [2015a] propose the approach GroundeR, which learns grounding by aiming to reconstruct a given phrase using an attention mechanism, as shown in Fig. 12(b). In more detail, given images paired with natural language phrases (or sentence descriptions), but without any bounding box information, we want to localize these phrases with a bounding box in the image (Fig. 12c). To do this, GroundeR learns to attend to a bounding box proposal and, based on the selected bounding box, reconstructs the phrase (Fig. 12b). Attention means that the model predicts a weighting over the bounding boxes and then takes the weighted average of the features from all boxes. A softmax over the weights encourages that only one or a few boxes receive high weights. As the second part of the model (Fig. 12b, bottom) can predict the correct phrase only if the first part of the model attended correctly (Fig. 12b, top), this can be learned without additional bounding box supervision. At test time we evaluate the grounding performance, i.e. whether the model assigned the highest weight to, and thus attended to, the correct bounding box. The model is able to learn these associations because its parameters are shared across all phrases and images. Thus, for a proper reconstruction, the visual semantic units and linguistic phrases have to match, i.e. the model learns what certain phrases mean in the image.
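A minimal numpy sketch of the attention step: score every box proposal against the encoded phrase, softmax the scores into weights, and form the weighted visual feature on which the reconstruction network would be conditioned. All features here are random stand-ins; the real model uses a learned phrase encoder, CNN box features, and a trained scoring network.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(phrase_vec, box_feats, W):
    """Score each box proposal against the encoded phrase via a bilinear map,
    softmax the scores into attention weights, and return the
    attention-weighted visual feature used for phrase reconstruction."""
    scores = box_feats @ W @ phrase_vec   # one score per proposal
    alpha = softmax(scores)               # attention distribution over proposals
    attended = alpha @ box_feats          # weighted average of box features
    return alpha, attended

rng = np.random.default_rng(0)
phrase = rng.normal(size=4)               # encoded phrase, e.g. "a sitting man"
boxes = rng.normal(size=(10, 5))          # features of 10 box proposals
W = rng.normal(size=(5, 4))               # learned (here: random) bilinear map
alpha, attended = attend(phrase, boxes, W)
print(int(alpha.argmax()))                # box in which the phrase is grounded
```

At test time, grounding accuracy simply asks whether `alpha.argmax()` picks the ground-truth box.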

Compared approaches: SCRC Hu et al. [2015]; GroundeR semi-supervised (with 3.12% annot.) Rohrbach et al. [2015a]; GroundeR supervised Rohrbach et al. [2015a].
First image, referential expressions: anywhere but the people (red); first person in line (blue); group people center (green); very top left of whole image (magenta).
Second image, referential expressions: the street (red); tree to the far left (blue); top middle sky (green); white car far right bottom corner (magenta).
Figure 14: Qualitative grounding results on the ReferItGame dataset Kazemzadeh et al. [2014]. Different colors show different referential expressions for the same image. Best viewed in color.

4.2 Semi-supervised and fully supervised grounding

(a) Flickr 30k Entities dataset Plummer et al. [2015]

Approach                                        Accuracy
Unsupervised training
  GroundeR (VGG-CLS) Rohrbach et al. [2015a]    24.66
  GroundeR (VGG-DET) Rohrbach et al. [2015a]    32.42
Semi-supervised training, GroundeR (VGG-CLS) Rohrbach et al. [2015a]
  3.12% annotation                              33.02
  6.25% annotation                              37.10
  12.5% annotation                              38.67
Supervised training
  CCA embedding Plummer et al. [2015]           25.30
  SCRC (VGG+SPAT) Hu et al. [2015]              27.80
  GroundeR (VGG-CLS) Rohrbach et al. [2015a]    41.56
  GroundeR (VGG-DET) Rohrbach et al. [2015a]    47.70

(b) ReferItGame dataset Kazemzadeh et al. [2014]

Approach                                        Accuracy
Unsupervised training
  LRCN Donahue et al. [2015] (reported in Hu et al. [2015])          8.59
  CAFFE-7K Guadarrama et al. [2014] (reported in Hu et al. [2015])  10.38
  GroundeR (VGG+SPAT) Rohrbach et al. [2015a]   10.44
Semi-supervised training, GroundeR (VGG+SPAT) Rohrbach et al. [2015a]
  3.12% annotation                              15.03
  6.25% annotation                              19.53
  12.5% annotation                              21.65
Supervised training
  SCRC (VGG+SPAT) Hu et al. [2015]              17.93
  GroundeR (VGG+SPAT) Rohrbach et al. [2015a]   26.93
Table 4: Phrase grounding, accuracy in %. VGG-CLS: Pre-training the VGG network Simonyan and Zisserman [2015] for the visual representation on ImageNet classification data only. VGG-DET: VGG further fine-tuned for the object detection task on the PASCAL dataset Everingham et al. [2010] using Fast R-CNN Girshick [2015]. VGG+SPAT: VGG-CLS + spatial bounding box features (box location and size).

If grounding supervision (phrase-bounding box associations) is available, GroundeR Rohrbach et al. [2015a] can integrate it by adding a loss over the attention mechanism (Fig. 12b, “Attend”). Interestingly, this makes it possible to provide supervision for only a subset of the phrases (semi-supervised) or for all phrases (fully supervised).
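Conceptually, the combined objective can be sketched as below: the phrase-reconstruction loss is always present, and a cross-entropy term over the attention weights is added only for phrases that come with a ground-truth box. The loss values and weighting are placeholders, not the paper's exact formulation.

```python
import numpy as np

def grounding_loss(alpha, recon_nll, gt_box=None, weight=1.0):
    """GroundeR-style objective sketch.  `alpha` is the attention distribution
    over box proposals, `recon_nll` the phrase-reconstruction negative
    log-likelihood.  When a ground-truth box index is given (semi-/fully
    supervised phrases), a cross-entropy term over the attention is added."""
    loss = recon_nll
    if gt_box is not None:
        loss += weight * -np.log(alpha[gt_box] + 1e-12)
    return loss

alpha = np.array([0.7, 0.2, 0.1])   # attention over 3 proposals
print(grounding_loss(alpha, recon_nll=1.5))            # unsupervised phrase
print(grounding_loss(alpha, recon_nll=1.5, gt_box=0))  # supervised phrase
```

Because the supervised term is added per phrase, the same model trains seamlessly on any mix of annotated and unannotated phrases.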

For supervised grounding, Plummer et al. [2015] propose to learn a CCA embedding Gong et al. [2014] between phrases and the visual representation. The Spatial Context Recurrent ConvNet (SCRC) Hu et al. [2015] and the approach of Mao et al. [2016] use a caption generation framework to score phrases on a set of bounding box proposals. This allows ranking bounding box proposals for a given phrase or referential expression. Hu et al. [2015] show the benefit of transferring models trained on full-image description datasets, as well as of spatial (bounding box location and size) and full-image context features. Mao et al. [2016] show how to discriminatively train the caption generation framework to better distinguish different referential expressions.

4.3 Grounding results

In the following we discuss results on the Flickr 30k Entities dataset Plummer et al. [2015] and the ReferItGame dataset Kazemzadeh et al. [2014], which both provide ground truth alignments between noun phrases (within sentences) and bounding boxes. For the unsupervised models, the grounding annotations are only used at test time for evaluation, not for training. All approaches use the activations of the second-to-last layer of the VGG network Simonyan and Zisserman [2015] to encode the image inside the bounding boxes.

Table 4(a) compares the approaches quantitatively. The unsupervised variant of GroundeR nearly reaches the supervised performance of CCA Plummer et al. [2015] or SCRC Hu et al. [2015] on Flickr 30k Entities; successful examples are shown in Fig. 13. For the referential expressions of the ReferItGame dataset, the unsupervised variant of GroundeR reaches performance on par with prior work (Table 4b) and quickly gains performance when a few labeled training annotations are added (semi-supervised training). In the fully supervised setting, GroundeR improves significantly over the state of the art on both datasets, which is also reflected in the qualitative results shown in Fig. 14.

5 Visual question answering

Figure 15: To approach visual question answering, Andreas et al. [2016a] propose to dynamically create a deep network which is composed of different “modules” (colored boxes). These “modules” represent semantic units, i.e. attributes, which link linguistic units in the question with computational units to do the corresponding visual recognition. Adapted from Andreas et al. [2015].

Visual question answering is the problem of answering natural language questions about images, e.g. for the question “Where is the amber cat?” about the image shown in Fig. 15 we want to predict the corresponding answer on the floor, or just floor. This is a very interesting problem in several respects. On the one hand, it has many applications, such as visual search, human-robot interaction, and assisting blind people. On the other hand, it is also an interesting research direction as it requires relating textual and visual semantics. More specifically, it requires grounding the question in the image, e.g. by localizing the relevant part of the image (the amber cat in Fig. 15), and then recognizing and predicting an answer based on the question and the image content. Consequently, this problem requires more complex semantic interaction between language and visual recognition than the tasks in previous sections; specifically, it requires ideas from grounding (Section 4) and recognition (Section 2) or description (Section 3).

Approach       Y/N   Num   Other  All (test-dev)  All (test)
LSTM           78.7  36.6  28.1   49.8
ATT+LSTM       80.6  36.4  42.0   57.2
NMN            70.7  36.8  39.2   54.8
NMN+LSTM       81.2  35.2  43.3   58.0
NMN+LSTM+FT    81.2  38.0  44.0   58.6            58.7

LSTM: a question-only baseline
ATT: single find+describe for all questions
NMN: ablation w/o LSTM
NMN+LSTM: full model shown in Fig. 15
+FT: image features fine-tuned on captions Donahue et al. [2015]
Q: how many different lights in various different shapes and sizes? A: four (four)
Q: what color is the vase? A: green (green)
Q: is the bus full of passengers? A: no (no)
(a) Results from the evaluation server of Antol et al. [2015] in %. (b) Answers from Andreas et al. [2015] (ground truth answers in parentheses).
Figure 16: Results on the VQA dataset Antol et al. [2015]. Adapted from Andreas et al. [2015].

Most recent approaches to visual question answering learn a joint hidden embedding of the question and the image to predict the answer Malinowski et al. [2015], Ren et al. [2015], Gao et al. [2015], Antol et al. [2015], where all computation is shared and identical for all questions. An exception is proposed by Wu et al. [2016], who learn an intermediate attribute representation from image descriptions, similar to the work discussed in Sections 3.3 and 3.4. Interestingly, this intermediate layer of attributes makes it possible to query an external knowledge base for additional (textual) information not visible in the image. The embedded knowledge base information is combined with the attribute representation and the hidden representation of a caption-generation recurrent network (LSTM), and forms the input to an LSTM-based question-answer encoder-decoder Malinowski et al. [2015].

Andreas et al. [2016a] go one step further with respect to compositionality and propose to predict a compositional neural network structure from the question. As visualized in Fig. 15, the question “Where is the amber cat?” is decomposed into the network “modules” amber, cat, and, and where. These modules are semantic units, i.e. attributes, which connect the most relevant semantic components of the question (i.e. words or short phrases) with the corresponding computation to recognize them in the image. These Neural Module Networks (NMN) have different types of modules for different types of attributes; different types have different colors in Fig. 15. The find[amber] and find[cat] (green) modules take in CNN activations (VGG Simonyan and Zisserman [2015], last convolutional layer) and produce a spatial attention heatmap, while combine[and] (orange) combines two heatmaps into a single one, and describe[where] (blue) takes in a heatmap and CNN features to predict an answer. Note the distinction between different types, e.g. find versus describe, which perform different kinds of computation, and different instances, e.g. find[amber] versus find[cat], which learn different parameters. All parameters are initialized randomly and trained only from question-answer pairs. Interestingly, in this work attributes are not only distinguished with respect to their type, but are also composed with other attributes in a deep network whose parameters are learned end-to-end from examples, here question-answer pairs. In a follow-up work, Andreas et al. [2016b] learn not only the modules, but also which network structure is best from a set of parser proposals, using reinforcement learning.
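A toy sketch of the module composition for “Where is the amber cat?”: find modules yield spatial heatmaps, combine[and] intersects them, and describe[where] reads an answer off the attended location. The heatmaps and the answer lookup are invented for illustration; the actual modules are small learned networks over CNN activations.

```python
import numpy as np

# Toy 4x4 spatial grid; NMN modules operate on heatmaps of this kind.
def find(concept_map):
    """find[...] (green): produce an attention heatmap for one concept.
    In the real model this is computed from CNN activations."""
    return concept_map

def combine_and(h1, h2):
    """combine[and] (orange): intersect two heatmaps elementwise."""
    return h1 * h2

def describe_where(heatmap):
    """describe[where] (blue): predict an answer from the attended location;
    here just a toy lookup on the attended grid row."""
    row = np.unravel_index(heatmap.argmax(), heatmap.shape)[0]
    return "on the floor" if row >= 2 else "on the table"

amber = np.zeros((4, 4)); amber[3, 1] = 1.0; amber[0, 2] = 0.4
cat = np.zeros((4, 4)); cat[3, 1] = 0.9
# Network structure predicted from the question "Where is the amber cat?":
answer = describe_where(combine_and(find(amber), find(cat)))
print(answer)  # -> "on the floor"
```

The key point is that the same modules can be re-wired into different graphs for different questions, while their parameters are shared across all questions in which they appear.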

In addition to NMN, Andreas et al. [2016a, b] also incorporate a recurrent network (LSTM) to model common sense knowledge and dataset bias, which has been shown to be important for visual question answering Malinowski et al. [2015]. Quantitative results in Fig. 16(a) indicate that NMNs are indeed a powerful tool for question answering; a few qualitative results can be seen in Fig. 16(b).

6 Conclusions

In this chapter we presented several tasks and approaches where attributes enable a connection of visual recognition with natural language on a semantic level. For recognizing novel object categories or activities, attributes can build an intermediate representation which allows incorporating knowledge mined from language resources or script data (Section 2). For this scenario we saw that semantic attribute classifiers additionally build a good metric distance space, useful for constructing instance graphs and learning composite activity recognition models. In Section 3 we explained how an intermediate level of attributes can be used to describe videos with multiple sentences at a variable level of detail, and how it allows describing novel object categories. In Section 4 we presented approaches for unsupervised and supervised grounding of phrases in images. Different phrases are semantically overlapping, and the examined approaches try to relate these semantic units by jointly learning representations for the visual and language modalities. Section 5 discussed an approach to visual question answering which composes the most important attributes of a question in a compositional computation graph whose parameters are learned end-to-end, only by back-propagating from the answers.

While the discussed approaches take a step towards the challenges discussed in Section 1.1, many future steps lie ahead. While the approaches in Section 2 use many advanced semantic relatedness measures mined from diverse language resources, they are not jointly trained on textual and visual modalities. Regneri et al. [2013] and Silberer et al. [2013] take a step in this direction by looking at joint semantic representations of the textual and visual modalities. Section 3 presents compositional models for describing videos, but this is only a first step towards automatically describing a movie to a blind person as humans can do Rohrbach et al. [2015c], which will require an even higher degree of semantic understanding and transfer within and between modalities. Section 4 describes interesting ideas for grounding in images, and it will be interesting to see how these scale to the size of the Internet. Visual question answering (Section 5) is an interesting emerging direction with many challenges, as it requires solving all of the above, at least to some extent.

I would like to thank all my co-authors, especially those whose publications are presented in this chapter. Namely, Sikandar Amin, Jacob Andreas, Mykhaylo Andriluka, Trevor Darrell, Sandra Ebert, Jiashi Feng, Annemarie Friedrich, Iryna Gurevych, Lisa Anne Hendricks, Ronghang Hu, Dan Klein, Raymond Mooney, Manfred Pinkal, Wei Qiu, Michaela Regneri, Anna Rohrbach, Kate Saenko, Michael Stark, Bernt Schiele, György Szarvas, Stefan Thater, Ivan Titov, Subhashini Venugopalan, and Huazhe Xu. Marcus Rohrbach was supported by a fellowship within the FITweltweit-Program of the German Academic Exchange Service (DAAD).


  • Andreas et al. [2015] J. Andreas, M. Rohrbach, T. Darrell, and D. Klein. Deep compositional question answering with neural module networks. arXiv preprint arXiv:1511.02799, 2015.
  • Andreas et al. [2016a] J. Andreas, M. Rohrbach, T. Darrell, and D. Klein. Neural module networks. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016a.
  • Andreas et al. [2016b] J. Andreas, M. Rohrbach, T. Darrell, and D. Klein. Learning to compose neural networks for question answering. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2016b.
  • Antol et al. [2015] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. Vqa: Visual question answering. In International Conference on Computer Vision (ICCV), 2015.
  • Barnard et al. [2003] K. Barnard, P. Duygulu, D. Forsyth, N. De Freitas, D. M. Blei, and M. I. Jordan. Matching words and pictures. Journal of Machine Learning Research (JMLR), 3:1107–1135, 2003.
  • Bart and Ullman [2005] E. Bart and S. Ullman. Single-example learning of novel classes using representation by similarity. In Proceedings of the British Machine Vision Conference (BMVC), 2005.
  • Chen et al. [2006] H.-H. Chen, M.-S. Lin, and Y.-C. Wei. Novel association measures using web search with double checking. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2006.
  • Chen et al. [2015] X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
  • Deng et al. [2009] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • Deng et al. [2010] J. Deng, A. Berg, K. Li, and L. Fei-Fei. What does classifying more than 10,000 image categories tell us? In European Conference on Computer Vision (ECCV), 2010.
  • Dice [1945] L. R. Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945.
  • Donahue et al. [2015] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • Duan et al. [2012] K. Duan, D. Parikh, D. Crandall, and K. Grauman. Discovering Localized Attributes for Fine-grained Recognition. In Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
  • Ebert et al. [2010] S. Ebert, D. Larlus, and B. Schiele. Extracting Structures in Image Collections for Object Recognition. In European Conference on Computer Vision (ECCV), 2010.
  • Everingham et al. [2010] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision (IJCV), 88(2):303–338, 2010.
  • Fang et al. [2015] H. Fang, S. Gupta, F. N. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • Farhadi et al. [2009] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing objects by their attributes. In Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • Farhadi et al. [2010] A. Farhadi, I. Endres, and D. Hoiem. Attribute-centric recognition for cross-category generalization. In Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
  • Farrell et al. [2011] R. Farrell, O. Oza, V. Morariu, T. Darrell, and L. Davis. Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance. In International Conference on Computer Vision (ICCV), 2011.
  • Fellbaum [1998] C. Fellbaum. WordNet: An Electronic Lexical Database. The MIT Press, 1998.
  • Frome et al. [2013] A. Frome, G. Corrado, J. Shlens, S. Bengio, J. Dean, M. Ranzato, and T. Mikolov. Devise: A deep visual-semantic embedding model. In Conference on Neural Information Processing Systems (NIPS), 2013.
  • Fu et al. [2014] Y. Fu, T. M. Hospedales, T. Xiang, and S. Gong. Learning multimodal latent attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 36(2):303–316, 2014.
  • Gabrilovich and Markovitch [2007] E. Gabrilovich and S. Markovitch. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2007.
  • Gao et al. [2015] H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are you talking to a machine? dataset and methods for multilingual image question answering. In Conference on Neural Information Processing Systems (NIPS), 2015.
  • Girshick [2015] R. Girshick. Fast R-CNN. In International Conference on Computer Vision (ICCV), 2015.
  • Gong et al. [2014] Y. Gong, L. Wang, M. Hodosh, J. Hockenmaier, and S. Lazebnik. Improving image-sentence embeddings using large weakly annotated photo collections. In European Conference on Computer Vision (ECCV), 2014.
  • Guadarrama et al. [2014] S. Guadarrama, E. Rodner, K. Saenko, N. Zhang, R. Farrell, J. Donahue, and T. Darrell. Open-vocabulary object retrieval. In Robotics: Science and Systems, 2014.
  • He et al. [2015] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In International Conference on Computer Vision (ICCV), 2015.
  • Hendricks et al. [2015] L. A. Hendricks, S. Venugopalan, M. Rohrbach, R. Mooney, K. Saenko, and T. Darrell. Deep compositional captioning: Describing novel object categories without paired training data. arXiv preprint arXiv:1511.05284v1, 2015.
  • Hendricks et al. [2016] L. A. Hendricks, S. Venugopalan, M. Rohrbach, R. Mooney, K. Saenko, and T. Darrell. Deep compositional captioning: Describing novel object categories without paired training data. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • Hoffman et al. [2014] J. Hoffman, S. Guadarrama, E. Tzeng, J. Donahue, R. Girshick, T. Darrell, and K. Saenko. LSDA: Large scale detection through adaptation. In Conference on Neural Information Processing Systems (NIPS), 2014.
  • Hu et al. [2015] R. Hu, H. Xu, M. Rohrbach, J. Feng, K. Saenko, and T. Darrell. Natural language object retrieval. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • Hu et al. [2016] R. Hu, M. Rohrbach, and T. Darrell. Segmentation from natural language expressions. arXiv preprint arXiv:1603.06180, 2016.
  • Johnson et al. [2015] J. Johnson, R. Krishna, M. Stark, L.-J. Li, D. Shamma, M. Bernstein, and L. Fei-Fei. Image retrieval using scene graphs. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • Johnson et al. [2016] J. Johnson, A. Karpathy, and L. Fei-Fei. Densecap: Fully convolutional localization networks for dense captioning. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • Karpathy and Fei-Fei [2015] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • Karpathy et al. [2014] A. Karpathy, A. Joulin, and L. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. In Conference on Neural Information Processing Systems (NIPS), 2014.
  • Kazemzadeh et al. [2014] S. Kazemzadeh, V. Ordonez, M. Matten, and T. L. Berg. Referitgame: Referring to objects in photographs of natural scenes. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
  • Koehn [2010] P. Koehn. Statistical Machine Translation. Cambridge University Press, 2010.
  • Kong et al. [2014] C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler. What are you talking about? text-to-image coreference. In Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
  • Krishna et al. [2016] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, M. Bernstein, and L. Fei-Fei. Visual genome: Connecting language and vision using crowdsourced dense image annotations. arXiv preprint arXiv:1602.07332, 2016.
  • Krizhevsky et al. [2012] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Conference on Neural Information Processing Systems (NIPS), 2012.
  • Lampert et al. [2009] C. Lampert, H. Nickisch, and S. Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • Lampert et al. [2014] C. H. Lampert, H. Nickisch, and S. Harmeling. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 36(3):453–465, 2014.
  • Liang et al. [2013] C. Liang, C. Xu, J. Cheng, W. Min, and H. Lu. Script-to-movie: A computational framework for story movie composition. IEEE Transactions on Multimedia, 15(2):401–414, 2013.
  • Lin [1998] D. Lin. An information-theoretic definition of similarity. In International Conference on Machine Learning (ICML), 1998.
  • Lin et al. [2014] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV), 2014.
  • Malinowski and Fritz [2014] M. Malinowski and M. Fritz. A multi-world approach to question answering about real-world scenes based on uncertain input. In Conference on Neural Information Processing Systems (NIPS), 2014.
  • Malinowski et al. [2015] M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In International Conference on Computer Vision (ICCV), 2015.
  • Mao et al. [2015] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In International Conference on Learning Representations (ICLR), 2015.
  • Mao et al. [2016] J. Mao, J. Huang, A. Toshev, O. Camburu, A. Yuille, and K. Murphy. Generation and comprehension of unambiguous object descriptions. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • Maron and Lozano-Pérez [1998] O. Maron and T. Lozano-Pérez. A framework for multiple-instance learning. Conference on Neural Information Processing Systems (NIPS), 1998.
  • Mensink et al. [2012] T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost. In European Conference on Computer Vision (ECCV), 2012.
  • Mihalcea and Moldovan [1999] R. Mihalcea and D. I. Moldovan. A method for word sense disambiguation of unrestricted text. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 1999.
  • Mikolov et al. [2013] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Conference on Neural Information Processing Systems (NIPS), 2013.
  • Moses et al. [1996] Y. Moses, S. Ullman, and S. Edelman. Generalization to novel images in upright and inverted faces. Perception, 25:443–461, 1996.
  • Mrowca et al. [2015] D. Mrowca, M. Rohrbach, J. Hoffman, R. Hu, K. Saenko, and T. Darrell. Spatial semantic regularisation for large scale object detection. In International Conference on Computer Vision (ICCV), 2015.
  • Palatucci et al. [2009] M. Palatucci, D. Pomerleau, G. Hinton, and T. Mitchell. Zero-shot learning with semantic output codes. In Conference on Neural Information Processing Systems (NIPS), 2009.
  • Parikh and Grauman [2011] D. Parikh and K. Grauman. Relative attributes. In International Conference on Computer Vision (ICCV), 2011.
  • Plummer et al. [2015] B. Plummer, L. Wang, C. Cervantes, J. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In International Conference on Computer Vision (ICCV), 2015.
  • Raina et al. [2007] R. Raina, A. Battle, H. Lee, B. Packer, and A. Ng. Self-taught learning: Transfer learning from unlabeled data. In International Conference on Machine Learning (ICML), 2007.
  • Regneri et al. [2013] M. Regneri, M. Rohrbach, D. Wetzel, S. Thater, B. Schiele, and M. Pinkal. Grounding Action Descriptions in Videos. Transactions of the Association for Computational Linguistics (TACL), 2013.
  • Ren et al. [2015] M. Ren, R. Kiros, and R. Zemel. Image question answering: A visual semantic embedding model and a new dataset. In Conference on Neural Information Processing Systems (NIPS), 2015.
  • Rohrbach et al. [2014] A. Rohrbach, M. Rohrbach, W. Qiu, A. Friedrich, M. Pinkal, and B. Schiele. Coherent multi-sentence video description with variable level of detail. In Proceedings of the German Conference on Pattern Recognition (GCPR), 2014.
  • Rohrbach et al. [2015a] A. Rohrbach, M. Rohrbach, R. Hu, T. Darrell, and B. Schiele. Grounding of textual phrases in images by reconstruction. arXiv preprint arXiv:1511.03745, 2015a.
  • Rohrbach et al. [2015b] A. Rohrbach, M. Rohrbach, and B. Schiele. The long-short story of movie description. In Proceedings of the German Conference on Pattern Recognition (GCPR), 2015b.
  • Rohrbach et al. [2015c] A. Rohrbach, M. Rohrbach, N. Tandon, and B. Schiele. A dataset for movie description. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015c.
  • Rohrbach [2014] M. Rohrbach. Combining visual recognition and computational linguistics: linguistic knowledge for visual recognition and natural language descriptions of visual content. PhD thesis, Saarland University, 2014.
  • Rohrbach et al. [2010] M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych, and B. Schiele. What helps Where - and Why? Semantic Relatedness for Knowledge Transfer. In Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
  • Rohrbach et al. [2011] M. Rohrbach, M. Stark, and B. Schiele. Evaluating Knowledge Transfer and Zero-Shot Learning in a Large-Scale Setting. In Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
  • Rohrbach et al. [2012a] M. Rohrbach, S. Amin, M. Andriluka, and B. Schiele. A database for fine grained activity detection of cooking activities. In Conference on Computer Vision and Pattern Recognition (CVPR), 2012a.
  • Rohrbach et al. [2012b] M. Rohrbach, M. Regneri, M. Andriluka, S. Amin, M. Pinkal, and B. Schiele. Script data for attribute-based recognition of composite activities. In European Conference on Computer Vision (ECCV), 2012b.
  • Rohrbach et al. [2012c] M. Rohrbach, M. Stark, G. Szarvas, and B. Schiele. Combining language sources and robust semantic relatedness for attribute-based knowledge transfer. In Proceedings of the European Conference on Computer Vision Workshops (ECCV Workshops), volume 6553 of LNCS, 2012c.
  • Rohrbach et al. [2013a] M. Rohrbach, S. Ebert, and B. Schiele. Transfer Learning in a Transductive Setting. In Conference on Neural Information Processing Systems (NIPS), 2013a.
  • Rohrbach et al. [2013b] M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In International Conference on Computer Vision (ICCV), 2013b.
  • Rohrbach et al. [2015d] M. Rohrbach, A. Rohrbach, M. Regneri, S. Amin, M. Andriluka, M. Pinkal, and B. Schiele. Recognizing fine-grained and composite activities using hand-centric features and script data. International Journal of Computer Vision (IJCV), 2015d.
  • Senina et al. [2014] A. Senina, M. Rohrbach, W. Qiu, A. Friedrich, S. Amin, M. Andriluka, M. Pinkal, and B. Schiele. Coherent multi-sentence video description with variable level of detail. arXiv preprint arXiv:1403.6173, 2014.
  • Silberer et al. [2013] C. Silberer, V. Ferrari, and M. Lapata. Models of semantic representation with visual attributes. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2013.
  • Simonyan and Zisserman [2015] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR), 2015.
  • Sivic et al. [2005] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman. Discovering Object Categories in Image Collections. In International Conference on Computer Vision (ICCV), 2005.
  • Socher and Fei-Fei [2010] R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
  • Sørensen [1948] T. Sørensen. A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons. Biol. Skr., 5:1–34, 1948.
  • Szegedy et al. [2015] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • Thomee et al. [2016] B. Thomee, B. Elizalde, D. A. Shamma, K. Ni, G. Friedland, D. Poland, D. Borth, and L.-J. Li. YFCC100M: The new data in multimedia research. Communications of the ACM, 59(2):64–73, 2016.
  • Thrun [1996] S. Thrun. Is learning the n-th thing any easier than learning the first? In Conference on Neural Information Processing Systems (NIPS), 1996.
  • Torabi et al. [2015] A. Torabi, C. Pal, H. Larochelle, and A. Courville. Using descriptive video services to create a large data source for video annotation research. arXiv preprint arXiv:1503.01070v1, 2015.
  • Uijlings et al. [2013] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders. Selective search for object recognition. International Journal of Computer Vision (IJCV), 104(2):154–171, 2013.
  • Venugopalan et al. [2015a] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence – video to text. arXiv preprint arXiv:1505.00487v2, 2015a.
  • Venugopalan et al. [2015b] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence – video to text. In International Conference on Computer Vision (ICCV), 2015b.
  • Venugopalan et al. [2015c] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2015c.
  • Vinyals et al. [2015] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • Wang and Schmid [2013] H. Wang and C. Schmid. Action recognition with improved trajectories. In International Conference on Computer Vision (ICCV), 2013.
  • Wang et al. [2011] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action Recognition by Dense Trajectories. In Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
  • Weber et al. [2000] M. Weber, M. Welling, and P. Perona. Towards automatic discovery of object categories. In Conference on Computer Vision and Pattern Recognition (CVPR), 2000.
  • Wu et al. [2016] Q. Wu, C. Shen, A. v. d. Hengel, P. Wang, and A. Dick. Image captioning and visual question answering based on attributes and their related external knowledge. arXiv preprint arXiv:1603.02814, 2016.
  • Yao et al. [2015] L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. arXiv preprint arXiv:1502.08029v4, 2015.
  • Young et al. [2014] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics (TACL), 2:67–78, 2014.
  • Zesch and Gurevych [2010] T. Zesch and I. Gurevych. Wisdom of crowds versus wisdom of linguists - measuring the semantic relatedness of words. Natural Language Engineering, 16(1):25–59, 2010.
  • Zhou et al. [2014] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning Deep Features for Scene Recognition using Places Database. In Conference on Neural Information Processing Systems (NIPS), 2014.
  • Zhou et al. [2004] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with Local and Global Consistency. In Conference on Neural Information Processing Systems (NIPS), 2004.
  • Zhu et al. [2003] X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In International Conference on Machine Learning (ICML), 2003.
  • Zitnick et al. [2013] C. L. Zitnick, D. Parikh, and L. Vanderwende. Learning the visual interpretation of sentences. In International Conference on Computer Vision (ICCV), 2013.