Attributes as Semantic Units between Natural Language and Visual Recognition

by Marcus Rohrbach, et al.

Impressive progress has been made in the fields of computer vision and natural language processing. However, it remains a challenge to find the best point of interaction for these very different modalities. In this chapter, we discuss how attributes allow us to exchange information between the two modalities and in this way lead to an interaction on a semantic level. Specifically, we discuss how attributes allow using knowledge mined from language resources for recognizing novel visual categories, how we can generate sentence descriptions of images and video, how we can ground natural language in visual content, and finally, how we can answer natural language questions about images.


