Vision-to-Language Tasks Based on Attributes and Attention Mechanism

05/29/2019
by   Xuelong Li, et al.

Vision-to-language tasks aim to integrate computer vision and natural language processing, and have attracted the attention of many researchers. Typical approaches encode an image into a feature representation and decode it into a natural language sentence, but they neglect high-level semantic concepts and the subtle relationships between image regions and natural language elements. To make full use of this information, this paper exploits text-guided attention and semantic-guided attention (SA) to find the most correlated spatial information and to reduce the semantic gap between vision and language. Our method comprises two attention networks: a text-guided attention network that selects text-related regions, and an SA network that highlights concept-related regions and region-related concepts. Finally, all of this information is incorporated to generate captions or answers. Image captioning and visual question answering experiments demonstrate the excellent performance of the proposed approach.
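The guided-attention idea in the abstract can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the paper's architecture: scores are plain dot products between region features and a guide vector (standing in for the text or concept representation), whereas the paper's networks learn these projections. The two calls mimic the two-level design, with the first context vector guiding the second pass before fusion.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def guided_attention(regions, guide):
    """Score each image region against a guide vector (text or
    semantic concept); return attention weights and the attended
    context vector. Hypothetical: real models learn projections."""
    scores = regions @ guide            # (num_regions,)
    weights = softmax(scores)           # attention distribution
    context = weights @ regions         # weighted sum of region features
    return weights, context

# Toy example: 4 image regions with 5-dim features, one guide vector.
rng = np.random.default_rng(0)
regions = rng.standard_normal((4, 5))
text_guide = rng.standard_normal(5)

# Two-level attention: text-guided first, then a second pass guided
# by the attended context (a stand-in for the semantic-guided step);
# the fused vector would feed the caption/answer decoder.
w_text, ctx_text = guided_attention(regions, text_guide)
w_sem, ctx_sem = guided_attention(regions, ctx_text)
fused = np.concatenate([ctx_text, ctx_sem])
```

The attention weights form a proper distribution over regions, so each context vector is a convex combination of region features.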


