Image Captioning as Neural Machine Translation Task in SOCKEYE

10/09/2018
by Loris Bazzani, et al.

Image captioning is an interdisciplinary research problem at the intersection of computer vision and natural language processing. The task is to generate a textual description of the content of an image. The typical model for image captioning is an encoder-decoder deep network, in which the encoder captures the essence of an image and the decoder generates a sentence describing it. Attention mechanisms can be used to automatically focus the decoder on the parts of the image that are relevant to predicting the next word. In this paper, we explore different decoders and attentional models drawn from neural machine translation, namely attentional recurrent neural networks, self-attentional transformers, and fully convolutional networks, which represent the current state of the art in that field. The image captioning module is available in SOCKEYE at https://github.com/awslabs/sockeye/tree/master/sockeye.
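To make the role of attention concrete, the following is a minimal sketch of scaled dot-product attention over image region features, written in plain Python. It is not the SOCKEYE implementation: the function names `softmax` and `attend`, the toy feature values, and the choice of dot-product scoring are all assumptions made for illustration. In a real captioning model the region features would come from a convolutional encoder and the query from the decoder's hidden state.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, region_features):
    """Scaled dot-product attention over image regions (illustrative only).

    decoder_state: list of floats, the current decoder query (hypothetical).
    region_features: list of feature vectors, one per image region.
    Returns the attention weights and the weighted context vector.
    """
    scale = math.sqrt(len(decoder_state))
    # One relevance score per image region.
    scores = [
        sum(q * k for q, k in zip(decoder_state, region)) / scale
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    # The context vector is the attention-weighted average of the regions.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(state, regions)
```

In this toy setup the first and third regions align with the query and receive equal, larger weights than the second. A decoder would feed `context` into the prediction of the next word and recompute the weights at every step.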
