Multimodal Attention Branch Network for Perspective-Free Sentence Generation

09/10/2019
by   Aly Magassouba, et al.
0

In this paper, we address the automatic sentence generation of fetching instructions for domestic service robots. Typical fetching commands such as "bring me the yellow toy from the upper part of the white shelf" includes referring expressions, i.e., "from the white upper part of the white shelf". To solve this task, we propose a multimodal attention branch network (Multi-ABN) which generates natural sentences in an end-to-end manner. Multi-ABN uses multiple images of the same fixed scene to generate sentences that are not tied to a particular viewpoint. This approach combines a linguistic attention branch mechanism with several attention branch mechanisms. We evaluated our approach, which outperforms the state-of-the-art method on a standard metrics. Our method also allows us to visualize the alignment between the linguistic input and the visual features.

READ FULL TEXT

page 3

page 7

page 8

research
07/09/2020

Alleviating the Burden of Labeling: Sentence Generation by Attention Branch Encoder-Decoder Network

Domestic service robots (DSRs) are a promising solution to the shortage ...
research
02/15/2019

Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection

Detecting visual relationships, i.e. <Subject, Predicate, Object> triple...
research
12/23/2019

A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects

In this study, we focus on multimodal language understanding for fetchin...
research
06/17/2019

Understanding Natural Language Instructions for Fetching Daily Objects Using GAN-Based Multimodal Target-Source Classification

In this paper, we address multimodal language understanding for unconstr...
research
12/01/2016

Video Captioning with Multi-Faceted Attention

Recently, video captioning has been attracting an increasing amount of i...
research
03/26/2016

Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

Understanding language goes hand in hand with the ability to integrate c...
research
11/05/2019

Recurrent Instance Segmentation using Sequences of Referring Expressions

The goal of this work is to segment the objects in an image that are ref...

Please sign up or login with your details

Forgot password? Click here to reset