Understanding Natural Language Instructions for Fetching Daily Objects Using GAN-Based Multimodal Target-Source Classification

06/17/2019
by Aly Magassouba, et al.

In this paper, we address multimodal language understanding for unconstrained fetching instructions in the context of domestic service robots. A typical fetching instruction such as "Bring me the yellow toy from the white shelf" requires inferring the user's intention, that is, which object (target) to fetch and from where (source). To solve this task, we propose the Multimodal Target-source Classifier Model (MTCM), which predicts the region-wise likelihood of target and source candidates in the scene. Unlike other methods, MTCM can handle region-wise classification based on linguistic and visual features. In our evaluation, MTCM outperformed the state-of-the-art method on a standard data set. In addition, we extended MTCM with Generative Adversarial Nets (MTCM-GAN), which enables simultaneous data augmentation and classification.
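As a rough illustration of what "region-wise likelihood of target and source candidates" could look like in practice, the sketch below scores each candidate region by fusing one linguistic feature vector with that region's visual feature vector. It is not the authors' architecture. The function names, the feature sizes, the random features, the single linear scoring layer, and the choice to normalize with a softmax over regions are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def region_likelihoods(lang_feat, region_feats, weights):
    """Return one likelihood per candidate region.

    Each region's visual feature is concatenated with the linguistic
    feature, then scored by a linear layer (a hypothetical stand-in
    for a trained network). Scores are normalized over regions.
    """
    scores = [dot(weights, lang_feat + vis) for vis in region_feats]
    return softmax(scores)


# Placeholder inputs: random numbers stand in for real features.
random.seed(0)
lang_feat = [random.uniform(-1, 1) for _ in range(4)]      # instruction embedding
region_feats = [[random.uniform(-1, 1) for _ in range(3)]  # one vector per region
                for _ in range(5)]
w_target = [random.uniform(-1, 1) for _ in range(7)]       # untrained weights
w_source = [random.uniform(-1, 1) for _ in range(7)]

# Separate heads for "which object" (target) and "from where" (source).
p_target = region_likelihoods(lang_feat, region_feats, w_target)
p_source = region_likelihoods(lang_feat, region_feats, w_source)

best_target = max(range(len(p_target)), key=p_target.__getitem__)
best_source = max(range(len(p_source)), key=p_source.__getitem__)
print("target region:", best_target, "source region:", best_source)
```

With trained weights, the region with the highest target likelihood would be read as the object to fetch, and the region with the highest source likelihood as the place to fetch it from.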


research
12/23/2019

A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects

In this study, we focus on multimodal language understanding for fetchin...
research
06/11/2018

A Multimodal Classifier Generative Adversarial Network for Carry and Place Tasks from Ambiguous Language Instructions

This paper focuses on a multimodal language understanding method for car...
research
01/16/2018

Grounded Language Understanding for Manipulation Instructions Using GAN-Based Classification

The target task of this study is grounded language understanding for dom...
research
04/02/2022

IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning

Conditional image generation is an active research topic including text2...
research
09/10/2019

Multimodal Attention Branch Network for Perspective-Free Sentence Generation

In this paper, we address the automatic sentence generation of fetching ...
research
07/12/2023

Prototypical Contrastive Transfer Learning for Multimodal Language Understanding

Although domestic service robots are expected to assist individuals who ...
research
07/02/2021

Target-dependent UNITER: A Transformer-Based Multimodal Language Comprehension Model for Domestic Service Robots

Currently, domestic service robots have an insufficient ability to inter...
