Target-dependent UNITER: A Transformer-Based Multimodal Language Comprehension Model for Domestic Service Robots

07/02/2021
by Shintaro Ishikawa, et al.
Domestic service robots currently lack the ability to interact naturally through language, because understanding human instructions is complicated by various ambiguities and missing information. Existing methods insufficiently model the referring expressions that specify the relationships between objects. In this paper, we propose Target-dependent UNITER, which learns the relationship between the target object and other objects directly by focusing on the relevant regions within an image, rather than the whole image. Our method extends the UNITER-based Transformer, which can be pretrained on general-purpose datasets, by introducing a new architecture for handling the target candidates. Our model is validated on two standard datasets, and the results show that Target-dependent UNITER outperforms the baseline method in terms of classification accuracy.

research
12/23/2019

A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects

In this study, we focus on multimodal language understanding for fetchin...
research
07/14/2023

Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks

This paper describes a domestic service robot (DSR) that fetches everyda...
research
07/02/2021

Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions

There have been many studies in robotics to improve the communication sk...
research
07/12/2023

Prototypical Contrastive Transfer Learning for Multimodal Language Understanding

Although domestic service robots are expected to assist individuals who ...
research
06/17/2019

Understanding Natural Language Instructions for Fetching Daily Objects Using GAN-Based Multimodal Target-Source Classification

In this paper, we address multimodal language understanding for unconstr...
research
01/16/2018

Grounded Language Understanding for Manipulation Instructions Using GAN-Based Classification

The target task of this study is grounded language understanding for dom...
research
06/11/2018

A Multimodal Classifier Generative Adversarial Network for Carry and Place Tasks from Ambiguous Language Instructions

This paper focuses on a multimodal language understanding method for car...
