All-in-One Image-Grounded Conversational Agents

12/28/2019
by Da Ju, et al.

As single-task accuracy on individual language and image tasks has improved substantially in the last few years, the long-term goal of a generally skilled agent that can both see and talk becomes more feasible to explore. In this work, we focus on leveraging existing individual language and image tasks, along with resources that incorporate both vision and language, towards that objective. We explore architectures that combine state-of-the-art Transformer and ResNeXt modules fed into a multimodal module to produce a combined model trained on many tasks. We provide a thorough analysis of the components of the model, and of transfer performance when training on one, some, or all of the tasks. Our final models provide a single system that obtains good results on all vision and language tasks considered, and improves the state of the art in image-grounded conversational applications.

research
04/16/2021

VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks

Neural module networks (NMN) have achieved success in image-grounded tas...
research
06/18/2021

GEM: A General Evaluation Benchmark for Multimodal Tasks

In this paper, we present GEM as a General Evaluation benchmark for Mult...
research
11/09/2019

The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents

We introduce dodecaDialogue: a set of 12 tasks that measures if a conver...
research
07/08/2020

Audio-Visual Understanding of Passenger Intents for In-Cabin Conversational Agents

Building multimodal dialogue understanding capabilities situated in the ...
research
08/09/2019

VisualBERT: A Simple and Performant Baseline for Vision and Language

We propose VisualBERT, a simple and flexible framework for modeling a br...
research
01/23/2019

TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents

We introduce a new approach to generative data-driven dialogue systems (...
research
04/17/2022

On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?

Knowledge-grounded conversational models are known to suffer from produc...
