Audio-Oriented Multimodal Machine Comprehension: Task, Dataset and Model

07/04/2021
by Zhiqi Huang, et al.

While Machine Comprehension (MC) has attracted extensive research interest in recent years, existing approaches mainly belong to the category of Machine Reading Comprehension, a task that mines textual inputs (paragraphs and questions) to predict the answers (choices or text spans). However, many MC tasks accept audio input in addition to textual input, e.g., English listening comprehension tests. In this paper, we target the problem of Audio-Oriented Multimodal Machine Comprehension, whose goal is to answer questions based on the given audio and textual information. To solve this problem, we propose a Dynamic Inter- and Intra-modality Attention (DIIA) model to effectively fuse the two modalities (audio and text). DIIA can work as an independent component and can thus be easily integrated into existing MC models. Moreover, we develop a Multimodal Knowledge Distillation (MKD) module that enables our multimodal MC model to accurately predict the answers based on either the text or the audio alone. As a result, the proposed approach handles Audio-Oriented Multimodal Machine Comprehension, Machine Reading Comprehension, and Machine Listening Comprehension in a single model, which makes fair comparisons possible between our model and existing unimodal MC models. Experimental results and analysis demonstrate the effectiveness of the proposed approaches. First, the proposed DIIA boosts the baseline models by up to 21.08 … MKD module allows our multimodal MC model to significantly outperform the unimodal models by up to 18.87 … or textual data.
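The abstract does not describe the internals of DIIA or MKD. As a rough illustration only, the NumPy sketch below assumes that "intra-modality" attention means text self-attention, that "inter-modality" attention means text-to-audio cross-attention, and that the "dynamic" part can be modeled by a hypothetical per-token sigmoid gate (`w_gate`) that mixes the two. For distillation it swaps in the generic temperature-softened KL loss between a multimodal teacher and a unimodal student. All names (`attend`, `fuse`, `distillation_loss`), shapes, and the gate itself are hypothetical and are not the authors' implementation.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def softmax(x, axis=-1):
    return np.exp(log_softmax(x, axis=axis))

def attend(query, key, value):
    # Scaled dot-product attention: each query row takes a softmax-weighted
    # average of the value rows, weighted by similarity to the key rows.
    scores = query @ key.T / np.sqrt(query.shape[-1])
    return softmax(scores, axis=-1) @ value

def fuse(text, audio, w_gate):
    """Hypothetical gated fusion; NOT the paper's exact DIIA.

    text:   (n_t, d) token features
    audio:  (n_a, d) frame features, assumed already projected to width d
    w_gate: (2 * d, 1) weights of a hypothetical per-token sigmoid gate
    """
    intra = attend(text, text, text)    # intra-modality: text attends to text
    inter = attend(text, audio, audio)  # inter-modality: text attends to audio
    # The gate is computed from the features themselves, so the mix of
    # intra- and inter-modality context varies per token.
    logits = np.concatenate([intra, inter], axis=-1) @ w_gate
    gate = 1.0 / (1.0 + np.exp(-logits))  # shape (n_t, 1), values in (0, 1)
    return gate * inter + (1.0 - gate) * intra

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Generic KD loss: KL(teacher || student) between temperature-softened
    # answer distributions, scaled by T**2 and averaged over the batch.
    log_p = log_softmax(teacher_logits / temperature)
    log_q = log_softmax(student_logits / temperature)
    kl = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl.mean() * temperature ** 2)

rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))    # 5 question/passage tokens, width 8
audio = rng.normal(size=(12, 8))  # 12 audio frames, width 8
fused = fuse(text, audio, rng.normal(size=(16, 1)))
print(fused.shape)  # (5, 8)

teacher = rng.normal(size=(3, 4))  # multimodal logits over 4 answer choices
student = rng.normal(size=(3, 4))  # unimodal (text-only or audio-only) logits
print(distillation_loss(teacher, student) > 0.0)  # True
```

Because the gated output keeps the text-side shape `(n_t, d)`, a block of this kind could in principle be dropped into an existing text encoder, which loosely mirrors the abstract's claim that DIIA works as an independent component; how DIIA actually achieves this is not stated here.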

