DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering

03/09/2022
by Guan-Ting Lin, et al.

Spoken Question Answering (SQA) aims to find the answer to a question in a spoken document, which is crucial for personal assistants replying to user queries. Existing SQA methods all rely on Automatic Speech Recognition (ASR) transcripts. Not only does ASR need to be trained on massive annotated data that are prohibitively time-consuming and costly to collect for low-resource languages, but more importantly, the answers to the questions very often include named entities or out-of-vocabulary words that cannot be recognized correctly. Also, ASR aims to minimize recognition errors equally over all words, including many function words irrelevant to the SQA task. Therefore, SQA without ASR transcripts (textless SQA) is highly desirable, although known to be very difficult. This work proposes Discrete Spoken Unit Adaptive Learning (DUAL), which leverages unlabeled data for pre-training and is fine-tuned on the SQA downstream task. The time intervals of spoken answers can be predicted directly from spoken documents. We also release a new SQA benchmark corpus, NMSQA, with data from more realistic scenarios. We show empirically that DUAL yields results comparable to those obtained by cascading ASR and a text QA model, and that it is robust to real-world data. Our code and model will be open-sourced.
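To illustrate what predicting a time interval over discrete units could involve, here is a minimal sketch, not the authors' implementation. It assumes that a speech encoder emits one discrete unit ID per fixed-length frame, that runs of repeated IDs are merged before the QA model sees them, and that the model predicts an inclusive start and end index over the merged sequence. The 20 ms frame stride, the function names `collapse_units` and `span_to_interval`, and the example unit IDs are all hypothetical and not taken from the abstract.

```python
from itertools import groupby

# Assumed frame stride in seconds; the abstract does not state this value.
FRAME_SEC = 0.02


def collapse_units(frame_units):
    """Merge runs of identical unit IDs.

    Returns the collapsed unit sequence and, for each collapsed unit,
    the number of frames it covered in the original sequence.
    """
    units, counts = [], []
    for unit, run in groupby(frame_units):
        units.append(unit)
        counts.append(len(list(run)))
    return units, counts


def span_to_interval(counts, start_idx, end_idx, frame_sec=FRAME_SEC):
    """Map an inclusive span over collapsed units to (start_sec, end_sec)."""
    start_frame = sum(counts[:start_idx])      # frames before the span
    end_frame = sum(counts[:end_idx + 1])      # frames up to the span's end
    return start_frame * frame_sec, end_frame * frame_sec


# Hypothetical frame-level unit IDs for a short spoken passage.
frame_units = [7, 7, 7, 3, 3, 9, 9, 9, 9, 5]
units, counts = collapse_units(frame_units)

# Suppose a QA model predicted the answer span as collapsed units 1..2.
interval = span_to_interval(counts, 1, 2)
print(units, counts, interval)
```

Keeping the per-unit frame counts is what lets a span predicted over the shortened sequence be mapped back to seconds in the original audio.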


