Speech-Based Visual Question Answering

05/01/2017
by Ted Zhang, et al.

This paper introduces speech-based visual question answering (VQA), the task of generating an answer given an image and a spoken question. Two methods are studied: an end-to-end deep neural network that takes audio waveforms directly as input, and a pipelined approach that performs automatic speech recognition (ASR) on the question, followed by text-based visual question answering. Furthermore, we investigate the robustness of both methods by injecting various levels of noise into the spoken question, and find that both methods tolerate noise at similar levels.
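The pipelined system transcribes the spoken question with an ASR model and hands the transcript to a text-based VQA model, while the end-to-end system maps the raw waveform (together with the image) directly to an answer. The sketch below is a minimal illustration of the noise-injection protocol, assuming the question is available as a 1-D float array of audio samples; the `asr`, `text_vqa`, and `audio_vqa` calls in the comments are hypothetical stand-ins, not the paper's actual models.

```python
import numpy as np

def add_noise(waveform: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Mix white Gaussian noise into `waveform` at a target SNR (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(waveform ** 2)
    # Scale noise power so that 10*log10(signal/noise) == snr_db.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise

# Evaluate both systems on progressively noisier questions
# (hypothetical model stubs, shown for shape only):
# for snr in (20, 10, 5, 0):
#     noisy = add_noise(question_audio, snr)
#     pipelined = text_vqa(image, asr(noisy))   # ASR -> text-based VQA
#     end_to_end = audio_vqa(image, noisy)      # waveform -> answer
```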

Related research:

10/18/2020 · Towards Data Distillation for End-to-end Spoken Conversational Question Answering
In spoken question answering, QA systems are designed to answer question...

08/20/2023 · LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework
While Large Language Models (LLMs) have demonstrated commendable perform...

05/25/2020 · An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering
In a spoken multiple-choice question answering (SMCQA) task, given a pas...

03/09/2022 · DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering
Spoken Question Answering (SQA) is to find the answer from a spoken docu...

08/08/2019 · Mitigating Noisy Inputs for Question Answering
Natural language processing systems are often downstream of unreliable i...

09/28/2018 · Direct optimization of F-measure for retrieval-based personal question answering
Recent advances in spoken language technologies and the introduction of ...

06/11/2018 · Prosody Modifications for Question-Answering in Voice-Only Settings
Many popular form factors of digital assistants, such as Amazon Echo, Ap...
