Multi-task Learning Of Deep Neural Networks For Audio Visual Automatic Speech Recognition

01/10/2017
by Abhinav Thanda, et al.

Multi-task learning (MTL) involves the simultaneous training of two or more related tasks over shared representations. In this work, we apply MTL to audio-visual automatic speech recognition (AV-ASR). Our primary task is to learn a mapping between audio-visual fused features and frame labels obtained from an acoustic GMM/HMM model. This is combined with an auxiliary task that maps visual features to frame labels obtained from a separate visual GMM/HMM model. The MTL model is tested at various levels of babble noise, and the results are compared with a baseline hybrid DNN-HMM AV-ASR model. Our results indicate that MTL is especially useful at higher levels of noise. Compared to the baseline, up to 7% relative improvement in WER is reported at -3 dB SNR.
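As a rough illustration of the setup described in the abstract, the sketch below combines a primary and an auxiliary frame-classification loss into one training objective, and shows how a relative WER improvement is computed. The weighted-sum form, the `aux_weight` parameter, and all function names are assumptions made for illustration; the abstract does not state how the two tasks are weighted, and the numbers in the usage lines are made up, not results from the paper.

```python
import math


def cross_entropy(posteriors, labels):
    """Mean negative log-probability of the target label, averaged over frames.

    posteriors: list of per-frame probability lists (one entry per class).
    labels: list of per-frame integer class indices.
    """
    total = sum(math.log(frame[label]) for frame, label in zip(posteriors, labels))
    return -total / len(labels)


def multitask_loss(primary_post, primary_labels, aux_post, aux_labels, aux_weight=0.5):
    """Hypothetical MTL objective: primary loss plus a weighted auxiliary loss.

    primary_*: outputs and frame labels for the audio-visual task
               (labels from the acoustic GMM/HMM alignment).
    aux_*:     outputs and frame labels for the visual task
               (labels from the visual GMM/HMM alignment).
    aux_weight is an assumed hyperparameter, not taken from the paper.
    """
    primary = cross_entropy(primary_post, primary_labels)
    auxiliary = cross_entropy(aux_post, aux_labels)
    return primary + aux_weight * auxiliary


def relative_wer_improvement(baseline_wer, mtl_wer):
    """Relative WER improvement in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - mtl_wer) / baseline_wer


# Illustrative values only (not from the paper).
loss = multitask_loss([[0.5, 0.5]], [0], [[0.25, 0.75]], [0], aux_weight=0.5)
gain = relative_wer_improvement(50.0, 46.5)
```

In a setup like this, the auxiliary output would typically be used only during training, and decoding would rely on the primary output alone; whether the paper does exactly that is not stated in the abstract.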

