DeepAI AI Chat
Log In Sign Up

Multi-task Learning Of Deep Neural Networks For Audio Visual Automatic Speech Recognition

01/10/2017
by   Abhinav Thanda, et al.
0

Multi-task learning (MTL) involves the simultaneous training of two or more related tasks over shared representations. In this work, we apply MTL to audio-visual automatic speech recognition(AV-ASR). Our primary task is to learn a mapping between audio-visual fused features and frame labels obtained from acoustic GMM/HMM model. This is combined with an auxiliary task which maps visual features to frame labels obtained from a separate visual GMM/HMM model. The MTL model is tested at various levels of babble noise and the results are compared with a base-line hybrid DNN-HMM AV-ASR model. Our results indicate that MTL is especially useful at higher level of noise. Compared to base-line, upto 7% relative improvement in WER is reported at -3 SNR dB

READ FULL TEXT

page 1

page 2

page 3

page 4

11/09/2016

Audio Visual Speech Recognition using Deep Recurrent Neural Networks

In this work, we propose a training algorithm for an audio-visual automa...
05/15/2019

Speaker-Independent Speech-Driven Visual Speech Synthesis using Domain-Adapted Acoustic Models

Speech-driven visual speech synthesis involves mapping features extracte...
01/16/2017

Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading

The Aduio-visual Speech Recognition (AVSR) which employs both the video ...
11/08/2015

Towards Structured Deep Neural Network for Automatic Speech Recognition

In this paper we propose the Structured Deep Neural Network (structured ...
05/10/2022

Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection

Under noisy conditions, automatic speech recognition (ASR) can greatly b...