Cumulative Adaptation for BLSTM Acoustic Models

06/14/2019
by   Markus Kitza, et al.
0

This paper addresses the robust speech recognition problem as an adaptation task. Specifically, we investigate the cumulative application of adaptation methods. A bidirectional Long Short-Term Memory (BLSTM) based neural network, capable of learning temporal relationships and translation invariant representations, is used for robust acoustic modelling. Further, i-vectors were used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing 8% relative improvement in word error rate on the NIST Hub5 2000 evaluation test set. By enhancing the first-pass i-vector based adaptation with a second-pass adaptation using speaker and environment dependent transformations within the network, a further relative improvement of 5% in word error rate was achieved. We have reevaluated the features used to estimate i-vectors and their normalization to achieve the best performance in a modern large scale automatic speech recognition system.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/01/2022

A Conformer Based Acoustic Model for Robust Automatic Speech Recognition

This study addresses robust automatic speech recognition (ASR) by introd...
research
08/14/2020

Adaptation Algorithms for Speech Recognition: An Overview

We present a structured overview of adaptation algorithms for neural net...
research
12/21/2015

The 2015 Sheffield System for Transcription of Multi-Genre Broadcast Media

We describe the University of Sheffield system for participation in the ...
research
11/20/2020

Improving RNN-T ASR Accuracy Using Untranscribed Context Audio

We present a new training scheme for streaming automatic speech recognit...
research
08/07/2020

Investigation of Speaker-adaptation methods in Transformer based ASR

End-to-end models are fast replacing conventional hybrid models in autom...
research
07/01/2017

Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments

Multichannel linear filters, such as the Multichannel Wiener Filter (MWF...
research
11/16/2022

Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments

We analyze the impact of speaker adaptation in end-to-end architectures ...

Please sign up or login with your details

Forgot password? Click here to reset