Speaker Adaptation for Attention-Based End-to-End Speech Recognition

11/09/2019
by Zhong Meng, et al.

We propose three regularization-based speaker adaptation approaches to adapt the attention-based encoder-decoder (AED) model with very limited adaptation data from target speakers for end-to-end automatic speech recognition. The first method is Kullback-Leibler divergence (KLD) regularization, in which the output distribution of a speaker-dependent (SD) AED is forced to be close to that of the speaker-independent (SI) model by adding a KLD regularization term to the adaptation criterion. To compensate for the asymmetry of the KLD regularization, an adversarial speaker adaptation (ASA) method is proposed to regularize the deep-feature distribution of the SD AED through adversarial learning between an auxiliary discriminator and the SD AED. The third approach is multi-task learning (MTL), in which the SD AED is trained to jointly perform the primary task of predicting a large number of output units and an auxiliary task of predicting a small number of output units, alleviating the target sparsity issue. Evaluated on a Microsoft short message dictation task, all three methods are highly effective in adapting the AED model, achieving up to 12.2% relative word error rate improvement over an SI AED trained on 3400 hours of data for supervised and unsupervised adaptation.
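The KLD regularization described above can be sketched as follows. A minimal NumPy illustration, not the authors' implementation: it uses the standard equivalence that adding a KLD term between the SI and SD output distributions amounts to training the SD model with cross-entropy against targets interpolated between the one-hot labels and the SI posteriors. The function name, tensor shapes, and the interpolation weight `rho` are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kld_adaptation_loss(sd_logits, si_probs, labels, rho=0.5):
    """Cross-entropy of SD outputs against interpolated targets:
    (1 - rho) * one-hot label + rho * SI posterior.

    Up to a constant (the entropy of the SI posterior), this equals
    (1 - rho) * CE(labels, SD) + rho * KLD(SI || SD), so minimizing it
    keeps the adapted SD output distribution close to the SI model.
    """
    n, k = sd_logits.shape
    one_hot = np.eye(k)[labels]                      # hard targets
    target = (1.0 - rho) * one_hot + rho * si_probs  # interpolated targets
    log_p_sd = np.log(softmax(sd_logits))
    return -(target * log_p_sd).sum(axis=-1).mean()
```

With `rho = 0` this reduces to plain cross-entropy fine-tuning on the adaptation data; with `rho = 1` the SD model is pulled entirely toward the SI posteriors, so `rho` trades off adaptation against regularization.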


Related research

- Speaker Adaptation for End-to-End CTC Models (01/04/2019): We propose two approaches for speaker adaptation in end-to-end (E2E) aut...
- Adversarial Speaker Adaptation (04/29/2019): We propose a novel adversarial speaker adaptation (ASA) scheme, in which...
- Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR (02/14/2020): We propose an unsupervised speaker adaptation method inspired by the neu...
- Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation (01/12/2016): This work presents a broad study on the adaptation of neural network aco...
- To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition (12/09/2018): Transcribed datasets typically contain speaker identity for each instanc...
- Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization (10/14/2021): End-to-end neural diarization (EEND) with self-attention directly predic...
- Intermediate-layer output Regularization for Attention-based Speech Recognition with Shared Decoder (07/09/2022): Intermediate layer output (ILO) regularization by means of multitask tra...
