Speaker Adaptation for End-to-End CTC Models

01/04/2019
by   Ke Li, et al.
0

We propose two approaches for speaker adaptation in end-to-end (E2E) automatic speech recognition systems. One is Kullback-Leibler divergence (KLD) regularization and the other is multi-task learning (MTL). Both approaches aim to address the data sparsity especially output target sparsity issue of speaker adaptation in E2E systems. The KLD regularization adapts a model by forcing the output distribution from the adapted model to be close to the unadapted one. The MTL utilizes a jointly trained auxiliary task to improve the performance of the main task. We investigated our approaches on E2E connectionist temporal classification (CTC) models with three different types of output units. Experiments on the Microsoft short message dictation task demonstrated that MTL outperforms KLD regularization. In particular, the MTL adaptation obtained 8.8% and 4.0% relative word error rate reductions (WERRs) for supervised and unsupervised adaptations for the word CTC model, and 9.6 WERRs for the mix-unit CTC model, respectively.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/09/2019

Speaker Adaptation for Attention-Based End-to-End Speech Recognition

We propose three regularization-based speaker adaptation approaches to a...
research
06/23/2022

Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

Fundamental modelling differences between hybrid and end-to-end (E2E) au...
research
06/24/2022

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

A key challenge for automatic speech recognition (ASR) systems is to mod...
research
11/16/2022

Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments

We analyze the impact of speaker adaptation in end-to-end architectures ...
research
11/05/2020

Multi-Accent Adaptation based on Gate Mechanism

When only a limited amount of accented speech data is available, to prom...
research
09/17/2023

Enhancing Quantised End-to-End ASR Models via Personalisation

Recent end-to-end automatic speech recognition (ASR) models have become ...
research
10/14/2021

Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization

End-to-end neural diarization (EEND) with self-attention directly predic...

Please sign up or login with your details

Forgot password? Click here to reset