Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

02/15/2023
by   Jiajun Deng, et al.

Speaker adaptation techniques provide a powerful solution to customise automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data-intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors. To address these issues, a set of compact and data-efficient speaker-dependent (SD) parameter representations is used to facilitate both speaker adaptive training and test-time unsupervised speaker adaptation of state-of-the-art Conformer ASR systems. The sensitivity to supervision quality is reduced using a confidence score-based selection of the less erroneous subset of speaker-level adaptation data. Two lightweight confidence score estimation modules are proposed to produce more reliable confidence scores. The data sparsity issue, which is exacerbated by data selection, is addressed by modelling the SD parameter uncertainty using Bayesian learning. Experiments on the benchmark 300-hour Switchboard and the 233-hour AMI datasets suggest that the proposed confidence score-based adaptation schemes consistently outperform the baseline speaker-independent (SI) Conformer model and conventional non-Bayesian, point estimate-based adaptation using no speaker data selection. Similar consistent performance improvements were retained after external Transformer and LSTM language model rescoring. In particular, on the 300-hour Switchboard corpus, statistically significant WER reductions of 1.0 (9.5 on the NIST Hub5'00, RT02, and RT03 evaluation sets respectively. Similar WER reductions of 2.7 were obtained on the AMI development and evaluation sets.
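
The confidence score-based data selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hypothesis` class, the `select_adaptation_data` function, the utterance-level scoring, and the 0.8 threshold are all assumptions made for illustration. The confidence estimation modules and the Bayesian learning of SD parameters are not shown.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A first-pass recognition result for one utterance (hypothetical type)."""
    utt_id: str
    text: str
    confidence: float  # assumed utterance-level score in [0, 1]


def select_adaptation_data(hyps, threshold=0.8):
    """Keep only hypotheses whose confidence meets the threshold.

    The threshold value is an illustrative assumption, not a value
    taken from the paper.
    """
    return [h for h in hyps if h.confidence >= threshold]


# Toy first-pass output for one speaker (made-up values).
first_pass = [
    Hypothesis("utt1", "hello there", 0.93),
    Hypothesis("utt2", "we could meet today", 0.41),
    Hypothesis("utt3", "see you later", 0.85),
]

selected = select_adaptation_data(first_pass)
print([h.utt_id for h in selected])
```

Only the retained utterances would then be used as supervision when estimating the speaker-dependent parameters. A stricter threshold leaves less adaptation data, which is the sparsity trade-off the abstract mentions.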

research
06/24/2022

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

A key challenge for automatic speech recognition (ASR) systems is to mod...
research
11/17/2022

Unsupervised Model-based speaker adaptation of end-to-end lattice-free MMI model for speech recognition

Modeling the speaker variability is a key challenge for automatic speech...
research
06/26/2023

Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

Rich sources of variability in natural speech present significant challe...
research
07/08/2019

Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR

Sequence-to-sequence (seq2seq) based ASR systems have shown state-of-the...
research
02/09/2021

Bayesian Transformer Language Models for Speech Recognition

State-of-the-art neural language models (LMs) represented by Transformer...
research
03/05/2020

Statistical Context-Dependent Units Boundary Correction for Corpus-based Unit-Selection Text-to-Speech

In this study, we present an innovative technique for speaker adaptation...
