A Unified Speaker Adaptation Approach for ASR

10/16/2021
by   Yingzhu Zhao, et al.

Transformer models have been used successfully in automatic speech recognition (ASR) and yield state-of-the-art results. However, their performance is still affected by speaker mismatch between training and test data. Further finetuning a trained model with target speaker data is the most natural approach to adaptation, but it requires a lot of compute and may cause catastrophic forgetting on the existing speakers. In this work, we propose a unified speaker adaptation approach consisting of feature adaptation and model adaptation. For feature adaptation, we employ a speaker-aware persistent memory model, which generalizes better to unseen test speakers by using speaker i-vectors to form a persistent memory. For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture, which, to the best of our knowledge, has never been explored in ASR. Specifically, we gradually prune the less contributing parameters of the model encoder to a certain sparsity level and use the pruned parameters for adaptation, while freezing the unpruned parameters to preserve the original model performance. We conduct experiments on the Librispeech dataset. Our proposed approach brings a relative 2.74-6.52% word error rate (WER) reduction on general speaker adaptation. On target speaker adaptation, our method outperforms the baseline with up to 20.58% relative WER reduction and surpasses the finetuning method by up to a relative 2.54%. Moreover, with extremely low-resource adaptation data (e.g., 1 utterance), our method can improve the WER by a relative 6.53% with only a few epochs of training.
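The model adaptation idea described in the abstract (prune gradually, then train only the pruned parameters while the unpruned ones stay frozen) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how "less contributing" parameters are chosen or how sparsity is ramped, so weight magnitude as the contribution proxy, the linear sparsity schedule, and all function names here are assumptions made for illustration.

```python
import numpy as np

def sparsity_schedule(step, total_steps, final_sparsity):
    """Assumed schedule: ramp the target sparsity linearly from 0 to final_sparsity."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity * frac

def magnitude_mask(weights, sparsity):
    """Boolean mask, True = kept (unpruned), False = pruned.
    Assumption: small magnitude stands in for 'less contributing'."""
    k = int(round(sparsity * weights.size))
    mask = np.ones(weights.size, dtype=bool)
    if k > 0:
        order = np.argsort(np.abs(weights), axis=None)  # flattened, ascending
        mask[order[:k]] = False
    return mask.reshape(weights.shape)

def gradual_prune(weights, total_steps, final_sparsity):
    """Zero out a growing share of the smallest weights over total_steps."""
    w = weights.copy()
    mask = np.ones(w.shape, dtype=bool)
    for step in range(1, total_steps + 1):
        s = sparsity_schedule(step, total_steps, final_sparsity)
        mask = magnitude_mask(w, s)
        w = w * mask
        # In real training, the kept weights would be retrained here
        # on the original data before the next pruning step.
    return w, mask

def adapt_step(weights, grad, keep_mask, lr):
    """One adaptation update: only pruned positions move, kept weights are frozen."""
    return weights - lr * grad * (~keep_mask)

# Toy encoder weight matrix (hypothetical values).
encoder_w = np.array([[0.5, -0.1, 2.0, 0.05],
                      [-1.5, 0.3, -0.02, 0.8]])
pruned_w, keep_mask = gradual_prune(encoder_w, total_steps=4, final_sparsity=0.5)

# Stand-in gradient from target-speaker data (hypothetical).
grad = np.ones_like(pruned_w)
adapted_w = adapt_step(pruned_w, grad, keep_mask, lr=0.1)
```

Because the update is masked, the kept weights in `adapted_w` are identical to those in `pruned_w`, which is the mechanism by which such a scheme would avoid disturbing performance on the original speakers; only the freed (pruned) positions take on speaker-specific values.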


research
01/02/2020

Speaker-aware speech-transformer

Recently, end-to-end (E2E) models become a competitive alternative to th...
research
04/08/2021

Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation

Machine Speech Chain, which integrates both end-to-end (E2E) automatic s...
research
04/06/2021

Optimal Transport-based Adaptation in Dysarthric Speech Tasks

In many real-world applications, the mismatch between distributions of t...
research
08/26/2022

Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation

Many automatic speech recognition (ASR) data sets include a single pre-d...
research
04/02/2022

Speaker adaptation for Wav2vec2 based dysarthric ASR

Dysarthric speech recognition has posed major challenges due to lack of ...
research
03/14/2022

Interpretable Dysarthric Speaker Adaptation based on Optimal-Transport

This work addresses the mismatch problem between the distribution of tra...
research
06/25/2019

LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models

In recent years, deep learning based machine lipreading has gained promi...
