Improving And Analyzing Neural Speaker Embeddings for ASR

01/11/2023
by Christoph Lüscher, et al.

Neural speaker embeddings encode a speaker's speech characteristics through a DNN model and are prevalent in speaker verification tasks. However, few studies have investigated the use of neural speaker embeddings in an ASR system. In this work, we present our efforts on integrating neural speaker embeddings into a Conformer-based hybrid HMM ASR system. For ASR, our improved embedding extraction pipeline, in combination with the Weighted-Simple-Add integration method, results in x-vectors and c-vectors performing on par with i-vectors. We further compare and analyze different speaker embeddings. We present our acoustic model improvements obtained by switching from the newbob learning rate schedule to the one-cycle learning rate schedule, resulting in an approximately 3% relative WER reduction on Switchboard and additionally reducing the overall training time by 17%. Adding neural speaker embeddings yields an additional improvement of approximately 3%. Our best hybrid ASR system with speaker embeddings achieves a WER of 9.0% on Hub5'01 when trained on SWB 300h.
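
The abstract names a one-cycle learning rate schedule and reports relative WER reductions without spelling out either. The following is a minimal sketch of both ideas, not the authors' training configuration: the function names, the piecewise-linear shape, and all learning rates and WER values are hypothetical and chosen only for illustration.

```python
def one_cycle_lr(step, total_steps, initial_lr=1e-4, peak_lr=1e-3,
                 final_lr=1e-6, ramp_frac=0.45):
    """Piecewise-linear one-cycle schedule (all defaults are hypothetical).

    Phase 1: ramp linearly from initial_lr up to peak_lr.
    Phase 2: ramp linearly from peak_lr back down to initial_lr.
    Phase 3: anneal linearly from initial_lr to final_lr.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must lie in [0, total_steps)")
    up_end = max(1, int(total_steps * ramp_frac))  # end of phase 1
    down_end = 2 * up_end                          # end of phase 2
    if step < up_end:
        t = step / up_end
        return initial_lr + t * (peak_lr - initial_lr)
    if step < down_end:
        t = (step - up_end) / (down_end - up_end)
        return peak_lr + t * (initial_lr - peak_lr)
    t = (step - down_end) / max(1, total_steps - down_end)
    return initial_lr + t * (final_lr - initial_lr)


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Print the learning rate at a few points of a 100-step schedule.
    for s in (0, 45, 90, 99):
        print(s, one_cycle_lr(s, total_steps=100))
    # Illustrative numbers only, not results from the paper.
    print(relative_wer_reduction(10.0, 9.7))
```

Under this definition, a relative reduction of 3% means the new WER is 97% of the baseline WER, which is different from an absolute drop of 3 percentage points.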
