Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition

06/26/2019
by Naoyuki Kanda, et al.

In this paper, we propose a novel auxiliary loss function for target-speaker automatic speech recognition (ASR). Our method automatically extracts and transcribes the target speaker's utterances from a monaural mixture of multiple speakers' speech, given a short sample of the target speaker. The proposed auxiliary loss function additionally attempts to maximize interference-speaker ASR accuracy during training. This regularizes the network toward a better representation for speaker separation, and thus toward better accuracy on target-speaker ASR. We evaluated the proposed method using two-speaker-mixed speech in various signal-to-interference-ratio conditions. We first built a strong target-speaker ASR baseline based on state-of-the-art lattice-free maximum mutual information. This baseline achieved a word error rate (WER) of 18.06% on the test set, while a normal ASR trained with clean data produced a completely corrupted result (WER of 84.71%). Our proposed loss then further reduced the WER by 6.6% relative to this strong baseline, achieving a WER of 16.87%. In addition to the accuracy improvement, we also showed that the auxiliary output branch for the proposed loss can even be used as a secondary ASR for the interference speakers' speech.
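The idea of adding an interference-speaker term to the training objective can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's baseline is trained with lattice-free maximum mutual information, whereas the sketch below swaps in a simple per-frame cross-entropy as a stand-in. The function names and the `aux_weight` parameter are hypothetical and chosen only for illustration. The last helper checks that the reported WERs of 18.06 and 16.87 are consistent with a 6.6% relative reduction.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability of the reference label for one frame."""
    return -math.log(probs[label])


def combined_loss(target_probs, target_labels,
                  interf_probs, interf_labels, aux_weight=1.0):
    """Main target-speaker loss plus a weighted auxiliary interference loss.

    Assumption: both output branches emit per-frame posteriors, and
    aux_weight is a hypothetical scaling factor for the auxiliary term.
    """
    main = sum(cross_entropy(p, y)
               for p, y in zip(target_probs, target_labels))
    aux = sum(cross_entropy(p, y)
              for p, y in zip(interf_probs, interf_labels))
    return main + aux_weight * aux


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction, in percent, of new_wer over baseline_wer."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Consistency check on the WERs reported in the abstract.
print(round(relative_wer_reduction(18.06, 16.87), 1))  # 6.6
```

With `aux_weight=0.0` the sketch reduces to the plain target-speaker objective, so the auxiliary term acts purely as an additional training signal on a second output branch.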

research
09/17/2019

Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models

This paper investigates the use of target-speaker automatic speech recog...
research
03/31/2022

A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction

Speaker extraction algorithm extracts the target speech from a mixture s...
research
01/14/2021

Speaker activity driven neural speech extraction

Target speech extraction, which extracts the speech of a target speaker ...
research
10/25/2018

Speaker Selective Beamformer with Keyword Mask Estimation

This paper addresses the problem of automatic speech recognition (ASR) o...
research
05/08/2019

On the representation of speech and music

In most automatic speech recognition (ASR) systems, the audio signal is ...
research
04/15/2021

A Method to Reveal Speaker Identity in Distributed ASR Training, and How to Counter It

End-to-end Automatic Speech Recognition (ASR) models are commonly traine...
research
10/10/2021

Personalizing ASR with limited data using targeted subset selection

We study the task of personalizing ASR models to a target non-native spe...
