Speaker Selective Beamformer with Keyword Mask Estimation

10/25/2018
by Yusuke Kida, et al.

This paper addresses the problem of automatic speech recognition (ASR) of a target speaker in the presence of background speech. The novelty of our approach is that we focus on the wakeup keyword, which is commonly used to activate ASR systems such as smart speakers. The proposed method first uses a DNN-based mask estimator to separate the mixture signal into the keyword signal uttered by the target speaker and the remaining background speech. The separated signals are then used to calculate a beamforming filter that enhances the subsequent utterances from the target speaker. Experimental evaluations show that the trained DNN-based mask can selectively separate the keyword and background speech from the mixture signal. The effectiveness of the proposed method is also verified with Japanese ASR experiments, and we confirm that the character error rates are significantly improved by the proposed method for both simulated and real recorded test sets.
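The second stage described above, turning the separated keyword and background signals into a beamforming filter, can be illustrated with a minimal sketch. The abstract does not say which beamformer is used, so this example assumes an MVDR filter in the reference-channel formulation, computed from mask-weighted spatial covariance matrices. The function names (`masked_covariance`, `mvdr_weights`, `apply_beamformer`), the array layout, and the diagonal-loading constant are hypothetical choices for illustration. The DNN mask estimator itself is not shown: the keyword mask is simply taken as an input.

```python
import numpy as np

def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array, shape (channels, frames, freqs)
    mask: real array in [0, 1], shape (frames, freqs)
    Returns an array of shape (freqs, channels, channels).
    """
    weighted = stft * mask[None, :, :]
    # cov[f, c, d] = sum_t mask[t, f] * x[c, t, f] * conj(x[d, t, f])
    cov = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=0), 1e-10)
    return cov / norm[:, None, None]

def mvdr_weights(cov_target, cov_noise, ref_channel=0, diag_load=1e-6):
    """MVDR weights (assumed beamformer type, not confirmed by the abstract).

    w[f] = (cov_noise[f]^-1 cov_target[f]) u / trace(cov_noise[f]^-1 cov_target[f]),
    where u selects the reference channel. Returns shape (freqs, channels).
    """
    n_ch = cov_noise.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    trace = np.trace(cov_noise, axis1=-2, axis2=-1).real
    loaded = cov_noise + (diag_load * trace / n_ch)[:, None, None] * np.eye(n_ch)
    numerator = np.linalg.solve(loaded, cov_target)
    denom = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[:, :, ref_channel] / (denom[:, None] + 1e-10)

def apply_beamformer(weights, stft):
    """Apply y[t, f] = w[f]^H x[:, t, f]; returns shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

In this sketch, the keyword segment would be passed through `masked_covariance` twice, once with the estimated keyword mask and once with its complement (standing in for the background speech), and the resulting weights would then be applied with `apply_beamformer` to the utterance that follows the keyword.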
