Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition

01/11/2022
by Hiroshi Sato, et al.

The combination of a deep neural network (DNN)-based speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end is a widely used approach to implement overlapping speech recognition. However, the SE front-end generates processing artifacts that can degrade the ASR performance. We previously found that such performance degradation can occur even under fully overlapping conditions, depending on the signal-to-interference ratio (SIR) and signal-to-noise ratio (SNR). To mitigate the degradation, we introduced a rule-based method to switch the ASR input between the enhanced and observed signals, which showed promising results. However, the rule's optimality was unclear because it was heuristically designed and based only on SIR and SNR values. In this work, we propose a DNN-based switching method that directly estimates whether ASR will perform better on the enhanced or observed signals. We also introduce soft-switching, which computes a weighted sum of the enhanced and observed signals for ASR input, with weights given by the switching model's output posteriors. The proposed learning-based switching showed performance comparable to that of rule-based oracle switching. The soft-switching further improved the ASR performance and achieved a relative character error rate reduction of up to 23
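The difference between hard switching and soft-switching can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the switching model emits two logits per utterance (enhanced vs. observed), that a single utterance-level posterior weights the whole time-domain waveform, and that both signals are time-aligned and equal in length. The function names `softmax2`, `hard_switch`, and `soft_switch` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax2(logit_enh: float, logit_obs: float) -> Tuple[float, float]:
    """Two-class softmax over hypothetical switching-model logits.

    Returns (p_enhanced, p_observed): the posteriors that ASR will perform
    better on the enhanced or on the observed signal. They sum to 1.
    """
    m = max(logit_enh, logit_obs)  # subtract the max for numerical stability
    e_enh = math.exp(logit_enh - m)
    e_obs = math.exp(logit_obs - m)
    z = e_enh + e_obs
    return e_enh / z, e_obs / z


def hard_switch(enhanced: Sequence[float], observed: Sequence[float],
                p_enh: float) -> List[float]:
    """Pick one signal as ASR input; ties go to the enhanced signal (assumption)."""
    return list(enhanced) if p_enh >= 0.5 else list(observed)


def soft_switch(enhanced: Sequence[float], observed: Sequence[float],
                p_enh: float) -> List[float]:
    """Weighted sum p_enh * enhanced + (1 - p_enh) * observed, sample by sample."""
    if len(enhanced) != len(observed):
        raise ValueError("enhanced and observed signals must have equal length")
    return [p_enh * e + (1.0 - p_enh) * o for e, o in zip(enhanced, observed)]


if __name__ == "__main__":
    enhanced = [0.5, -0.25, 1.0]   # toy waveform samples
    observed = [1.0, 0.75, 0.0]
    p_enh, _ = softmax2(0.0, 0.0)  # equal logits give p_enh = 0.5
    print(soft_switch(enhanced, observed, p_enh))  # [0.75, 0.25, 0.5]
    print(hard_switch(enhanced, observed, p_enh))  # [0.5, -0.25, 1.0]
```

With equal logits, soft-switching returns the sample-wise average of the two signals, while hard switching commits fully to one of them. As the enhanced-signal posterior approaches 1 (or 0), the soft-switched input approaches the enhanced (or observed) signal, so an uncertain switching decision is blended rather than forced.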


