A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement

06/22/2022
by Or Tal et al.

Speech enhancement has improved greatly in recent years thanks to end-to-end neural networks. However, most models are agnostic to the spoken phonetic content. Recently, several studies have proposed phonetic-aware speech enhancement, mostly using perceptual supervision. Yet phonetic features can also be injected during model optimization in other ways (e.g., model conditioning). In this paper, we conduct a systematic comparison of different methods for incorporating phonetic information into a speech enhancement model. Through a series of controlled experiments, we examine how different phonetic content models and various feature-injection techniques affect enhancement performance, considering both causal and non-causal models. Specifically, we evaluate three settings for injecting phonetic information: i) feature conditioning; ii) perceptual supervision; and iii) regularization. Phonetic features are obtained from an intermediate layer of either a supervised pre-trained Automatic Speech Recognition (ASR) model or a pre-trained Self-Supervised Learning (SSL) model. We further examine how the choice of embedding layer affects performance, considering both manually selected and learned layer configurations. Results suggest that phonetic features from the SSL model outperform those from the ASR model in most cases. Interestingly, the conditioning setting performs best among the evaluated configurations.
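The three injection settings, together with a learned layer configuration, can be sketched as follows. This is a minimal, framework-free illustration under our own assumptions, not the authors' implementation: features are plain Python lists of time-aligned frames, the frozen ASR or SSL model's layer outputs are passed in as given, a "learned configuration" is assumed to be a softmax-weighted sum over layers, and "regularization" is assumed to be an auxiliary loss pulling the enhancer's hidden states toward the clean phonetic features. All function names and the `weight` parameter are hypothetical. A real system would use a tensor library with autograd and a frozen pre-trained model.

```python
import math
from typing import List

Vector = List[float]
Frames = List[Vector]  # T frames, each a D-dimensional feature vector


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax over per-layer logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layers: List[Frames], logits: List[float]) -> Frames:
    """Assumed learned layer selection: softmax-weighted sum of L layers (each T x D)."""
    weights = softmax(logits)
    num_frames, dim = len(layers[0]), len(layers[0][0])
    return [
        [sum(w * layer[t][d] for w, layer in zip(weights, layers)) for d in range(dim)]
        for t in range(num_frames)
    ]


def mse(a: Frames, b: Frames) -> float:
    # Mean squared error over all frames and dimensions.
    diffs = [(x - y) ** 2 for fa, fb in zip(a, b) for x, y in zip(fa, fb)]
    return sum(diffs) / len(diffs)


# i) Feature conditioning: append the phonetic features to each frame of the
#    enhancer's input (assumes both sequences share the same frame rate).
def condition(enhancer_frames: Frames, phonetic: Frames) -> Frames:
    return [f + p for f, p in zip(enhancer_frames, phonetic)]


# ii) Perceptual supervision: distance between the phonetic features of the
#     enhanced signal and those of the clean signal.
def perceptual_loss(phon_enhanced: Frames, phon_clean: Frames) -> float:
    return mse(phon_enhanced, phon_clean)


# iii) Regularization (assumed form): distance between the enhancer's hidden
#      states, already projected to the phonetic dimension, and clean features.
def regularization_loss(hidden: Frames, phon_clean: Frames) -> float:
    return mse(hidden, phon_clean)


def total_loss(reconstruction: float, auxiliary: float, weight: float = 0.1) -> float:
    # Hypothetical weighting of an auxiliary phonetic term against the main loss.
    return reconstruction + weight * auxiliary


if __name__ == "__main__":
    layers = [[[1.0, 0.0]], [[3.0, 2.0]]]  # two layers, one frame, D = 2
    phon = weighted_layer_sum(layers, [0.0, 0.0])  # equal weights
    print(phon)
    print(condition([[1.0]], phon))
    print(total_loss(1.0, perceptual_loss([[1.0, 2.0]], [[1.0, 4.0]]), weight=0.5))
```

In this sketch, settings ii) and iii) only add a term to the training loss through `total_loss`, whereas conditioning changes the enhancer's input and leaves the loss untouched, which is one reason the three settings can behave differently.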

Related research
04/01/2022

End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation

This work presents our end-to-end (E2E) automatic speech recognition (AS...
02/11/2021

An Investigation of End-to-End Models for Robust Speech Recognition

End-to-end models for robust automatic speech recognition (ASR) have not...
06/14/2023

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

Large, pre-trained representation models trained using self-supervised l...
11/18/2022

Exploring WavLM on Speech Enhancement

There is a surge in interest in self-supervised learning approaches for ...
12/16/2021

Self-Supervised Learning for speech recognition with Intermediate layer supervision

Recently, pioneer work finds that speech pre-trained models can solve fu...
09/28/2022

Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization

With the development of deep learning, neural network-based speech enhan...
12/21/2021

Self-Supervised Learning based Monaural Speech Enhancement with Complex-Cycle-Consistent

Recently, self-supervised learning (SSL) techniques have been introduced...