PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss

08/11/2020
by   Umut Isik, et al.
16

Neural network applications generally benefit from larger-sized models, but for current speech enhancement models, larger scale networks often suffer from decreased robustness to the variety of real-world use cases beyond what is encountered in training data. We introduce several innovations that lead to better large neural networks for speech enhancement. The novel PoCoNet architecture is a convolutional neural network that, with the use of frequency-positional embeddings, is able to more efficiently build frequency-dependent features in the early layers. A semi-supervised method helps increase the amount of conversational training data by pre-enhancing noisy datasets, improving performance on real recordings. A new loss function biased towards preserving speech quality helps the optimization better match human perceptual opinions on speech quality. Ablation experiments and objective and human opinion metrics show the benefits of the proposed improvements.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/11/2021

Incorporating Real-world Noisy Speech in Neural-network-based Speech Enhancement Systems

Supervised speech enhancement relies on parallel databases of degraded s...
research
08/24/2020

AMRConvNet: AMR-Coded Speech Enhancement Using Convolutional Neural Networks

Speech is converted to digital signals using speech coding for efficient...
research
01/11/2023

Perceive and predict: self-supervised speech representation based loss functions for speech enhancement

Recent work in the domain of speech enhancement has explored the use of ...
research
02/16/2023

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

Despite rapid advancement in recent years, current speech enhancement mo...
research
11/08/2020

Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain

One of the strengths of traditional convolutional neural networks (CNNs)...
research
02/16/2018

Constrained Convolutional-Recurrent Networks to Improve Speech Quality with Low Impact on Recognition Accuracy

For a speech-enhancement algorithm, it is highly desirable to simultaneo...
research
11/06/2018

Kernel Machines Beat Deep Neural Networks on Mask-based Single-channel Speech Enhancement

We apply a fast kernel method for mask-based single-channel speech enhan...

Please sign up or login with your details

Forgot password? Click here to reset