Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities

06/27/2022
by Andrew Catellier, et al.

Wideband Audio Waveform Evaluation Networks (WAWEnets) are convolutional neural networks that operate directly on wideband audio waveforms to produce evaluations of those waveforms. In the present work, these evaluations give qualities of telecommunications speech (e.g., noisiness, intelligibility, overall speech quality). WAWEnets are no-reference networks because they do not require “reference” (original or undistorted) versions of the waveforms they evaluate. Our initial WAWEnet publication introduced four WAWEnets, each of which emulated the output of an established full-reference speech quality or intelligibility estimation algorithm. We have updated the WAWEnet architecture to be more efficient and effective. Here we present a single WAWEnet that closely tracks seven different quality and intelligibility values. We create a second network that additionally tracks four subjective speech quality dimensions. We offer a third network that focuses on just subjective quality scores and achieves very high levels of agreement. This work has leveraged 334 hours of speech in 13 languages, over two million full-reference target values, and over 93,000 subjective mean opinion scores. We also interpret the operation of WAWEnets and identify the key to their operation using the language of signal processing: ReLUs strategically move spectral information from non-DC components into the DC component. The DC values of 96 output signals define a vector in a 96-D latent space, and this vector is then mapped to a quality or intelligibility value for the input waveform.
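The signal-processing interpretation in the abstract can be illustrated with a small NumPy sketch. It is not the WAWEnet architecture: the sample rate, the test tone, the random 96-channel signals, and the affine map (`weights`, `bias`) are all assumptions chosen only to show the idea. A zero-mean tone has no DC component, yet after a ReLU its mean is close to 1/π, so energy that sat at a non-DC frequency now contributes to DC. Averaging each of 96 rectified channels over time then gives a 96-element vector that a further mapping could turn into a single score.

```python
import numpy as np

def relu(x):
    """Half-wave rectification, the ReLU nonlinearity."""
    return np.maximum(x, 0.0)

def dc_component(x, axis=-1):
    """DC component of a signal: its mean over time."""
    return np.mean(x, axis=axis)

# Assumed values for illustration only (not taken from the paper).
fs = 16_000                 # sample rate in Hz
n = 16_000                  # one second of samples
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 440 * t)   # whole number of cycles, so zero mean

dc_before = float(dc_component(tone))        # approximately 0
dc_after = float(dc_component(relu(tone)))   # approximately 1 / pi

# Toy stand-in for 96 output signals: random zero-mean channels.
rng = np.random.default_rng(0)
channels = rng.standard_normal((96, n))
latent = dc_component(relu(channels))        # one DC value per channel, shape (96,)

# Hypothetical affine map from the 96-D vector to one scalar score.
weights = rng.standard_normal(96) / 96
bias = 0.0
score = float(latent @ weights + bias)

print(f"DC before ReLU: {dc_before:.6f}")
print(f"DC after ReLU:  {dc_after:.6f}")
print(f"latent shape: {latent.shape}, toy score: {score:.4f}")
```

In a trained network the filters before each ReLU would determine which spectral content ends up in the DC component; here the random channels and weights are placeholders, so the printed score has no meaning beyond showing the data flow.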


