Full-Reference Speech Quality Estimation with Attentional Siamese Neural Networks

05/03/2021
by Gabriel Mittag, et al.

In this paper, we present a full-reference speech quality prediction model with a deep learning approach. The model determines a feature representation of the reference and the degraded signal through a siamese recurrent convolutional network that shares the weights for both signals as input. The resulting features are then used to align the signals with an attention mechanism and are finally combined to estimate the overall speech quality. The proposed network architecture represents a simple solution for the time-alignment problem that occurs for speech signals transmitted through Voice-Over-IP networks and shows how the clean reference signal can be incorporated into speech quality models that are based on end-to-end trained neural networks.
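
The pipeline described in the abstract (shared-weight feature extraction, attention-based alignment, feature fusion, a single quality estimate) can be illustrated with a minimal NumPy sketch. This is not the authors' model. The single `tanh` projection stands in for the recurrent convolutional encoder, and the dot-product attention, the mean pooling, the feature sizes, and all function and variable names are assumptions chosen for illustration. The weights are random and untrained, so the resulting score carries no meaning.

```python
import numpy as np

def encode(frames, weights):
    """Shared ("siamese") encoder: the same weights are applied to either signal.
    Assumption: one tanh projection replaces the recurrent convolutional network."""
    return np.tanh(frames @ weights)

def align(deg_feat, ref_feat):
    """Soft-align the reference to each degraded frame (assumed dot-product attention)."""
    scores = deg_feat @ ref_feat.T / np.sqrt(deg_feat.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # each row is a distribution over reference frames
    return attn @ ref_feat, attn

def predict_quality(ref_frames, deg_frames, weights, out_weights):
    """Encode both signals with shared weights, align them, fuse, and pool to one score."""
    ref_feat = encode(ref_frames, weights)
    deg_feat = encode(deg_frames, weights)
    aligned_ref, attn = align(deg_feat, ref_feat)
    # Fuse degraded features, aligned reference features, and their difference.
    fused = np.concatenate([deg_feat, aligned_ref, deg_feat - aligned_ref], axis=1)
    pooled = fused.mean(axis=0)                  # average over time
    return float(pooled @ out_weights), attn

# Hypothetical inputs: per-frame spectral features, with signals of different lengths.
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 16))          # 50 reference frames
deg = rng.normal(size=(60, 16))          # 60 degraded frames
W = rng.normal(size=(16, 8)) * 0.1       # shared encoder weights (untrained)
w_out = rng.normal(size=(24,))           # output weights (untrained)
score, attn = predict_quality(ref, deg, W, w_out)
```

Because the attention weights form a distribution over reference frames for every degraded frame, the two signals do not need equal length or a prior time-alignment step, which is the property the abstract highlights for Voice-over-IP transmission.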
