Generative Speech Coding with Predictive Variance Regularization

02/18/2021
by   W. Bastiaan Kleijn, et al.

The recent emergence of machine-learning-based generative models for speech suggests that a significant reduction in bit rate for speech codecs is possible. However, the performance of generative models deteriorates significantly with the distortions present in real-world input signals. We argue that this deterioration is due to the sensitivity of the maximum likelihood criterion to outliers and the ineffectiveness of modeling a sum of independent signals with a single autoregressive model. We introduce predictive-variance regularization to reduce the sensitivity to outliers, resulting in a significant increase in performance. We show that noise reduction to remove unwanted signals can significantly increase performance. We provide extensive subjective performance evaluations showing that our system based on generative modeling provides state-of-the-art coding performance at 3 kb/s for real-world speech signals at reasonable computational complexity.
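
To make the outlier argument concrete, here is a minimal sketch of how a single large residual inflates a maximum-likelihood variance estimate, and how a penalty on the predictive variance pulls it back. The abstract does not state the form of the regularizer, so the log-variance penalty, the weight `lam`, the zero-mean Gaussian model, and the sample values are all illustrative assumptions and not the authors' method.

```python
import math


def ml_variance(residuals):
    """Maximum-likelihood variance of zero-mean Gaussian residuals."""
    return sum(r * r for r in residuals) / len(residuals)


def penalized_objective(residuals, variance, lam):
    """Mean Gaussian negative log-likelihood plus lam * log(variance).

    The log-variance penalty is a hypothetical stand-in for a
    predictive-variance regularizer; it is not taken from the paper.
    """
    mean_sq = ml_variance(residuals)
    nll = 0.5 * math.log(2.0 * math.pi * variance) + mean_sq / (2.0 * variance)
    return nll + lam * math.log(variance)


def regularized_variance(residuals, lam):
    """Closed-form minimizer of penalized_objective over the variance.

    Setting the derivative to zero gives variance = mean_sq / (1 + 2 * lam).
    """
    return ml_variance(residuals) / (1.0 + 2.0 * lam)


# Hypothetical prediction residuals; the last value of the second list
# plays the role of a real-world distortion (an outlier).
clean = [0.1, -0.2, 0.15, -0.05, 0.1, -0.1, 0.05, -0.15]
with_outlier = clean + [3.0]

if __name__ == "__main__":
    print("ML variance, clean:       ", ml_variance(clean))
    print("ML variance, with outlier:", ml_variance(with_outlier))
    print("Penalized, with outlier:  ", regularized_variance(with_outlier, lam=0.5))
```

With `lam = 0` the penalized estimate reduces to the plain maximum-likelihood estimate, so the sketch isolates the effect of the penalty: one outlier dominates the unpenalized variance, and a larger `lam` shrinks the estimate toward smaller predictive variances.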

