Ultra-Low-Bitrate Speech Coding with Pretrained Transformers

07/05/2022
by   Ali Siahkoohi, et al.

Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network-based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While this new generation of codecs is capable of synthesizing high-fidelity speech, their use of recurrent or convolutional layers often restricts their effective receptive fields, which prevents them from compressing speech efficiently. We propose to further reduce the bitrate of neural speech codecs through the use of pretrained Transformers, which are capable of exploiting long-range dependencies in the input signal due to their inductive bias. To this end, we use a pretrained Transformer in tandem with a convolutional encoder, which is trained end-to-end with a quantizer and a generative adversarial net decoder. Our numerical experiments show that supplementing the convolutional encoder of a neural speech codec with Transformer speech embeddings yields a speech codec with a bitrate of 600 bps that outperforms the original neural speech codec in synthesized speech quality when both are trained at the same bitrate. Subjective human evaluations suggest that the quality of the resulting codec is comparable to or better than that of conventional codecs operating at three to four times the rate.
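To make the bitrate figures concrete, here is a minimal sketch of how the bitrate of a quantized codec is commonly computed: frames per second, times indices per frame, times bits per index. The function name `bitrate_bps` and the example configuration (50 frames per second, 2 codebooks of 64 entries) are assumptions chosen only because they multiply out to 600 bps; the abstract does not state the paper's actual frame rate or codebook sizes.

```python
import math


def bitrate_bps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second for a quantizer that emits `num_codebooks` indices per frame.

    Each index selects one of `codebook_size` entries, so it costs
    log2(codebook_size) bits.
    """
    bits_per_index = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_index


# Hypothetical configuration, NOT taken from the paper: it is just one of
# many settings that lands on the 600 bps quoted in the abstract.
example = bitrate_bps(frame_rate_hz=50, num_codebooks=2, codebook_size=64)

# "Three to four times the rate" of a 600 bps codec, in bps.
conventional_range = (3 * 600, 4 * 600)

print(example, conventional_range)
```

Under these assumed settings the codec spends 12 bits per frame, and the conventional codecs it is compared against would operate at roughly 1800 to 2400 bps.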


research
07/07/2022

NESC: Robust Neural End-2-End Speech Coding with GANs

Neural networks have proven to be a formidable tool to tackle the proble...
research
07/07/2021

Efficient Transformer for Direct Speech Translation

The advent of Transformer-based models has surpassed the barriers of tex...
research
10/16/2022

RedApt: An Adaptor for wav2vec 2 Encoding Faster and Smaller Speech Translation without Quality Compromise

Pre-trained speech Transformers in speech translation (ST) have facilita...
research
12/01/2017

Wavenet based low rate speech coding

Traditional parametric coding of speech facilitates low rate but provide...
research
07/25/2023

CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding

Recently, speech codecs based on neural networks have proven to perform ...
research
05/03/2022

Synthesized Speech Detection Using Convolutional Transformer-Based Spectrogram Analysis

Synthesized speech is common today due to the prevalence of virtual assi...
research
05/18/2023

FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs

This paper presents FastFit, a novel neural vocoder architecture that re...
