Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

03/03/2023
by   Yuma Koizumi, et al.

Speech restoration (SR) is the task of converting degraded speech signals into high-quality ones. In this study, we propose a robust SR model called Miipher and apply it to a new SR application: increasing the amount of high-quality training data for speech generation by converting speech samples collected from the Web to studio quality. To make our SR model robust against various types of degradation, we use (i) a speech representation extracted from w2v-BERT as the input feature, and (ii) a text representation extracted from transcripts via PnG-BERT as a linguistic conditioning feature. Experiments show that Miipher (i) is robust against various types of audio degradation and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web. Audio samples are available at our demo page: google.github.io/df-conformer/miipher/

research
03/24/2022

SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

We present a self-supervised speech restoration method without paired sp...
research
12/21/2022

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement

Prior works on improving speech quality with visual input typically stud...
research
08/29/2023

Let There Be Sound: Reconstructing High Quality Speech from Silent Videos

The goal of this work is to reconstruct high quality speech from lip mot...
research
06/17/2019

Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models

Modern text-to-speech (TTS) systems are able to generate audio that soun...
research
08/28/2022

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks

Transfer tasks in text-to-speech (TTS) synthesis - where one or more asp...
research
01/24/2022

Unified Multimodal Punctuation Restoration Framework for Mixed-Modality Corpus

The punctuation restoration task aims to correctly punctuate the output ...
research
06/05/2023

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading

Lip-to-speech involves generating a natural-sounding speech synchronized...