Improving Deliberation by Text-Only and Semi-Supervised Training

06/29/2022
by   Ke Hu, et al.

Text-only training and semi-supervised training on audio-only data have gained popularity recently due to the wide availability of unlabeled text and speech data. In this work, we propose incorporating text-only and semi-supervised training into an attention-based deliberation model. By incorporating text-only data in training a Bidirectional Encoder Representations from Transformers (BERT) model for the deliberation text encoder, and large-scale text-to-speech and audio-only utterances using a joint acoustic and text decoder (JATD) and semi-supervised training, we achieved 4 compared to the baseline deliberation. Compared to a state-of-the-art language model (LM) rescoring method, the deliberation model reduces the Google Voice Search WER by 11 positive human side-by-side evaluation compared to the state-of-the-art LM rescorer with reasonable endpointer latencies.

