G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR

10/22/2019
by Duc Le, et al.

Grapheme-based acoustic modeling has recently been shown to outperform phoneme-based approaches in both hybrid and end-to-end automatic speech recognition (ASR), even on non-phonemic languages like English. However, graphemic ASR still struggles with rare long-tail words, such as entity names, that do not follow the standard spelling conventions seen in training. In this work, we present a novel method to train a statistical grapheme-to-grapheme (G2G) model on text-to-speech data that can rewrite an arbitrary character sequence into more phonetically consistent forms. We show that using G2G to provide alternative pronunciations during decoding reduces Word Error Rate by 3 [...] and bridges the gap on rare name recognition with an equivalent phonetic setup. Unlike many previously proposed methods, our method does not require any change to the acoustic model training procedure. This work reaffirms the efficacy of grapheme-based modeling and shows that specialized linguistic knowledge, when available, can be leveraged to improve graphemic ASR.
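The idea of supplying G2G rewrites as alternative pronunciations at decoding time can be illustrated with a minimal sketch. This is not the authors' implementation: `augment_lexicon`, `toy_g2g`, the `max_variants` cap, and the example rewrites are all hypothetical, and the trained statistical G2G model is replaced by a hard-coded lookup table. The sketch only shows the shape of the approach, in which each word keeps its literal grapheme sequence and gains extra entries derived from rewritten spellings, while the acoustic model is left untouched.

```python
from typing import Callable, Dict, List


def graphemic_pronunciation(spelling: str) -> List[str]:
    """Return the grapheme sequence of a spelling (lowercased letters only)."""
    return [ch for ch in spelling.lower() if ch.isalpha()]


def augment_lexicon(
    words: List[str],
    g2g: Callable[[str], List[str]],
    max_variants: int = 2,
) -> Dict[str, List[List[str]]]:
    """Build a graphemic lexicon with alternative pronunciations.

    Each word always keeps its literal grapheme sequence. Up to
    `max_variants` rewrites proposed by `g2g` (a stand-in for a trained
    G2G model) are added as extra pronunciations of the same word.
    """
    lexicon: Dict[str, List[List[str]]] = {}
    for word in words:
        prons = [graphemic_pronunciation(word)]
        for rewrite in g2g(word)[:max_variants]:
            pron = graphemic_pronunciation(rewrite)
            if pron and pron not in prons:  # skip empty or duplicate entries
                prons.append(pron)
        lexicon[word] = prons
    return lexicon


# Hypothetical rewrites, invented for illustration only.
TOY_REWRITES = {"kaity": ["katie"], "phoebe": ["feebee"]}


def toy_g2g(word: str) -> List[str]:
    """Toy stand-in for a G2G model: a fixed lookup table."""
    return TOY_REWRITES.get(word.lower(), [])


lexicon = augment_lexicon(["Kaity", "hello"], toy_g2g)
for word, prons in lexicon.items():
    print(word, [" ".join(p) for p in prons])
```

In this sketch the decoder would still output the original spelling ("Kaity"), since the rewrite is attached only as an additional pronunciation of that word.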

research
10/02/2019

From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition

There is an implicit assumption that traditional hybrid approaches for a...
research
06/12/2017

Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework

Speech recognition systems for irregularly-spelled languages like Englis...
research
11/05/2021

Conformer-based Hybrid ASR System for Switchboard Dataset

The recently proposed conformer architecture has been successfully used ...
research
12/30/2022

Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

Recent studies have shown that using an external Language Model (LM) ben...
research
03/23/2023

A Deliberation-based Joint Acoustic and Text Decoder

We propose a new two-pass E2E speech recognition model that improves ASR...
research
05/29/2023

Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation

End-to-end automatic speech recognition (E2E-ASR) has the potential to i...
research
11/05/2018

When CTC Training Meets Acoustic Landmarks

Connectionist temporal classification (CTC) training criterion provides ...
