No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models

12/05/2017
by Tara N. Sainath, et al.

For decades, context-dependent phonemes have been the dominant sub-word unit for conventional acoustic modeling systems. This status quo has recently been challenged by end-to-end models, which seek to combine the acoustic, pronunciation, and language model components into a single neural network. Such systems, which typically predict graphemes or words, simplify the recognition process because they remove the need for a separate expert-curated pronunciation lexicon to map from phoneme-based units to words. However, little previous work has compared phoneme-based and grapheme-based sub-word units within the end-to-end modeling framework to determine whether the gains from such approaches are primarily due to the new probabilistic model or to the joint learning of the various components with grapheme-based units. In this work, we conduct detailed experiments aimed at quantifying the value of phoneme-based pronunciation lexica in the context of end-to-end models. We contrast phoneme-based end-to-end models with grapheme-based ones on a large-vocabulary English Voice-search task, where we find that graphemes do indeed outperform phonemes. We also compare grapheme-based and phoneme-based approaches on a multi-dialect English task; the results once again confirm the superiority of graphemes, which greatly simplifies the system for recognizing multiple dialects.
