Automatic Pronunciation Generation by Utilizing a Semi-supervised Deep Neural Networks

06/15/2016
by Naoya Takahashi, et al.

Phonemic or phonetic sub-word units are the most commonly used atomic elements for representing speech signals in modern ASR systems. However, they are not the optimal choice, for several reasons: the large amount of effort required to handcraft a pronunciation dictionary, pronunciation variations, human mistakes, and under-resourced dialects and languages. Here, we propose a data-driven pronunciation estimation and acoustic modeling method that takes only the orthographic transcription as input to jointly estimate a set of sub-word units and a reliable dictionary. Experimental results show that the proposed method, which is based on semi-supervised training of a deep neural network, largely outperforms phoneme-based continuous speech recognition on the TIMIT dataset.

research
10/31/2016

Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition

We present results that show it is possible to build a competitive, grea...
research
07/27/2022

Knowledge-driven Subword Grammar Modeling for Automatic Speech Recognition in Tamil and Kannada

In this paper, we present specially designed automatic speech recognitio...
research
06/16/2018

Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition

In this paper, we present our overall efforts to improve the performance...
research
04/19/2021

Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition

Subword units are commonly used for end-to-end automatic speech recognit...
research
05/26/2017

Semi-Supervised Model Training for Unbounded Conversational Speech Recognition

For conversational large-vocabulary continuous speech recognition (LVCSR...
research
05/08/2018

Comparing phonemes and visemes with DNN-based lipreading

There is debate if phoneme or viseme units are the most effective for a ...
research
03/02/2020

Semi-supervised learning of glottal pulse positions in a neural analysis-synthesis framework

This article investigates into recently emerging approaches that use dee...
