PronouncUR: An Urdu Pronunciation Lexicon Generator

01/01/2018
by Haris Bin Zia, et al.

State-of-the-art speech recognition systems rely heavily on three basic components: an acoustic model, a pronunciation lexicon, and a language model. Building these components requires linguistic as well as technical expertise, which is a barrier in low-resource domains, so techniques for constructing them without expert domain knowledge are in great demand. Urdu, despite having millions of speakers all over the world, is a low-resource language in terms of standard, publicly available linguistic resources. In this paper, we present a grapheme-to-phoneme conversion tool for Urdu that takes a list of Urdu words and generates a pronunciation lexicon in a form suitable for use with speech recognition systems. The tool predicts the pronunciation of words using an LSTM-based model trained on a handcrafted expert lexicon of around 39,000 words and shows an accuracy of 64% in evaluation.
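
To make the word-list-to-lexicon step concrete, here is a minimal sketch of how predicted pronunciations might be written out as lexicon lines. It is an illustration only, not the tool's actual code: the names `build_lexicon` and `toy_predict` are hypothetical, the one-entry-per-line layout (`word` followed by space-separated phoneme symbols) is an assumption modeled on the plain-text lexicons commonly used by speech recognition toolkits, and `toy_predict` is a placeholder standing in for the trained LSTM model.

```python
from typing import Callable, Iterable, List


def build_lexicon(
    words: Iterable[str],
    predict: Callable[[str], List[str]],
) -> List[str]:
    """Return lexicon lines of the assumed form '<word> <phoneme> <phoneme> ...'.

    `predict` is any grapheme-to-phoneme function; in the paper this role
    is played by an LSTM-based model, which is not reproduced here.
    """
    lines: List[str] = []
    seen = set()
    for raw in words:
        word = raw.strip()
        # Skip blank entries and duplicates so each word appears once.
        if not word or word in seen:
            continue
        seen.add(word)
        phonemes = predict(word)
        lines.append(word + " " + " ".join(phonemes))
    return lines


def toy_predict(word: str) -> List[str]:
    """Placeholder predictor (hypothetical): one symbol per character.

    It only labels each character with its code point; it is NOT a real
    grapheme-to-phoneme mapping for Urdu or any other language.
    """
    return [f"U+{ord(ch):04X}" for ch in word]


if __name__ == "__main__":
    for line in build_lexicon(["کتاب", "پانی"], toy_predict):
        print(line)
```

Passing the predictor in as a function keeps the formatting step independent of the model, so a real grapheme-to-phoneme model could replace the placeholder without changing how the lexicon is written.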


research
02/27/2023

MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

Multi-lingual speech recognition aims to distinguish linguistic expressi...
research
09/19/2021

Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition

Unifying acoustic and linguistic representation learning has become incr...
research
04/05/2022

A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition

Unpaired data has shown to be beneficial for low-resource automatic spee...
research
04/05/2021

SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network

We present SpeechStew, a speech recognition model that is trained on a c...
research
05/22/2017

Use of Knowledge Graph in Rescoring the N-Best List in Automatic Speech Recognition

With the evolution of neural network based methods, automatic speech rec...
research
11/11/2022

Analysis of Male and Female Speakers' Word Choices in Public Speeches

The extent to which men and women use language differently has been ques...
research
05/19/2022

A machine transliteration tool between Uzbek alphabets

Machine transliteration, as defined in this paper, is a process of autom...
