Syllable Subword Tokens for Open Vocabulary Speech Recognition in Malayalam

01/17/2023
by Kavya Manohar, et al.

In a hybrid automatic speech recognition (ASR) system, a pronunciation lexicon (PL) and a language model (LM) are essential to correctly retrieve spoken word sequences. Because Malayalam is a morphologically complex language, its vocabulary is so large that it is impossible to build a PL and an LM that cover all of its diverse word forms. Using subword tokens to build the PL and LM, and combining them to form words after decoding, enables the recovery of many out-of-vocabulary words. In this work, we investigate the impact of using syllables as subword tokens instead of words in Malayalam ASR, and evaluate the relative improvement in lexicon size, model memory requirement, and word error rate.
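The step of combining subword tokens into words after decoding can be illustrated with a minimal sketch. It assumes a hypothetical convention in which a trailing `+` marks a token that continues into the next one; the abstract does not state which marker scheme the authors use, and the romanized syllables below are illustrative placeholders rather than data from the paper.

```python
# Assumption: a trailing "+" marks a subword token that continues into the
# next token. This marker scheme is hypothetical, not taken from the paper.
CONTINUATION = "+"


def split_word(syllables):
    """Turn the syllables of one word into marked subword tokens."""
    return [s + CONTINUATION for s in syllables[:-1]] + [syllables[-1]]


def join_subwords(tokens):
    """Merge a decoded sequence of subword tokens back into words."""
    words = []
    current = ""
    for token in tokens:
        if token.endswith(CONTINUATION):
            # Token continues: accumulate it without the marker.
            current += token[: -len(CONTINUATION)]
        else:
            # Token ends a word: emit the accumulated word.
            words.append(current + token)
            current = ""
    if current:
        # A dangling continuation at the end is kept as its own word.
        words.append(current)
    return words


# Illustrative romanized syllables (placeholders, not real lexicon entries).
decoded = split_word(["ma", "la", "yaa", "lam"]) + split_word(["bhaa", "sha"])
print(decoded)
print(join_subwords(decoded))
```

Because the lexicon and language model only need to cover syllable tokens, a word that never appeared whole in the training text can still be recovered as long as each of its syllables is known.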

research
07/13/2017

Automatic Speech Recognition with Very Large Conversational Finnish and Estonian Vocabularies

Today, the vocabulary size for language models in large vocabulary speec...
research
12/30/2022

Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

Recent studies have shown that using an external Language Model (LM) ben...
research
07/16/2021

A Comparison of Methods for OOV-word Recognition on a New Public Dataset

A common problem for automatic speech recognition systems is how to reco...
research
06/27/2018

Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR

In automatic speech recognition (ASR) systems, recurrent neural network ...
research
08/10/2019

Unsupervised Stemming based Language Model for Telugu Broadcast News Transcription

In Indian Languages, native speakers are able to understand new words f...
research
03/29/2022

Locality Matters: A Locality-Biased Linear Attention for Automatic Speech Recognition

Conformer has shown a great success in automatic speech recognition (ASR...
research
05/18/2020

Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems

In this paper, we present a series of complementary approaches to improv...
