indic-punct: An automatic punctuation restoration and inverse text normalization framework for Indic languages

03/31/2022
by   Anirudh Gupta, et al.
0

Automatic Speech Recognition (ASR) generates text which is most of the times devoid of any punctuation. Absence of punctuation is text can affect readability. Also, down stream NLP tasks such as sentiment analysis, machine translation, greatly benefit by having punctuation and sentence boundary information. We present an approach for automatic punctuation of text using a pretrained IndicBERT model. Inverse text normalization is done by hand writing weighted finite state transducer (WFST) grammars. We have developed this tool for 11 Indic languages namely Hindi, Tamil, Telugu, Kannada, Gujarati, Marathi, Odia, Bengali, Assamese, Malayalam and Punjabi. All code and data is publicly. available

READ FULL TEXT

page 1

page 2

page 3

research
04/11/2021

NeMo Inverse Text Normalization: From Development To Production

Inverse text normalization (ITN) converts spoken-domain automatic speech...
research
11/18/2015

Enhancements in statistical spoken language translation by de-normalization of ASR results

Spoken language translation (SLT) has become very important in an increa...
research
04/03/2018

Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text

Yorùbá is a widely spoken West African language with a writing system ri...
research
11/07/2022

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Automatic Speech Recognition (ASR) systems typically yield output in lex...
research
03/13/2021

OkwuGbé: End-to-End Speech Recognition for Fon and Igbo

Language is inherent and compulsory for human communication. Whether exp...
research
08/23/2021

A Unified Transformer-based Framework for Duplex Text Normalization

Text normalization (TN) and inverse text normalization (ITN) are essenti...
research
02/12/2021

Neural Inverse Text Normalization

While there have been several contributions exploring state of the art t...

Please sign up or login with your details

Forgot password? Click here to reset