
PerSpeechNorm: A Persian Toolkit for Speech Processing Normalization

by Romina Oji, et al.
University of Tehran

In general, speech processing models consist of a language model along with an acoustic model. Regardless of the language model's complexity and variants, language models require three critical pre-processing steps: cleaning, normalization, and tokenization. Of these steps, normalization is essential for format unification in purely textual applications. For language models embedded in speech processing modules, however, normalization is not limited to format unification: it must also convert each readable symbol, number, etc., to the way it is pronounced. To the best of our knowledge, there is no Persian normalization toolkit for embedded language models in speech processing modules, so in this paper we propose an open-source normalization toolkit for text processing in speech applications. Briefly, we consider different kinds of readable Persian text, such as symbols (common currencies, #, @, URL, etc.), numbers (date, time, phone number, national code, etc.), and so on. Comparison with other available Persian textual normalization tools indicates the superiority of the proposed method in speech processing. In addition, comparing the model's performance on one of the proposed functions (sentence separation) with other common natural language libraries such as HAZM and Parsivar indicates that the proposed method performs well. Its evaluation on some Persian Wikipedia data also confirms this performance.
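To make the distinction between format unification and spoken-form conversion concrete, here is a minimal Python sketch. It is not the PerSpeechNorm API: the function name `normalize_for_speech`, the small symbol table, and the digit-by-digit reading rule are all assumptions chosen for illustration, and a real toolkit would need context-sensitive rules for dates, times, currencies, and multi-digit numbers.

```python
import re

# Format unification: map Arabic letter variants to their Persian forms.
# (Illustrative subset only: Arabic yeh -> Persian yeh, Arabic kaf -> Persian kaf.)
CHAR_MAP = str.maketrans({"\u064a": "\u06cc", "\u0643": "\u06a9"})

# Format unification: map Persian and Arabic-Indic digits to ASCII digits.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
ARABIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))
DIGIT_MAP = str.maketrans(PERSIAN_DIGITS + ARABIC_DIGITS, "0123456789" * 2)

# Spoken forms (hypothetical, deliberately tiny tables for this sketch).
DIGIT_WORDS = ["صفر", "یک", "دو", "سه", "چهار", "پنج", "شش", "هفت", "هشت", "نه"]
SYMBOL_WORDS = {"%": "درصد", "$": "دلار", "#": "هشتگ"}


def normalize_for_speech(text: str) -> str:
    """Unify character forms, then replace symbols and digits by spoken words.

    Assumption: digits are read one by one (as in a phone number); this
    sketch does not attempt cardinal-number, date, or time verbalization.
    """
    text = text.translate(CHAR_MAP).translate(DIGIT_MAP)
    # Replace each known symbol with its spoken word, padded with spaces.
    for symbol, word in SYMBOL_WORDS.items():
        text = text.replace(symbol, f" {word} ")
    # Replace each ASCII digit with its spoken word, padded with spaces.
    text = re.sub(r"[0-9]", lambda m: f" {DIGIT_WORDS[int(m.group())]} ", text)
    # Collapse the extra whitespace introduced by the padding.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalize_for_speech("\u06f5%"))
```

A purely textual normalizer would stop after the two `translate` calls; the symbol and digit substitutions are the extra step that a speech-oriented normalizer needs.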



