Normalization of Non-Standard Words in Croatian Texts

03/27/2015
by Slobodan Beliga, et al.

This paper presents text normalization, an integral part of any text-to-speech synthesis system. Text normalization is a set of methods whose task is to write non-standard words, such as numbers, dates, times, abbreviations, acronyms and the most common symbols, in their fully expanded form. A complete taxonomy for classifying non-standard words in the Croatian language is proposed, together with rule-based normalization methods combined with a lookup dictionary. The achieved token rate for normalization of Croatian texts is 95% [...] form.
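The abstract describes two components: a taxonomy that classifies each non-standard word, and rule-based expansion backed by a lookup dictionary. The following is a minimal Python sketch of that general idea, not the authors' system. The category labels (`DATE`, `TIME`, `NUM`, `LOOKUP`, `WORD`), the tiny dictionary, and the helpers `classify`, `cardinal`, `normalize` and `normalize_text` are all hypothetical. Numbers are expanded only to their base (nominative masculine) form, ignoring the gender and case agreement that Croatian actually requires. Dates and times are classified but left unexpanded.

```python
import re

# Hypothetical lookup dictionary: a few common Croatian abbreviations and
# symbols. A real system would need a far larger, context-aware dictionary.
LOOKUP = {
    "npr.": "na primjer",
    "tj.": "to jest",
    "itd.": "i tako dalje",
    "%": "posto",
    "+": "plus",
}

ONES = ["nula", "jedan", "dva", "tri", "četiri", "pet",
        "šest", "sedam", "osam", "devet"]
TEENS = ["deset", "jedanaest", "dvanaest", "trinaest", "četrnaest",
         "petnaest", "šesnaest", "sedamnaest", "osamnaest", "devetnaest"]
TENS = ["", "", "dvadeset", "trideset", "četrdeset", "pedeset",
        "šezdeset", "sedamdeset", "osamdeset", "devedeset"]
HUNDREDS = ["", "sto", "dvjesto", "tristo", "četiristo", "petsto",
            "šesto", "sedamsto", "osamsto", "devetsto"]

# Rule order matters: dates and times are tried before plain numbers.
RULES = [
    ("DATE", re.compile(r"\d{1,2}\.\d{1,2}\.\d{4}\.?")),  # e.g. 27.3.2015.
    ("TIME", re.compile(r"\d{1,2}:\d{2}")),               # e.g. 10:30
    ("NUM", re.compile(r"\d+")),                          # plain cardinal
]


def cardinal(n: int) -> str:
    """Expand 0..999 to a Croatian cardinal in its base form only."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 10:
        return ONES[n]
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words.append(HUNDREDS[hundreds])
    if 10 <= rest < 20:
        words.append(TEENS[rest - 10])
    else:
        tens, ones = divmod(rest, 10)
        if tens:
            words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
    return " ".join(words)


def classify(token: str) -> str:
    """Assign a (hypothetical) non-standard-word category to one token."""
    if token in LOOKUP:
        return "LOOKUP"
    for label, pattern in RULES:
        if pattern.fullmatch(token):
            return label
    return "WORD"


def normalize(token: str) -> str:
    """Expand a single token according to its category."""
    label = classify(token)
    if label == "LOOKUP":
        return LOOKUP[token]
    if label == "NUM" and int(token) <= 999:
        return cardinal(int(token))
    # DATE, TIME, numbers above 999 and ordinary words are returned unchanged;
    # expanding them properly needs ordinals and case agreement.
    return token


def normalize_text(text: str) -> str:
    """Whitespace-tokenize the text and normalize each token."""
    return " ".join(normalize(t) for t in text.split())


if __name__ == "__main__":
    print(normalize_text("npr. 125 % u 10:30"))
    # prints: na primjer sto dvadeset pet posto u 10:30
```

Checking the dictionary before the regular-expression rules lets a symbol such as `%` or an abbreviation with a trailing period be resolved directly, while the ordered rules keep a date like `27.3.2015.` from being misread as a number.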
