Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

04/15/2021
by Shubhi Tyagi, et al.

Developing Text Normalization (TN) systems for Text-to-Speech (TTS) in new languages is hard. We propose a novel architecture that facilitates this for multiple languages while using less than 3% of the data used by the state-of-the-art work on English. We treat TN as a sequence classification problem and propose a granular tokenization mechanism that enables the system to learn the majority of the classes and their normalizations from the training data itself. This is combined with minimal pre-coded linguistic knowledge for the remaining classes. We publish the first results on TN for TTS in Spanish and Tamil, and demonstrate that the performance of the approach is comparable with previous work on English. All annotated datasets used for experimentation will be released at https://github.com/amazon-research/proteno.
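To make the idea of granular tokenization followed by per-token classification concrete, here is a minimal Python sketch. It is not the authors' implementation: the splitting rule (break at every transition between letters, digits, and symbols), the class names `self`, `cardinal`, and `sil`, and the helpers `granular_tokenize`, `spell_cardinal`, and `normalize` are all assumptions made for illustration. In a real system the labels would be predicted by a trained sequence classifier; here they are supplied by hand.

```python
import re

# Assumed splitting rule: runs of letters, runs of digits, or one symbol.
_TOKEN_RE = re.compile(r"[^\W\d_]+|\d+|[^\w\s]|_")

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def granular_tokenize(text):
    """Split text into fine-grained tokens, e.g. '5pm' -> ['5', 'pm']."""
    return _TOKEN_RE.findall(text)


def spell_cardinal(token):
    """Spell out an integer below 100 (kept small for illustration)."""
    n = int(token)
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    raise ValueError("this sketch only handles numbers below 100")


# Hypothetical classes: each maps a token to its spoken form.
NORMALIZERS = {
    "self": lambda tok: tok,   # read the token as written
    "cardinal": spell_cardinal,
    "sil": lambda tok: "",     # silent token, e.g. punctuation
}


def normalize(tokens, labels):
    """Apply the normalizer of each token's class and join the results."""
    spoken = [NORMALIZERS[label](tok) for tok, label in zip(tokens, labels)]
    return " ".join(word for word in spoken if word)


tokens = granular_tokenize("Room 42, 5pm")
labels = ["self", "cardinal", "sil", "cardinal", "self"]  # hand-assigned
print(tokens)
print(normalize(tokens, labels))
```

Because each token is small and carries a single class, the classifier only has to choose among a modest set of labels per token, which is one plausible reason such a scheme could work with limited training data.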


