
Automatic Heteronym Resolution Pipeline Using RAD-TTS Aligners

by Jocelyn Huang, et al.

Grapheme-to-phoneme (G2P) transduction is part of the standard text-to-speech (TTS) pipeline. However, G2P conversion is difficult for languages that contain heteronyms: words that have one spelling but can be pronounced in multiple ways. G2P datasets with annotated heteronyms are limited in size and expensive to create, because human labeling remains the primary method for heteronym disambiguation. We propose a RAD-TTS Aligner-based pipeline to automatically disambiguate heteronyms in datasets that contain both audio and text transcripts. The best pronunciation can be chosen by generating all possible candidates for each heteronym and scoring them with an Aligner model. The resulting labels can be used to create training datasets for both multi-stage and end-to-end G2P systems.
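The candidate-generation and scoring step described above can be illustrated with a minimal sketch. It assumes a toy pronunciation lexicon mapping each lowercase word to its possible phoneme strings, and a hypothetical `score_fn` that stands in for the Aligner: it takes a full candidate phoneme sequence for the utterance and returns a score where higher means a better fit to the audio. The names `disambiguate`, `lexicon`, and `score_fn`, the greedy left-to-right order, and the rule that unresolved heteronyms default to their first listed pronunciation are all illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for an Aligner-based score: takes one candidate
# phoneme sequence for the whole utterance, returns higher = better match.
ScoreFn = Callable[[List[str]], float]


def disambiguate(
    words: Sequence[str],
    lexicon: Dict[str, List[str]],
    score_fn: ScoreFn,
) -> List[str]:
    """Pick one pronunciation per word by scoring every heteronym candidate.

    Assumes every word is in the lexicon. Heteronyms are resolved greedily
    left to right; unresolved ones use their first listed pronunciation.
    """
    # Start from the first listed pronunciation of every word.
    chosen = [lexicon[w.lower()][0] for w in words]
    for i, word in enumerate(words):
        candidates = lexicon[word.lower()]
        if len(candidates) < 2:
            continue  # not a heteronym, nothing to resolve
        best, best_score = chosen[i], float("-inf")
        for cand in candidates:
            # Substitute this candidate and score the whole utterance.
            trial = chosen[:i] + [cand] + chosen[i + 1:]
            score = score_fn(trial)
            if score > best_score:  # ties keep the earlier-listed candidate
                best, best_score = cand, score
        chosen[i] = best
    return chosen


if __name__ == "__main__":
    lexicon = {
        "i": ["AY"],
        "read": ["R IY D", "R EH D"],
        "it": ["IH T"],
        "yesterday": ["Y EH S T ER D EY"],
    }
    # Toy scorer: counts positions matching a "heard" reference sequence.
    heard = ["AY", "R EH D", "IH T", "Y EH S T ER D EY"]
    toy_score = lambda seq: sum(a == b for a, b in zip(seq, heard))
    print(disambiguate(["I", "read", "it", "yesterday"], lexicon, toy_score))
```

Scoring each heteronym separately keeps the number of Aligner calls linear in the number of candidates, whereas scoring every joint combination of pronunciations would grow exponentially with the number of heteronyms in a sentence.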


Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language

End-to-end speech synthesis is a promising approach that directly conver...

Handwriting Classification for the Analysis of Art-Historical Documents

Digitized archives contain and preserve the knowledge of generations of ...

Characterizing Uncertainty in the Visual Text Analysis Pipeline

Current visual text analysis approaches rely on sophisticated processing...

NeMo Toolbox for Speech Dataset Construction

In this paper, we introduce a new toolbox for constructing speech datase...

An Empirical Study on Explainable Prediction of Text Complexity: Preliminaries for Text Simplification

Text simplification is concerned with reducing the language complexity a...

"So You Think You're Funny?": Rating the Humour Quotient in Standup Comedy

Computational Humour (CH) has attracted the interest of Natural Language...