Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

07/31/2023
by Manuel Sam Ribeiro, et al.

The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation. G2P conversion is beneficial to various speech processing applications, such as text-to-speech and speech recognition. However, G2P systems tend to rely on manually annotated pronunciation dictionaries, which are often time-consuming and costly to acquire. In this paper, we propose a method to improve G2P conversion by learning pronunciation examples from audio recordings. Our approach bootstraps a G2P model with a small set of annotated examples. The G2P model is used to train a multilingual phone recognition system, which then decodes speech recordings into a phonetic representation. Given the hypothesized phoneme labels, we learn pronunciation dictionaries for out-of-vocabulary words and use them to re-train the G2P system. Results indicate that our approach consistently improves the phone error rate of G2P systems across languages and amounts of available data.
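The abstract reports results as phone error rate (PER). As a minimal sketch of how that metric is commonly computed, assuming the standard definition (total Levenshtein edit distance between reference and hypothesized phone sequences, divided by the total number of reference phones), the following may help. The function names and the toy phone sequences are illustrative assumptions and do not come from the paper, which may use a different scoring setup.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # phone in ref missing from hyp
            insertion = curr[j - 1] + 1  # extra phone in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phone_error_rate(
    refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]
) -> float:
    """Corpus-level PER: summed edit distance over summed reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of entries")
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference set contains no phones")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_phones


# Hypothetical toy data: one exact match, one substitution plus one insertion.
references = [["k", "ae", "t"], ["d", "ao", "g"]]
hypotheses = [["k", "ae", "t"], ["d", "aa", "g", "z"]]
per = phone_error_rate(references, hypotheses)
```

Here the second pair contributes two errors over six reference phones in total, so `per` is 2/6. Lower values indicate that the G2P output is closer to the reference pronunciations.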


research
06/23/2020

One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble

The task of grapheme-to-phoneme (G2P) conversion is important for both s...
research
05/26/2021

Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition

Loanwords, such as Anglicisms, are a challenge in German speech recognit...
research
03/01/2021

Comparing acoustic analyses of speech data collected remotely

Face-to-face speech data collection has been next to impossible globally...
research
06/01/2023

The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech

We compare phone labels and articulatory features as input for cross-lin...
research
10/08/2019

MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms

Traditional voice conversion methods rely on parallel recordings of mult...
research
12/01/2020

NHSS: A Speech and Singing Parallel Database

We present a database of parallel recordings of speech and singing, coll...
research
03/23/2022

Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection

Hate speech classifiers exhibit substantial performance degradation when...
