Modelling low-resource accents without accent-specific TTS frontend

01/11/2023
by Georgi Tinchev, et al.

This work focuses on modelling a speaker's accent for which no dedicated text-to-speech (TTS) frontend, including a grapheme-to-phoneme (G2P) module, exists. Prior work on modelling accents assumes that a phonetic transcription is available for the target accent, which might not be the case for low-resource, regional accents. We propose an approach in which we first augment the target accent data to sound like the donor voice via voice conversion, then train a multi-speaker, multi-accent TTS model on the combination of recordings and synthetic data to generate the donor's voice speaking in the target accent. Throughout the procedure, we use a TTS frontend developed for the same language but a different accent. Qualitative and quantitative analyses show that the proposed strategy achieves state-of-the-art results compared to other generative models. Our work demonstrates that low-resource accents can be modelled with relatively little data and without developing an accent-specific TTS frontend. Audio samples of our model converting to multiple accents are available on our web page.
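
The data flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the authors' implementation: `Utterance`, `augment_with_voice_conversion`, `build_training_set`, `convert_voice`, and `frontend` are hypothetical names, the voice-conversion model and the TTS frontend are passed in as plain callables, and including the donor's own recordings in the training set is an assumption about what "recordings" covers.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Utterance:
    """Hypothetical container for one training example."""
    text: str
    audio: bytes            # stand-in for a waveform
    speaker: str
    accent: str
    synthetic: bool = False


def augment_with_voice_conversion(
    target_accent_data: Sequence[Utterance],
    donor_speaker: str,
    convert_voice: Callable[[bytes, str], bytes],
) -> List[Utterance]:
    """Make target-accent audio sound like the donor voice.

    The accent label is kept; only the speaker identity changes.
    """
    return [
        replace(
            u,
            audio=convert_voice(u.audio, donor_speaker),
            speaker=donor_speaker,
            synthetic=True,
        )
        for u in target_accent_data
    ]


def build_training_set(
    donor_data: Sequence[Utterance],
    target_accent_data: Sequence[Utterance],
    donor_speaker: str,
    convert_voice: Callable[[bytes, str], bytes],
    frontend: Callable[[str], List[str]],
) -> List[Tuple[List[str], Utterance]]:
    """Combine recordings with synthetic data for multi-speaker,
    multi-accent TTS training.

    `frontend` is assumed to be built for a different accent of the
    same language; it is applied to every utterance, because no
    accent-specific frontend exists for the target accent.
    """
    synthetic = augment_with_voice_conversion(
        target_accent_data, donor_speaker, convert_voice
    )
    combined = list(donor_data) + list(target_accent_data) + synthetic
    return [(frontend(u.text), u) for u in combined]
```

At synthesis time, a model trained on such a set would be conditioned on the donor speaker and the target accent; that conditioning step is not shown here.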

research
09/02/2020

Efficient neural speech synthesis for low-resource languages through multilingual modeling

Recent advances in neural TTS have led to models that can produ...
research
08/20/2020

Efficient neural speech synthesis for low-resource languages through multilingual modeling

Recent advances in neural TTS have led to models that can produce high-q...
research
02/16/2022

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

State-of-the-art text-to-speech (TTS) systems require several hours of r...
research
06/26/2022

Low-resource Accent Classification in Geographically-proximate Settings: A Forensic and Sociophonetics Perspective

Accented speech recognition and accent classification are relatively und...
research
11/11/2020

Low-resource expressive text-to-speech using data augmentation

While recent neural text-to-speech (TTS) systems perform remarkably well...
research
07/29/2022

Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation

The availability of data in expressive styles across languages is limite...
research
03/29/2022

NeuraGen-A Low-Resource Neural Network based approach for Gender Classification

Human voice is the source of several important information. This is in t...