
Exploring Transfer Learning for Low Resource Emotional TTS

by Noé Tits, et al.
University of Mons

Over the last few years, spoken language technologies have improved considerably thanks to Deep Learning. However, Deep Learning-based algorithms require amounts of data that are often difficult and costly to gather. In particular, modeling the variability in speech across different speakers, styles, or emotions with little data remains challenging. In this paper, we investigate how to leverage fine-tuning of a pre-trained Deep Learning-based TTS model to synthesize speech from a small dataset of another speaker. We then investigate the possibility of adapting this model to emotional TTS by fine-tuning the neutral TTS model with a small emotional dataset.
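The core idea of the paper, adapting a model pre-trained on abundant data to a small target dataset by continuing training on only part of the network, can be sketched as follows. This is a minimal illustrative example, not the paper's actual TTS architecture: the "encoder" here is a frozen random projection standing in for pre-trained weights, and only the output layer is updated on the small dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained encoder: a fixed (frozen) projection
# from 16-d inputs to a 32-d hidden representation. In the paper this
# role is played by the layers of a pre-trained neural TTS model.
W_enc = rng.normal(size=(16, 32))

def forward(X, W_out):
    H = np.tanh(X @ W_enc)   # frozen encoder features
    return H @ W_out         # trainable output layer

def finetune(X, Y, steps=200, lr=0.05):
    """Adapt only the output layer to the small dataset (X, Y) by
    gradient descent on mean squared error. Freezing most weights and
    training a small subset is one common low-resource adaptation
    heuristic; the paper's exact recipe may differ."""
    W_out = rng.normal(size=(32, 8)) * 0.1
    n = len(X)
    for _ in range(steps):
        H = np.tanh(X @ W_enc)
        err = H @ W_out - Y
        grad = (2.0 / n) * (H.T @ err)  # d(MSE)/dW_out
        W_out -= lr * grad
    return W_out

# Small target dataset: 20 examples, mimicking a low-resource setting
# such as a few minutes of emotional speech from a new speaker.
X = rng.normal(size=(20, 16))
Y = rng.normal(size=(20, 8))

W0 = rng.normal(size=(32, 8)) * 0.1
before = np.mean((forward(X, W0) - Y) ** 2)
after = np.mean((forward(X, finetune(X, Y)) - Y) ** 2)
assert after < before  # fine-tuning reduces error on the target data
```

Because only the output layer is updated, the number of trainable parameters stays small relative to the dataset, which is the reason this style of fine-tuning can work when the target corpus is tiny.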

